// SPDX-License-Identifier: GPL-2.0-or-later
/* AFS fileserver probing
 *
 * afs: Actively poll fileservers to maintain NAT or firewall openings
 *
 * When an AFS client accesses a file, it receives a limited-duration callback
 * promise that the server will notify it if another client changes a file.
 * This callback duration can be a few hours in length.
 *
 * If a client mounts a volume and then an application prevents it from being
 * unmounted, say by chdir'ing into it, but then does nothing for some time,
 * the rxrpc_peer record will expire and rxrpc-level keepalive will cease.
 *
 * If there is NAT or a firewall between the client and the server, the route
 * back for the server may close after a comparatively short duration, meaning
 * that attempts by the server to notify the client may then bounce.
 *
 * The client, however, may (so far as it knows) still have a valid unexpired
 * promise and will then rely on its cached data and will not see changes made
 * on the server by a third party until it incidentally rechecks the status or
 * the promise needs renewal.
 *
 * To deal with this, the client needs to regularly probe the server.  This
 * has two effects: firstly, it keeps a route open back for the server, and
 * secondly, it causes the server to disgorge any notifications that got
 * queued up because they couldn't be sent.
 *
 * Fix this by adding a mechanism to emit regular probes.
 *
 * Two levels of probing are made available: under normal circumstances the
 * 'slow' queue will be used for a fileserver - this just probes the preferred
 * address once every 5 mins or so; however, if the server fails to respond to
 * any probes, it will shift to the 'fast' queue, from which all its
 * interfaces will be probed every 30s.  When it finally responds, the record
 * will switch back to the slow queue.
 *
 * Further notes:
 *
 *  (1) Probing is no longer driven from the fileserver rotation algorithm.
 *
 *  (2) Probes are dispatched to all interfaces on a fileserver when an
 *      afs_server object is set up to record it.
 *
 *  (3) The afs_server object is removed from the probe queues when we start
 *      to probe it.  afs_is_probing_server() returns true if it's not listed,
 *      i.e. it's undergoing probing.
 *
 *  (4) The afs_server object is added back onto the probe queue when the
 *      final outstanding probe completes, but the probed_at time is set when
 *      we're about to launch a probe so that it's not dependent on the probe
 *      duration.
 *
 *  (5) The timer and the work item added for this must be handed a count on
 *      net->servers_outstanding, which they hand on or release.  This makes
 *      sure that network namespace cleanup waits for them.
 *
 * Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
 * Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
 * Signed-off-by: David Howells <dhowells@redhat.com>
 *
 * Copyright (C) 2018, 2020 Red Hat, Inc. All Rights Reserved.
 * Written by David Howells (dhowells@redhat.com)
 */

#include <linux/sched.h>
#include <linux/slab.h>
#include "afs_fs.h"
#include "internal.h"

/*
 * afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server
 *
 * AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
 * Linux's afs client switches between them when talking to a non-YFS server
 * if any of the upper 32 bits of the read size, the file position or the sum
 * of the two are set.
 *
 * This is a problem, however, since the file position and length fields of
 * FS.FetchData are *signed* 32-bit values.
 *
 * Fix this by capturing the capability bits obtained from the fileserver when
 * it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
 * then picking out the VICED_CAPABILITY_64BITFILES flag.  This can then be
 * used to decide whether to use FS.FetchData or FS.FetchData64 - and also
 * FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
 * switch on the parameter values.
 *
 * This capabilities flag could also be used to limit the maximum size of the
 * file, but all servers must be checked for that.
 *
 * Note that the issue does not exist with FS.StoreData - that uses *unsigned*
 * 32-bit values.  It's also not a problem with Auristor servers, as their
 * YFS.FetchData64 op uses unsigned 64-bit values.
 *
 * This can be tested by cloning a git repo through an OpenAFS client to an
 * OpenAFS server and then doing "git status" on it from a Linux afs
 * client[1].  Provided the clone has a pack file that's in the 2G-4G range,
 * the git status will show errors like:
 *
 *	error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
 *	error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
 *
 * This can be observed in the server's FileLog with something like the
 * following appearing:
 *
 *	Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
 *	Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
 *	Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
 *	Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
 *	...
 *	Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5
 *
 * Note the file position of 18446744071815340032.  This is the requested file
 * position sign-extended.
 *
 * Fixes: b9b1f8d5930a ("AFS: write support fixes")
 * Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
 * Signed-off-by: David Howells <dhowells@redhat.com>
 * Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
 * Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
 * cc: linux-afs@lists.infradead.org
 * cc: openafs-devel@openafs.org
 * Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
 * Link: https://lore.kernel.org/r/951332.1631308745@warthog.procyon.org.uk/
 */
#include "protocol_afs.h"
#include "protocol_yfs.h"
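The FileLog position in the FS.FetchData note above can be reproduced with a little arithmetic. The userspace sketch below uses hypothetical helpers (as_signed32(), server_seen_pos(), old_rule_uses_fetchdata64()) that are not part of the afs client, and it assumes the server simply reads the 32-bit field as signed and widens it to 64 bits, which is consistent with the logged value. The offset 2400755712 is what 18446744071815340032 decodes back to, and 2400755712 + 3154 is exactly the logged file size of 2400758866, so the old upper-32-bits test would have sent that read via FS.FetchData.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret the low 32 bits of v as a signed 32-bit integer, which is how a
 * server decodes FS.FetchData's signed 32-bit position field.  Done
 * arithmetically to avoid an implementation-defined narrowing cast. */
static int64_t as_signed32(uint64_t v)
{
	uint32_t lo = (uint32_t)v;	/* only these bits fit in the wire field */
	int64_t r = lo >= 0x80000000u ? (int64_t)lo - 0x100000000LL : (int64_t)lo;

	assert(r >= INT32_MIN && r <= INT32_MAX);
	return r;
}

/* The 64-bit position the server ends up with after sign-extending the field
 * (converting a negative value to uint64_t wraps modulo 2^64). */
static uint64_t server_seen_pos(uint64_t pos)
{
	return (uint64_t)as_signed32(pos);
}

/* The pre-fix selection rule: use the 64-bit RPC only if any of the upper 32
 * bits of the position, the length or their sum are set. */
static bool old_rule_uses_fetchdata64(uint64_t pos, uint64_t len)
{
	return (pos >> 32) != 0 || (len >> 32) != 0 || ((pos + len) >> 32) != 0;
}
```

The fix described in the note sidesteps this range test entirely and keys the choice off VICED_CAPABILITY_64BITFILES instead; the sketch only shows why the old test was insufficient for offsets between 2G and 4G.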
static unsigned int afs_fs_probe_fast_poll_interval = 30 * HZ;
static unsigned int afs_fs_probe_slow_poll_interval = 5 * 60 * HZ;
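To put the two poll intervals in wall-clock terms, and to illustrate the address selection described in the commit note at the top of the file, here is a userspace sketch. SKETCH_HZ and sketch_pick_probe_targets() are hypothetical: the real HZ is a kernel build option, and the code that actually sends probes is not part of this excerpt, so the selection below follows the note's description (slow queue: preferred address only; fast queue: every interface) rather than that code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_HZ 250	/* assumption: HZ is chosen at kernel build time */

static const unsigned int sketch_fast_poll_interval = 30 * SKETCH_HZ;
static const unsigned int sketch_slow_poll_interval = 5 * 60 * SKETCH_HZ;

/* Pick the address indices a poll would target, per the commit note: the slow
 * queue probes only the preferred address; the fast queue probes them all.
 * Fills out[] (which must hold nr_addrs entries) and returns the count. */
static size_t sketch_pick_probe_targets(bool fast, size_t nr_addrs,
					size_t preferred, size_t *out)
{
	size_t n = 0;

	assert(preferred < nr_addrs);
	if (!fast) {
		out[n++] = preferred;
		return n;
	}
	for (size_t i = 0; i < nr_addrs; i++)
		out[n++] = i;
	return n;
}
```

Whatever HZ is configured to, the two intervals always come out at 30 seconds and 5 minutes respectively, because both are expressed as multiples of HZ.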

/*
 * Get a ref on an endpoint state record.
 */
struct afs_endpoint_state *afs_get_endpoint_state(struct afs_endpoint_state *estate,
						  enum afs_estate_trace where)
{
	if (estate) {
		int r;

		__refcount_inc(&estate->ref, &r);
		trace_afs_estate(estate->server_id, estate->probe_seq, r, where);
	}
	return estate;
}

/*
 * RCU callback to free an endpoint state record and drop its address list.
 */
static void afs_endpoint_state_rcu(struct rcu_head *rcu)
{
	struct afs_endpoint_state *estate = container_of(rcu, struct afs_endpoint_state, rcu);

	trace_afs_estate(estate->server_id, estate->probe_seq, refcount_read(&estate->ref),
			 afs_estate_trace_free);
	afs_put_addrlist(estate->addresses, afs_alist_trace_put_estate);
	kfree(estate);
}

/*
 * Put a ref on an endpoint state record.  The record is freed after an RCU
 * grace period once the last ref is gone.
 */
void afs_put_endpoint_state(struct afs_endpoint_state *estate, enum afs_estate_trace where)
{
	if (estate) {
		/* Read these before dropping our ref: once it's dropped, another
		 * put may queue the record for freeing.
		 */
		unsigned int server_id = estate->server_id, probe_seq = estate->probe_seq;
		bool dead;
		int r;

		dead = __refcount_dec_and_test(&estate->ref, &r);
		trace_afs_estate(server_id, probe_seq, r, where);
		if (dead)
			call_rcu(&estate->rcu, afs_endpoint_state_rcu);
	}
}
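afs_get_endpoint_state() and afs_put_endpoint_state() follow the usual get/put pattern: every holder owns one reference, and whoever drops the last one releases the record. A minimal userspace sketch with C11 atomics follows; struct sketch_estate and its helpers are hypothetical, and the free happens immediately rather than after an RCU grace period via call_rcu(), which has no direct equivalent here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct afs_endpoint_state. */
struct sketch_estate {
	atomic_int ref;
	unsigned int server_id;
};

static int sketch_nr_freed;	/* counts frees so the behaviour can be checked */

static struct sketch_estate *sketch_estate_alloc(unsigned int server_id)
{
	struct sketch_estate *e = malloc(sizeof(*e));

	if (e) {
		atomic_init(&e->ref, 1);	/* the creator holds the first ref */
		e->server_id = server_id;
	}
	return e;
}

static struct sketch_estate *sketch_estate_get(struct sketch_estate *e)
{
	if (e)
		atomic_fetch_add(&e->ref, 1);
	return e;
}

/* Drop a ref; returns true if it was the last one and the record was freed.
 * The kernel defers the free with call_rcu(); this sketch frees at once. */
static bool sketch_estate_put(struct sketch_estate *e)
{
	int old;

	if (!e)
		return false;
	old = atomic_fetch_sub(&e->ref, 1);
	assert(old > 0);	/* a put without a matching get is a bug */
	if (old == 1) {
		free(e);
		sketch_nr_freed++;
		return true;
	}
	return false;
}
```

Like the kernel helpers, both functions accept NULL, which lets callers put an optional record without checking it first.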

/*
 * Start the probe polling timer.  We have to supply it with a count on
 * net->servers_outstanding; if the timer was already pending, it already
 * holds a count, so the extra one is dropped again straight away.
 */
static void afs_schedule_fs_probe(struct afs_net *net,
				  struct afs_server *server, bool fast)
{
	unsigned long atj;

	if (!net->live)
		return;

	atj = server->probed_at;
	atj += fast ? afs_fs_probe_fast_poll_interval : afs_fs_probe_slow_poll_interval;

	afs_inc_servers_outstanding(net);
	if (timer_reduce(&net->fs_probe_timer, atj))
		afs_dec_servers_outstanding(net);
}
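The counting in afs_schedule_fs_probe() relies on timer_reduce() returning nonzero when the timer was already pending, so that an armed timer only ever holds one count on servers_outstanding. The model below is a simplified, hypothetical stand-in (sketch_timer, sketch_timer_reduce() and sketch_net are not kernel types): it uses seconds with HZ == 1, ignores jiffies wraparound, omits the net->live check and does not model the timer handler handing its count on.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a kernel timer (not struct timer_list). */
struct sketch_timer {
	bool pending;
	unsigned long expires;
};

/* Models timer_reduce(): arm an idle timer, or bring a pending one forward
 * (never push it later).  Returns 1 if the timer was already pending, 0 if it
 * was idle.  Jiffies wraparound is ignored for simplicity. */
static int sketch_timer_reduce(struct sketch_timer *t, unsigned long expires)
{
	if (t->pending) {
		if (expires < t->expires)
			t->expires = expires;
		return 1;
	}
	t->pending = true;
	t->expires = expires;
	return 0;
}

struct sketch_net {
	struct sketch_timer probe_timer;
	int servers_outstanding;	/* counts held on behalf of the timer */
};

/* Mirrors afs_schedule_fs_probe() with times in seconds: take a count for
 * the timer and give it back if a pending timer already holds one. */
static void sketch_schedule_probe(struct sketch_net *net,
				  unsigned long probed_at, bool fast)
{
	unsigned long atj = probed_at + (fast ? 30 : 300);

	net->servers_outstanding++;
	if (sketch_timer_reduce(&net->probe_timer, atj))
		net->servers_outstanding--;

	/* In this model only the timer holds counts: one per armed timer. */
	assert(net->servers_outstanding == (net->probe_timer.pending ? 1 : 0));
}
```

Because a reduce never pushes the expiry later, a server that needs a fast probe can pull the shared timer forward, while a later slow request leaves the earlier deadline in place.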

/*
 * Handle the completion of a set of probes.
 */
static void afs_finished_fs_probe(struct afs_net *net, struct afs_server *server,
				  struct afs_endpoint_state *estate)
{
	bool responded = test_bit(AFS_ESTATE_RESPONDED, &estate->flags);

	write_seqlock(&net->fs_lock);
	if (responded) {
		list_add_tail(&server->probe_link, &net->fs_probe_slow);
	} else {
		server->rtt = UINT_MAX;
		clear_bit(AFS_SERVER_FL_RESPONDING, &server->flags);
		list_add_tail(&server->probe_link, &net->fs_probe_fast);
	}

	write_sequnlock(&net->fs_lock);

	afs_schedule_fs_probe(net, server, !responded);
}
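The queue decision in afs_finished_fs_probe() amounts to a small state update. The sketch below assumes a trimmed-down, hypothetical server record (struct sketch_server) and leaves out the seqlock and the list handling; it only shows which fields change, which queue the server lands on and which "fast" flag is passed on to the scheduler.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

enum sketch_probe_queue { SKETCH_QUEUE_SLOW, SKETCH_QUEUE_FAST };

/* Hypothetical, trimmed-down view of the server fields touched here. */
struct sketch_server {
	unsigned int rtt;
	bool responding;
	enum sketch_probe_queue queue;
};

/* Mirrors the branch in afs_finished_fs_probe(): a server that answered any
 * probe goes back on the slow queue (the responding flag is set elsewhere, so
 * it is left alone); one that answered none has its RTT forgotten, is marked
 * unresponsive and moves to the fast queue.  Returns the 'fast' argument
 * that would be handed to the scheduling step. */
static bool sketch_finish_probe_set(struct sketch_server *s, bool responded)
{
	if (responded) {
		s->queue = SKETCH_QUEUE_SLOW;
	} else {
		s->rtt = UINT_MAX;
		s->responding = false;
		s->queue = SKETCH_QUEUE_FAST;
	}
	return !responded;
}
```

Setting the RTT to UINT_MAX makes an unresponsive server compare as the worst possible choice until a fresh reply records a real RTT.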

/*
 * Handle the completion of a probe.
 */
static void afs_done_one_fs_probe(struct afs_net *net, struct afs_server *server,
				  struct afs_endpoint_state *estate)
{
	_enter("");

	if (atomic_dec_and_test(&estate->nr_probing))
		afs_finished_fs_probe(net, server, estate);
|
2020-04-24 14:10:00 +00:00
	wake_up_all(&server->probe_wq);
}
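The completion logic in afs_done_one_fs_probe() relies on atomic_dec_and_test() returning true only for the caller whose decrement takes the counter to zero, so exactly one probe completion runs afs_finished_fs_probe(), while every completion wakes the waiters on probe_wq. Below is a minimal user-space sketch of that pattern using C11 <stdatomic.h>. struct probe_state, dec_and_test() and probe_done() are hypothetical names for illustration, not kernel APIs, and wake_up_all() is modelled only as a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: one counter of outstanding probes. */
struct probe_state {
	atomic_int nr_probing;	/* probes still in flight */
	atomic_int finished;	/* times the "all probes done" step ran */
	atomic_int wakeups;	/* stands in for wake_up_all() calls */
};

static void probe_state_init(struct probe_state *ps, int nr_probes)
{
	assert(nr_probes > 0);
	atomic_init(&ps->nr_probing, nr_probes);
	atomic_init(&ps->finished, 0);
	atomic_init(&ps->wakeups, 0);
}

/* Same contract as atomic_dec_and_test(): true only for the decrement that
 * takes the counter to zero. atomic_fetch_sub() returns the old value. */
static bool dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

/* Called once per completed probe; returns true for the last one. */
static bool probe_done(struct probe_state *ps)
{
	bool last = dec_and_test(&ps->nr_probing);

	if (last)
		atomic_fetch_add(&ps->finished, 1);	/* cf. afs_finished_fs_probe() */
	atomic_fetch_add(&ps->wakeups, 1);		/* cf. wake_up_all() */
	return last;
}
```

With three probes outstanding, only the third call to probe_done() returns true: the finisher runs once, while the wakeup counter reaches 3.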

/*
 * Handle inability to send a probe due to ENOMEM when trying to allocate a
 * call struct.
 */
static void afs_fs_probe_not_done(struct afs_net *net,
				  struct afs_server *server,
2023-10-31 16:30:37 +00:00
				  struct afs_endpoint_state *estate,
2023-10-20 15:13:03 +00:00
				  int index)
2018-10-19 23:57:59 +00:00
{
2020-04-24 14:10:00 +00:00
	_enter("");
2018-10-19 23:57:59 +00:00
2020-04-24 14:10:00 +00:00
	trace_afs_io_error(0, -ENOMEM, afs_io_error_fs_probe_fail);
	spin_lock(&server->probe_lock);

2023-10-31 16:30:37 +00:00
	set_bit(AFS_ESTATE_LOCAL_FAILURE, &estate->flags);
2023-10-31 16:30:37 +00:00
	if (estate->error == 0)
		estate->error = -ENOMEM;
2020-04-24 14:10:00 +00:00

2023-10-31 16:30:37 +00:00
	set_bit(index, &estate->failed_set);
2020-04-24 14:10:00 +00:00

	spin_unlock(&server->probe_lock);
2023-10-31 16:30:37 +00:00
	return afs_done_one_fs_probe(net, server, estate);
2018-10-19 23:57:59 +00:00
}
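afs_fs_probe_not_done() records a probe that could not even be sent: it flags a local failure, sets the endpoint error to -ENOMEM only if no error has been recorded yet, and marks the address index in failed_set. The following single-threaded sketch shows that bookkeeping, assuming a plain unsigned long bitmask in place of the kernel's set_bit() and a simplified struct ep_state (a hypothetical name). The kernel performs these updates under server->probe_lock and then calls afs_done_one_fs_probe(); the sketch leaves both out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct afs_endpoint_state. */
struct ep_state {
	int error;			/* endpoint error; 0 means none yet */
	bool local_failure;		/* cf. AFS_ESTATE_LOCAL_FAILURE */
	unsigned long failed_set;	/* bit i set: address i failed */
};

/* Model of afs_fs_probe_not_done(): the probe of address @index could not
 * be sent because allocating the call failed. */
static void probe_not_done(struct ep_state *es, unsigned int index)
{
	assert(index < sizeof(es->failed_set) * CHAR_BIT);
	es->local_failure = true;
	if (es->error == 0)		/* keep any error recorded earlier */
		es->error = -ENOMEM;
	es->failed_set |= 1UL << index;
}
```

Because the error is only filled in when it is still 0, an earlier, more informative error (for example one reported by a probe of another address) is not masked by the local allocation failure.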

/*
 * Process the result of probing a fileserver. This is called after successful
 * or failed delivery of an FS.GetCapabilities operation.
 */
void afs_fileserver_probe_result(struct afs_call *call)
{
2023-10-31 16:30:37 +00:00
	struct afs_endpoint_state *estate = call->probe;
	struct afs_addr_list *alist = estate->addresses;
2023-10-20 15:13:03 +00:00
	struct afs_address *addr = &alist->addrs[call->probe_index];
2019-05-09 21:22:50 +00:00
	struct afs_server *server = call->server;
2023-10-20 15:13:03 +00:00
	unsigned int index = call->probe_index;
2023-10-30 11:43:24 +00:00
	unsigned int rtt_us = -1, cap0;
2018-10-19 23:57:59 +00:00
	int ret = call->error;

	_enter("%pU,%u", &server->uuid, index);

2023-10-27 09:45:56 +00:00
	WRITE_ONCE(addr->last_error, ret);

2018-10-19 23:57:59 +00:00
	spin_lock(&server->probe_lock);

	switch (ret) {
	case 0:
2023-10-31 16:30:37 +00:00
		estate->error = 0;
2018-10-19 23:57:59 +00:00
		goto responded;
	case -ECONNABORTED:
2023-10-31 16:30:37 +00:00
		if (!test_bit(AFS_ESTATE_RESPONDED, &estate->flags)) {
2023-10-31 16:30:37 +00:00
			estate->abort_code = call->abort_code;
			estate->error = ret;
2018-10-19 23:57:59 +00:00
		}
		goto responded;
	case -ENOMEM:
	case -ENONET:
2023-10-31 16:30:37 +00:00
		clear_bit(index, &estate->responsive_set);
2023-10-31 16:30:37 +00:00
		set_bit(AFS_ESTATE_LOCAL_FAILURE, &estate->flags);
2020-04-24 14:10:00 +00:00
		trace_afs_io_error(call->debug_id, ret, afs_io_error_fs_probe_fail);
2018-10-19 23:57:59 +00:00
		goto out;
	case -ECONNRESET: /* Responded, but call expired. */
2018-11-13 23:20:28 +00:00
	case -ERFKILL:
	case -EADDRNOTAVAIL:
2018-10-19 23:57:59 +00:00
	case -ENETUNREACH:
	case -EHOSTUNREACH:
2018-11-13 23:20:28 +00:00
	case -EHOSTDOWN:
2018-10-19 23:57:59 +00:00
	case -ECONNREFUSED:
	case -ETIMEDOUT:
	case -ETIME:
	default:
2023-10-31 16:30:37 +00:00
		clear_bit(index, &estate->responsive_set);
		set_bit(index, &estate->failed_set);
2023-10-31 16:30:37 +00:00
		if (!test_bit(AFS_ESTATE_RESPONDED, &estate->flags) &&
2023-10-31 16:30:37 +00:00
		    (estate->error == 0 ||
		     estate->error == -ETIMEDOUT ||
		     estate->error == -ETIME))
			estate->error = ret;
2020-04-24 14:10:00 +00:00
		trace_afs_io_error(call->debug_id, ret, afs_io_error_fs_probe_fail);
2018-10-19 23:57:59 +00:00
		goto out;
	}

responded:
2023-10-31 16:30:37 +00:00
	clear_bit(index, &estate->failed_set);
2018-10-19 23:57:59 +00:00

	if (call->service_id == YFS_FS_SERVICE) {
2023-10-31 16:30:37 +00:00
		set_bit(AFS_ESTATE_IS_YFS, &estate->flags);
2018-10-19 23:57:59 +00:00
		set_bit(AFS_SERVER_FL_IS_YFS, &server->flags);
2023-10-26 17:13:13 +00:00
		server->service_id = call->service_id;
2018-10-19 23:57:59 +00:00
	} else {
2023-10-31 16:30:37 +00:00
		set_bit(AFS_ESTATE_NOT_YFS, &estate->flags);
		if (!test_bit(AFS_ESTATE_IS_YFS, &estate->flags)) {
			clear_bit(AFS_SERVER_FL_IS_YFS, &server->flags);
2023-10-26 17:13:13 +00:00
			server->service_id = call->service_id;
2018-10-19 23:57:59 +00:00
		}
afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server
AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
Linux's afs client switches between them when talking to a non-YFS server
if the read size, the file position or the sum of the two have the upper 32
bits set of the 64-bit value.
This is a problem, however, since the file position and length fields of
FS.FetchData are *signed* 32-bit values.
Fix this by capturing the capability bits obtained from the fileserver when
it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be
used to decide whether to use FS.FetchData or FS.FetchData64 - and also
FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
switch on the parameter values.
This capabilities flag could also be used to limit the maximum size of the
file, but all servers must be checked for that.
Note that the issue does not exist with FS.StoreData - that uses *unsigned*
32-bit values. It's also not a problem with Auristor servers as its
YFS.FetchData64 op uses unsigned 64-bit values.
This can be tested by cloning a git repo through an OpenAFS client to an
OpenAFS server and then doing "git status" on it from a Linux afs
client[1]. Provided the clone has a pack file that's in the 2G-4G range,
the git status will show errors like:
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
This can be observed in the server's FileLog with something like the
following appearing:
Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
...
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5
Note the file position of 18446744071815340032. This is the requested file
position sign-extended.
Fixes: b9b1f8d5930a ("AFS: write support fixes")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
Link: https://lore.kernel.org/r/951332.1631308745@warthog.procyon.org.uk/
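The FileLog excerpt in the commit message above can be checked by hand. The file size is 2400758866 and the request length is 3154, which is consistent with a read of the final 3154 bytes starting at position 2400758866 - 3154 = 2400755712. That value lies between 2G (2147483648) and 4G (4294967296), so bit 31 is set. Read back as a signed 32-bit field it becomes 2400755712 - 2^32 = -1894211584, and printed as an unsigned 64-bit number that is 2^64 - 1894211584 = 18446744071815340032, the position the server logged. The sketch below reproduces that arithmetic in portable C; as_signed32() and server_view_of_pos() are hypothetical names, and the code models the wire encoding only, not the OpenAFS or kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret 32 wire bits as a two's complement signed value, using explicit
 * arithmetic to avoid implementation-defined signed conversions. */
static int64_t as_signed32(uint32_t wire)
{
	return (wire & UINT32_C(0x80000000)) ? (int64_t)wire - INT64_C(0x100000000)
					      : (int64_t)wire;
}

/* Hypothetical model: the client puts a position whose upper 32 bits are
 * clear into FS.FetchData's signed 32-bit field, and the server widens it
 * back to 64 bits. Returns the value as the server would log it. */
static uint64_t server_view_of_pos(uint64_t pos)
{
	assert(pos <= UINT32_MAX);	/* the case FS.FetchData was used for */
	int64_t sval = as_signed32((uint32_t)pos);	/* sign-extended */
	return (uint64_t)sval;		/* logged as an unsigned 64-bit value */
}
```

Positions below 2G survive the round trip unchanged, which is why only offsets in the 2G-4G range were corrupted.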
2021-09-09 23:01:52 +00:00
		cap0 = ntohl(call->tmp);
		if (cap0 & AFS3_VICED_CAPABILITY_64BITFILES)
			set_bit(AFS_SERVER_FL_HAS_FS64, &server->flags);
		else
			clear_bit(AFS_SERVER_FL_HAS_FS64, &server->flags);
2018-10-19 23:57:59 +00:00
	}
rxrpc, afs: Allow afs to pin rxrpc_peer objects
Change rxrpc's API such that:
(1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an
rxrpc_peer record for a remote address and a corresponding function,
rxrpc_kernel_put_peer(), is provided to dispose of it again.
(2) When setting up a call, the rxrpc_peer object used during a call is
now passed in rather than being set up by rxrpc_connect_call(). For
afs, this means passing it to rxrpc_kernel_begin_call() rather than
the full address (the service ID then has to be passed in as a
separate parameter).
(3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can
get a pointer to the transport address for display purposes, and
another, rxrpc_kernel_remote_srx(), to gain a pointer to the full
rxrpc address.
(4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(),
is then altered to take a peer. This now returns the RTT or -1 if
there are insufficient samples.
(5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer().
(6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a
peer the caller already has.
This allows the afs filesystem to pin the rxrpc_peer records that it is
using, allowing faster lookups and pointer comparisons rather than
comparing sockaddr_rxrpc contents. It also makes it easier to get hold of
the RTT. The following changes are made to afs:
(1) The addr_list struct's addrs[] elements now hold a peer struct pointer
and a service ID rather than a sockaddr_rxrpc.
(2) When displaying the transport address, rxrpc_kernel_remote_addr() is
used.
(3) The port arg is removed from afs_alloc_addrlist() since it's always
overridden.
(4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may
now return an error that must be handled.
(5) afs_find_server() now takes a peer pointer to specify the address.
(6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{}
now do peer pointer comparison rather than address comparison.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
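Point (4) above says that rxrpc_kernel_get_srtt() returns -1 when there are too few samples, and afs_fileserver_probe_result() stores the result in an unsigned int before comparing it with the best RTT seen so far. In an unsigned variable, -1 becomes UINT_MAX, so an unsampled peer can never win the comparison. The sketch below applies the same rule over an array of hypothetical per-address RTTs, assuming that the running minimum starts at UINT_MAX. pick_preferred() is an illustrative name; the kernel instead updates estate->rtt, server->rtt and alist->preferred incrementally as each probe result arrives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the address with the lowest
 * smoothed RTT in microseconds, or -1 if none has enough samples.
 * An entry of -1 means "insufficient samples". */
static int pick_preferred(const int *srtt_us, size_t n)
{
	unsigned int best = (unsigned int)-1;	/* UINT_MAX: nothing measured yet */
	int preferred = -1;

	assert(n == 0 || srtt_us != NULL);
	for (size_t i = 0; i < n; i++) {
		unsigned int rtt = (unsigned int)srtt_us[i];	/* -1 becomes UINT_MAX */

		if (rtt < best) {	/* same test as "rtt_us < estate->rtt" */
			best = rtt;
			preferred = (int)i;
		}
	}
	return preferred;
}
```

The strict comparison means a tie keeps the earlier choice, and a peer reporting -1 never displaces a measured one.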
2023-10-19 11:55:11 +00:00
	rtt_us = rxrpc_kernel_get_srtt(addr->peer);
2023-10-31 16:30:37 +00:00
	if (rtt_us < estate->rtt) {
		estate->rtt = rtt_us;
2020-05-02 12:39:57 +00:00
		server->rtt = rtt_us;
2018-10-19 23:57:59 +00:00
		alist->preferred = index;
	}

	smp_wmb(); /* Set rtt before responded. */
2023-10-31 16:30:37 +00:00
	set_bit(AFS_ESTATE_RESPONDED, &estate->flags);
2023-10-31 16:30:37 +00:00
	set_bit(index, &estate->responsive_set);
2020-05-02 12:39:57 +00:00
	set_bit(AFS_SERVER_FL_RESPONDING, &server->flags);
2018-10-19 23:57:59 +00:00
out:
	spin_unlock(&server->probe_lock);

2023-10-31 16:30:37 +00:00
	trace_afs_fs_probe(server, false, estate, index, call->error, call->abort_code, rtt_us);
	_debug("probe[%x] %pU [%u] %pISpc rtt=%d ret=%d",
	       estate->probe_seq, &server->uuid, index,
	       rxrpc_kernel_remote_addr(alist->addrs[index].peer),
2020-04-24 14:10:00 +00:00
	       rtt_us, ret);
2018-10-19 23:57:59 +00:00

2023-10-31 16:30:37 +00:00
	return afs_done_one_fs_probe(call->net, server, estate);
2018-10-19 23:57:59 +00:00
}

/*
2023-10-31 16:30:37 +00:00
 * Probe all of a fileserver's addresses to find out the best route and to
 * query its capabilities.
2018-10-19 23:57:59 +00:00
 */
2020-04-24 14:10:00 +00:00
|
|
|
void afs_fs_probe_fileserver(struct afs_net *net, struct afs_server *server,
|
2023-10-31 16:30:37 +00:00
|
|
|
struct afs_addr_list *new_alist, struct key *key)
|
2018-10-19 23:57:59 +00:00
|
|
|
{
|
2023-10-31 16:30:37 +00:00
|
|
|
struct afs_endpoint_state *estate, *old;
|
2023-10-20 15:13:03 +00:00
|
|
|
struct afs_addr_list *alist;
|
2023-10-31 16:30:37 +00:00
|
|
|
unsigned long unprobed;
|
2018-10-19 23:57:59 +00:00
|
|
|
|
|
|
|
_enter("%pU", &server->uuid);
|
|
|
|
|
2023-10-31 16:30:37 +00:00
|
|
|
estate = kzalloc(sizeof(*estate), GFP_KERNEL);
|
|
|
|
if (!estate)
|
|
|
|
return;
|
|
|
|
|
|
|
|
refcount_set(&estate->ref, 1);
|
|
|
|
estate->server_id = server->debug_id;
|
|
|
|
estate->rtt = UINT_MAX;
|
|
|
|
|
|
|
|
write_lock(&server->fs_lock);
|
|
|
|
|
|
|
|
old = rcu_dereference_protected(server->endpoint_state,
|
|
|
|
lockdep_is_held(&server->fs_lock));
|
|
|
|
estate->responsive_set = old->responsive_set;
|
|
|
|
estate->addresses = afs_get_addrlist(new_alist ?: old->addresses,
|
|
|
|
afs_alist_trace_get_estate);
|
|
|
|
alist = estate->addresses;
|
|
|
|
estate->probe_seq = ++server->probe_counter;
|
|
|
|
atomic_set(&estate->nr_probing, alist->nr_addrs);
|
|
|
|
|
|
|
|
rcu_assign_pointer(server->endpoint_state, estate);
|
2023-10-31 16:30:37 +00:00
|
|
|
set_bit(AFS_ESTATE_SUPERSEDED, &old->flags);
|
2023-10-31 16:30:37 +00:00
|
|
|
write_unlock(&server->fs_lock);
|
|
|
|
|
|
|
|
trace_afs_estate(estate->server_id, estate->probe_seq, refcount_read(&estate->ref),
|
|
|
|
afs_estate_trace_alloc_probe);
|
2018-10-19 23:57:59 +00:00
|
|
|
|
2023-10-30 11:43:24 +00:00
|
|
|
afs_get_address_preferences(net, alist);
|
|
|
|
|
afs: Actively poll fileservers to maintain NAT or firewall openings
When an AFS client accesses a file, it receives a limited-duration callback
promise that the server will notify it if another client changes a file.
This callback duration can be a few hours in length.
If a client mounts a volume and then an application prevents it from being
unmounted, say by chdir'ing into it, but then does nothing for some time,
the rxrpc_peer record will expire and rxrpc-level keepalive will cease.
If there is NAT or a firewall between the client and the server, the route
back for the server may close after a comparatively short duration, meaning
that attempts by the server to notify the client may then bounce.
The client, however, may (so far as it knows) still have a valid unexpired
promise and will then rely on its cached data and will not see changes made
on the server by a third party until it incidentally rechecks the status or
the promise needs renewal.
To deal with this, the client needs to regularly probe the server. This
has two effects: firstly, it keeps a route open back for the server, and
secondly, it causes the server to disgorge any notifications that got
queued up because they couldn't be sent.
Fix this by adding a mechanism to emit regular probes.
Two levels of probing are made available: Under normal circumstances the
'slow' queue will be used for a fileserver - this just probes the preferred
address once every 5 mins or so; however, if server fails to respond to any
probes, the server will shift to the 'fast' queue from which all its
interfaces will be probed every 30s. When it finally responds, the record
will switch back to the slow queue.
Further notes:
(1) Probing is now no longer driven from the fileserver rotation
algorithm.
(2) Probes are dispatched to all interfaces on a fileserver when that an
afs_server object is set up to record it.
(3) The afs_server object is removed from the probe queues when we start
to probe it. afs_is_probing_server() returns true if it's not listed
- ie. it's undergoing probing.
(4) The afs_server object is added back on to the probe queue when the
final outstanding probe completes, but the probed_at time is set when
we're about to launch a probe so that it's not dependent on the probe
duration.
(5) The timer and the work item added for this must be handed a count on
net->servers_outstanding, which they hand on or release. This makes
sure that network namespace cleanup waits for them.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24 14:10:00 +00:00
|
|
|
server->probed_at = jiffies;
|
2023-10-31 16:30:37 +00:00
|
|
|
unprobed = (1UL << alist->nr_addrs) - 1;
|
|
|
|
while (unprobed) {
|
|
|
|
unsigned int index = 0, i;
|
|
|
|
int best_prio = -1;
|
|
|
|
|
|
|
|
for (i = 0; i < alist->nr_addrs; i++) {
|
|
|
|
if (test_bit(i, &unprobed) &&
|
|
|
|
alist->addrs[i].prio > best_prio) {
|
|
|
|
index = i;
|
|
|
|
best_prio = alist->addrs[i].prio;
|
2023-10-30 11:43:24 +00:00
|
|
|
}
|
|
|
|
}
|
2023-10-31 16:30:37 +00:00
|
|
|
__clear_bit(index, &unprobed);
|
|
|
|
|
|
|
|
trace_afs_fs_probe(server, true, estate, index, 0, 0, 0);
|
|
|
|
if (!afs_fs_get_capabilities(net, server, estate, index, key))
|
|
|
|
afs_fs_probe_not_done(net, server, estate, index);
|
2018-10-19 23:57:59 +00:00
|
|
|
}
|
|
|
|
|
2023-10-31 16:30:37 +00:00
|
|
|
afs_put_endpoint_state(old, afs_estate_trace_put_probe);
|
2018-10-19 23:57:59 +00:00
|
|
|
}
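afs_fs_probe_fileserver() issues one probe per address. On each pass it picks the highest-priority address whose bit is still set in the unprobed bitmap.

The following user-space sketch reproduces only that ordering so it can be checked in isolation. probe_order() is a hypothetical helper that records indices instead of sending probes. It assumes priorities are non-negative: the loop starts best_prio at -1, so any non-negative priority wins the first comparison and every pass clears a fresh bit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: write into order[] the sequence in which addresses
 * would be probed, using the same selection rule as the kernel loop.
 * Priorities are unsigned, so they are always greater than the -1 sentinel.
 * Returns the number of entries written.
 */
static size_t probe_order(const unsigned short *prio, unsigned int nr_addrs,
			  unsigned int *order)
{
	unsigned long unprobed;
	size_t n = 0;

	/* Keep the shift defined: one bit per address in an unsigned long. */
	assert(nr_addrs < sizeof(unsigned long) * CHAR_BIT);
	unprobed = (1UL << nr_addrs) - 1;

	while (unprobed) {
		unsigned int index = 0, i;
		int best_prio = -1;

		for (i = 0; i < nr_addrs; i++) {
			if ((unprobed & (1UL << i)) && prio[i] > best_prio) {
				index = i;
				best_prio = prio[i];
			}
		}
		unprobed &= ~(1UL << index);  /* plays the role of __clear_bit() */
		order[n++] = index;           /* stands in for issuing the probe */
	}
	return n;
}
```

Ties go to the lower index, because only a strictly greater priority replaces the current best. For example, priorities {10, 30, 20, 30} are probed in the order 1, 3, 2, 0.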

/*
 * Wait for the first as-yet untried fileserver to respond, for the probe state
 * to be superseded or for all probes to finish.
 */
int afs_wait_for_fs_probes(struct afs_operation *op, struct afs_server_state *states, bool intr)
{
	struct afs_endpoint_state *estate;
	struct afs_server_list *slist = op->server_list;
	bool still_probing = false;
	int ret = 0, i;

	_enter("%u", slist->nr_servers);

	for (i = 0; i < slist->nr_servers; i++) {
		estate = states[i].endpoint_state;
		if (test_bit(AFS_ESTATE_SUPERSEDED, &estate->flags))
			return 2;
		if (atomic_read(&estate->nr_probing))
			still_probing = true;
		if (estate->responsive_set & states[i].untried_addrs)
			return 1;
	}
	if (!still_probing)
		return 0;

	for (i = 0; i < slist->nr_servers; i++)
		add_wait_queue(&slist->servers[i].server->probe_wq, &states[i].probe_waiter);

	for (;;) {
		still_probing = false;

		set_current_state(intr ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE);
		for (i = 0; i < slist->nr_servers; i++) {
			estate = states[i].endpoint_state;
			if (test_bit(AFS_ESTATE_SUPERSEDED, &estate->flags)) {
				ret = 2;
				goto stop;
			}
			if (atomic_read(&estate->nr_probing))
				still_probing = true;
			if (estate->responsive_set & states[i].untried_addrs) {
				ret = 1;
				goto stop;
			}
		}

		if (!still_probing || signal_pending(current))
			goto stop;
		schedule();
	}

stop:
	set_current_state(TASK_RUNNING);

	for (i = 0; i < slist->nr_servers; i++)
		remove_wait_queue(&slist->servers[i].server->probe_wq, &states[i].probe_waiter);

	if (!ret && signal_pending(current))
		ret = -ERESTARTSYS;
	return ret;
}
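afs_wait_for_fs_probes() scans every server's probe state once before registering on the wait queues, and again on each pass of the sleep loop. The result of the scan decides what the function returns.

A minimal single-threaded sketch of one scan is below. It uses a hypothetical snapshot struct in place of struct afs_endpoint_state and struct afs_server_state. It returns -1 where the kernel function would go to sleep, and it leaves out the signal handling that yields -ERESTARTSYS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one server's probe state (not a kernel type). */
struct server_snapshot {
	bool superseded;          /* AFS_ESTATE_SUPERSEDED is set */
	int nr_probing;           /* probes still in flight */
	unsigned long responsive; /* responsive_set */
	unsigned long untried;    /* untried_addrs for this operation */
};

/*
 * One scan, in the same order as the kernel loop:
 *  2  - a server's probe state was superseded;
 *  1  - an address not yet tried by this operation has responded;
 *  0  - no probes are outstanding and nothing usable responded;
 * -1  - probes are still in flight, so the caller would sleep and rescan.
 */
static int probe_wait_check(const struct server_snapshot *s, int nr)
{
	bool still_probing = false;
	int i;

	for (i = 0; i < nr; i++) {
		if (s[i].superseded)
			return 2;
		if (s[i].nr_probing)
			still_probing = true;
		if (s[i].responsive & s[i].untried)
			return 1;
	}
	return still_probing ? -1 : 0;
}
```

Because the scan returns at the first server that is superseded or has a responsive untried address, the order of the server list decides which condition is reported when several hold at once.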

/*
 * Probe timer. We hold a count on net->servers_outstanding that we need to
 * pass along to the work item, or drop if the work item cannot be queued.
 */
void afs_fs_probe_timer(struct timer_list *timer)
{
	struct afs_net *net = container_of(timer, struct afs_net, fs_probe_timer);

	if (!net->live || !queue_work(afs_wq, &net->fs_prober))
		afs_dec_servers_outstanding(net);
}
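Note (5) of the commit message and the comment on afs_fs_probe_timer() describe a count on net->servers_outstanding. The timer hands that count to the work item, or drops it when it cannot be handed on. Namespace teardown waits for the count to reach zero, so every path must release its count exactly once.

The sketch below is a single-threaded model of that rule, using plain integers instead of atomics. The names net_model, model_queue_work(), model_probe_timer() and model_probe_dispatcher() are hypothetical. It also covers the dispatcher's early-return paths, which must release the count they were given.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant afs_net fields (not the kernel type). */
struct net_model {
	int servers_outstanding; /* teardown waits for this to reach zero */
	bool live;
	bool work_queued;        /* models the pending state of net->fs_prober */
};

/* Like queue_work(): returns false if the work item was already pending. */
static bool model_queue_work(struct net_model *net)
{
	if (net->work_queued)
		return false;
	net->work_queued = true;
	return true;
}

/* Timer: arrives holding one count; pass it on or release it. */
static void model_probe_timer(struct net_model *net)
{
	assert(net->servers_outstanding > 0);
	if (!net->live || !model_queue_work(net))
		net->servers_outstanding--;
}

/* Work item: every exit, including the early ones, releases its count. */
static void model_probe_dispatcher(struct net_model *net, bool have_servers)
{
	assert(net->servers_outstanding > 0);
	net->work_queued = false;
	if (!net->live || !have_servers) {
		net->servers_outstanding--;
		return;
	}
	/* Real code would dispatch probes and then either re-arm the timer,
	 * handing the count on, or drop it. This sketch simply drops it. */
	net->servers_outstanding--;
}
```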

/*
 * Dispatch a probe to a server.
 */
static void afs_dispatch_fs_probe(struct afs_net *net, struct afs_server *server)
	__releases(&net->fs_lock)
{
	struct key *key = NULL;

	/* We remove it from the queues here - it will be added back to
	 * one of the queues on the completion of the probe.
	 */
	list_del_init(&server->probe_link);

	afs_get_server(server, afs_server_trace_get_probe);
	write_sequnlock(&net->fs_lock);

	afs_fs_probe_fileserver(net, server, NULL, key);
	afs_put_server(net, server, afs_server_trace_put_probe);
}

/*
 * Probe a server immediately without waiting for its due time to come
 * round. This is used when all of the addresses have been tried.
 */
void afs_probe_fileserver(struct afs_net *net, struct afs_server *server)
{
	write_seqlock(&net->fs_lock);
	if (!list_empty(&server->probe_link))
		return afs_dispatch_fs_probe(net, server);
	write_sequnlock(&net->fs_lock);
}
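afs_fs_probe_dispatcher() services two queues, fs_probe_fast and fs_probe_slow. According to the commit message:
- A server that responds is probed roughly every 5 minutes from the slow queue.
- A server that fails to respond to any probe moves to the fast queue and is probed every 30 seconds until it responds.

The following sketch expresses that scheduling rule in user-space C. The interval macros and function names are hypothetical, and times are plain seconds rather than jiffies. The interval is measured from probed_at, which note (4) says is set when a round is launched.

```c
#include <assert.h>
#include <stdbool.h>

/* Intervals from the commit message, in seconds (hypothetical constants;
 * the kernel works in jiffies). */
#define SLOW_POLL_INTERVAL (5 * 60)
#define FAST_POLL_INTERVAL 30

enum probe_queue { QUEUE_SLOW, QUEUE_FAST };

/* After a round completes: any response means slow, none means fast. */
static enum probe_queue pick_queue(bool any_response)
{
	return any_response ? QUEUE_SLOW : QUEUE_FAST;
}

/* Next due time, counted from when the last round was launched. */
static unsigned long next_probe_due(enum probe_queue q, unsigned long probed_at)
{
	return probed_at + (q == QUEUE_FAST ? FAST_POLL_INTERVAL : SLOW_POLL_INTERVAL);
}

/* Whether a probe is due at time now. No wraparound handling here; kernel
 * code comparing jiffies would use time_after_eq() instead. */
static bool probe_due(enum probe_queue q, unsigned long probed_at, unsigned long now)
{
	assert(now >= probed_at);
	return now >= next_probe_due(q, probed_at);
}
```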

/*
 * Probe dispatcher to regularly dispatch probes to keep NAT alive.
 */
void afs_fs_probe_dispatcher(struct work_struct *work)
{
	struct afs_net *net = container_of(work, struct afs_net, fs_prober);
	struct afs_server *fast, *slow, *server;
	unsigned long nowj, timer_at, poll_at;
	bool first_pass = true, set_timer = false;

afs: Fix lost servers_outstanding count
The afs_fs_probe_dispatcher() work function is passed a count on
net->servers_outstanding when it is scheduled (which may come via its
timer). This is passed back to the work_item, passed to the timer or
dropped at the end of the dispatcher function.
But, at the top of the dispatcher function, there are two checks which
skip the rest of the function: if the network namespace is being destroyed
or if there are no fileservers to probe. These two return paths, however,
do not drop the count passed to the dispatcher, and so, sometimes, the
destruction of a network namespace, such as induced by rmmod of the kafs
module, may get stuck in afs_purge_servers(), waiting for
net->servers_outstanding to become zero.
Fix this by adding the missing decrements in afs_fs_probe_dispatcher().
Fixes: f6cbb368bcb0 ("afs: Actively poll fileservers to maintain NAT or firewall openings")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/167164544917.2072364.3759519569649459359.stgit@warthog.procyon.org.uk/
2022-12-21 14:30:48 +00:00
|
|
|
if (!net->live) {
|
|
|
|
afs_dec_servers_outstanding(net);
|
afs: Actively poll fileservers to maintain NAT or firewall openings
When an AFS client accesses a file, it receives a limited-duration callback
promise that the server will notify it if another client changes a file.
This callback duration can be a few hours in length.
If a client mounts a volume and then an application prevents it from being
unmounted, say by chdir'ing into it, but then does nothing for some time,
the rxrpc_peer record will expire and rxrpc-level keepalive will cease.
If there is NAT or a firewall between the client and the server, the route
back for the server may close after a comparatively short duration, meaning
that attempts by the server to notify the client may then bounce.
The client, however, may (so far as it knows) still have a valid unexpired
promise and will then rely on its cached data and will not see changes made
on the server by a third party until it incidentally rechecks the status or
the promise needs renewal.
To deal with this, the client needs to regularly probe the server. This
has two effects: firstly, it keeps a route open back for the server, and
secondly, it causes the server to disgorge any notifications that got
queued up because they couldn't be sent.
Fix this by adding a mechanism to emit regular probes.
Two levels of probing are made available: Under normal circumstances the
'slow' queue will be used for a fileserver - this just probes the preferred
address once every 5 mins or so; however, if server fails to respond to any
probes, the server will shift to the 'fast' queue from which all its
interfaces will be probed every 30s. When it finally responds, the record
will switch back to the slow queue.
Further notes:
(1) Probing is now no longer driven from the fileserver rotation
algorithm.
(2) Probes are dispatched to all interfaces on a fileserver when that an
afs_server object is set up to record it.
(3) The afs_server object is removed from the probe queues when we start
to probe it. afs_is_probing_server() returns true if it's not listed
- ie. it's undergoing probing.
(4) The afs_server object is added back on to the probe queue when the
final outstanding probe completes, but the probed_at time is set when
we're about to launch a probe so that it's not dependent on the probe
duration.
(5) The timer and the work item added for this must be handed a count on
net->servers_outstanding, which they hand on or release. This makes
sure that network namespace cleanup waits for them.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24 14:10:00 +00:00
|
|
|
return;
|
afs: Fix lost servers_outstanding count
The afs_fs_probe_dispatcher() work function is passed a count on
net->servers_outstanding when it is scheduled (which may come via its
timer). This is passed back to the work_item, passed to the timer or
dropped at the end of the dispatcher function.
But, at the top of the dispatcher function, there are two checks which
skip the rest of the function: if the network namespace is being destroyed
or if there are no fileservers to probe. These two return paths, however,
do not drop the count passed to the dispatcher, and so, sometimes, the
destruction of a network namespace, such as induced by rmmod of the kafs
module, may get stuck in afs_purge_servers(), waiting for
net->servers_outstanding to become zero.
Fix this by adding the missing decrements in afs_fs_probe_dispatcher().
Fixes: f6cbb368bcb0 ("afs: Actively poll fileservers to maintain NAT or firewall openings")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/167164544917.2072364.3759519569649459359.stgit@warthog.procyon.org.uk/
2022-12-21 14:30:48 +00:00
|
|
|
}
|
afs: Actively poll fileservers to maintain NAT or firewall openings
When an AFS client accesses a file, it receives a limited-duration callback
promise that the server will notify it if another client changes a file.
This callback duration can be a few hours in length.
If a client mounts a volume and then an application prevents it from being
unmounted, say by chdir'ing into it, but then does nothing for some time,
the rxrpc_peer record will expire and rxrpc-level keepalive will cease.
If there is NAT or a firewall between the client and the server, the route
back for the server may close after a comparatively short duration, meaning
that attempts by the server to notify the client may then bounce.
The client, however, may (so far as it knows) still have a valid unexpired
promise and will then rely on its cached data and will not see changes made
on the server by a third party until it incidentally rechecks the status or
the promise needs renewal.
To deal with this, the client needs to regularly probe the server. This
has two effects: firstly, it keeps a route open back for the server, and
secondly, it causes the server to disgorge any notifications that got
queued up because they couldn't be sent.
Fix this by adding a mechanism to emit regular probes.
Two levels of probing are made available: Under normal circumstances the
'slow' queue will be used for a fileserver - this just probes the preferred
address once every 5 mins or so; however, if the server fails to respond to any
probes, the server will shift to the 'fast' queue from which all its
interfaces will be probed every 30s. When it finally responds, the record
will switch back to the slow queue.
Further notes:
(1) Probing is now no longer driven from the fileserver rotation
algorithm.
(2) Probes are dispatched to all interfaces on a fileserver when an
afs_server object is set up to record it.
(3) The afs_server object is removed from the probe queues when we start
to probe it. afs_is_probing_server() returns true if it's not listed
- i.e. it's undergoing probing.
(4) The afs_server object is added back on to the probe queue when the
final outstanding probe completes, but the probed_at time is set when
we're about to launch a probe so that it's not dependent on the probe
duration.
(5) The timer and the work item added for this must be handed a count on
net->servers_outstanding, which they hand on or release. This makes
sure that network namespace cleanup waits for them.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24 14:10:00 +00:00
	_enter("");

	if (list_empty(&net->fs_probe_fast) && list_empty(&net->fs_probe_slow)) {
afs: Fix lost servers_outstanding count
2022-12-21 14:30:48 +00:00
		afs_dec_servers_outstanding(net);
afs: Actively poll fileservers to maintain NAT or firewall openings
2020-04-24 14:10:00 +00:00
		_leave(" [none]");
		return;
	}

again:
	write_seqlock(&net->fs_lock);

	fast = slow = server = NULL;
	nowj = jiffies;
	timer_at = nowj + MAX_JIFFY_OFFSET;

	if (!list_empty(&net->fs_probe_fast)) {
		fast = list_first_entry(&net->fs_probe_fast, struct afs_server, probe_link);
		poll_at = fast->probed_at + afs_fs_probe_fast_poll_interval;
		if (time_before(nowj, poll_at)) {
			timer_at = poll_at;
			set_timer = true;
			fast = NULL;
		}
	}

	if (!list_empty(&net->fs_probe_slow)) {
		slow = list_first_entry(&net->fs_probe_slow, struct afs_server, probe_link);
		poll_at = slow->probed_at + afs_fs_probe_slow_poll_interval;
		if (time_before(nowj, poll_at)) {
			if (time_before(poll_at, timer_at))
				timer_at = poll_at;
			set_timer = true;
			slow = NULL;
		}
	}

	server = fast ?: slow;
	if (server)
		_debug("probe %pU", &server->uuid);

	if (server && (first_pass || !need_resched())) {
2023-10-31 16:30:37 +00:00
		afs_dispatch_fs_probe(net, server);
afs: Actively poll fileservers to maintain NAT or firewall openings
2020-04-24 14:10:00 +00:00
		first_pass = false;
		goto again;
	}

	write_sequnlock(&net->fs_lock);

	if (server) {
		if (!queue_work(afs_wq, &net->fs_prober))
			afs_dec_servers_outstanding(net);
		_leave(" [requeue]");
	} else if (set_timer) {
		if (timer_reduce(&net->fs_probe_timer, timer_at))
			afs_dec_servers_outstanding(net);
		_leave(" [timer]");
	} else {
		afs_dec_servers_outstanding(net);
		_leave(" [quiesce]");
	}
}
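The queue-selection logic in `afs_fs_probe_dispatcher()` can be modelled outside the kernel to see how the fast and slow queues interact. What follows is a minimal, hypothetical user-space sketch, not kernel code: `struct queue_head`, `pick_probe()` and `ticks_before()` are illustrative names. `ticks_before()` assumes the same wraparound-safe signed-difference comparison as the kernel's `time_before()`. The two poll intervals are passed in as parameters rather than read from `afs_fs_probe_fast_poll_interval` and `afs_fs_probe_slow_poll_interval`. Locking, requeueing and the `need_resched()` check are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the dispatcher's queue choice (not kernel
 * code).  Times are abstract ticks standing in for jiffies.
 */

/* Wraparound-safe "a is earlier than b", in the style of time_before(). */
static bool ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Head of one probe queue; nonempty == false models an empty list. */
struct queue_head {
	bool nonempty;
	unsigned long probed_at;	/* when the head server was last probed */
};

enum probe_choice { PROBE_FAST, PROBE_SLOW, ARM_TIMER, QUIESCE };

static enum probe_choice pick_probe(const struct queue_head *fast,
				    const struct queue_head *slow,
				    unsigned long fast_interval,
				    unsigned long slow_interval,
				    unsigned long now, unsigned long *timer_at)
{
	bool fast_due = false, slow_due = false, set_timer = false;
	unsigned long poll_at;

	assert(fast && slow && timer_at);
	*timer_at = now + (LONG_MAX >> 1);	/* "no timer needed" sentinel */

	if (fast->nonempty) {
		poll_at = fast->probed_at + fast_interval;
		if (ticks_before(now, poll_at)) {
			*timer_at = poll_at;	/* not due yet: remember when it will be */
			set_timer = true;
		} else {
			fast_due = true;
		}
	}

	if (slow->nonempty) {
		poll_at = slow->probed_at + slow_interval;
		if (ticks_before(now, poll_at)) {
			if (ticks_before(poll_at, *timer_at))
				*timer_at = poll_at;	/* keep the earlier deadline */
			set_timer = true;
		} else {
			slow_due = true;
		}
	}

	if (fast_due)
		return PROBE_FAST;	/* fast queue wins, as in "fast ?: slow" */
	if (slow_due)
		return PROBE_SLOW;
	return set_timer ? ARM_TIMER : QUIESCE;
}
```

Checking the fast queue first means a server that has stopped answering is probed ahead of routine slow-queue traffic, which is what `server = fast ?: slow;` does in the dispatcher. The signed-difference comparison matters at wraparound. With a 30-tick fast interval, a server last probed at `ULONG_MAX - 5` falls due at tick 24, and `ticks_before(10, 24)` still correctly reports that it is not yet due.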
2020-04-21 23:02:46 +00:00
/*
 * Wait up to 2s for a probe on a particular fileserver to complete.
 */
2023-10-31 16:30:37 +00:00
int afs_wait_for_one_fs_probe(struct afs_server *server, struct afs_endpoint_state *estate,
2023-10-18 08:24:01 +00:00
			      unsigned long exclude, bool is_intr)
2020-04-21 23:02:46 +00:00
{
	struct wait_queue_entry wait;
	unsigned long timo = 2 * HZ;
2023-10-31 16:30:37 +00:00
	if (atomic_read(&estate->nr_probing) == 0)
2020-04-21 23:02:46 +00:00
		goto dont_wait;

	init_wait_entry(&wait, 0);
	for (;;) {
		prepare_to_wait_event(&server->probe_wq, &wait,
				      is_intr ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE);
		if (timo == 0 ||
2023-10-18 08:24:01 +00:00
		    test_bit(AFS_ESTATE_SUPERSEDED, &estate->flags) ||
		    (estate->responsive_set & ~exclude) ||
2023-10-31 16:30:37 +00:00
		    atomic_read(&estate->nr_probing) == 0 ||
2020-04-21 23:02:46 +00:00
		    (is_intr && signal_pending(current)))
			break;
		timo = schedule_timeout(timo);
	}

	finish_wait(&server->probe_wq, &wait);

dont_wait:
2023-10-18 08:24:01 +00:00
	if (test_bit(AFS_ESTATE_SUPERSEDED, &estate->flags))
2020-04-21 23:02:46 +00:00
		return 0;
2024-09-23 15:07:49 +00:00
	if (estate->responsive_set & ~exclude)
		return 1;
2020-04-21 23:02:46 +00:00
	if (is_intr && signal_pending(current))
		return -ERESTARTSYS;
	if (timo == 0)
		return -ETIME;
	return -EDESTADDRREQ;
}
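The exit logic of `afs_wait_for_one_fs_probe()` can be viewed as two decisions: when to stop waiting, and which result to report. The hypothetical user-space sketch below restates both over a plain snapshot structure, so the precedence of the return codes and the effect of the `exclude` mask can be seen in isolation. `struct probe_wait_snapshot`, `probe_wait_done()` and `probe_wait_result()` are illustrative names. The real function reads `estate` and the current task directly and sleeps on `server->probe_wq` in between. `ERESTARTSYS` is a kernel-internal value that user-space `<errno.h>` does not provide, so it is defined here for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno from include/linux/errno.h; defined only for this model. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical snapshot of the state the wait loop inspects (not kernel code). */
struct probe_wait_snapshot {
	bool superseded;		/* AFS_ESTATE_SUPERSEDED is set */
	unsigned long responsive_set;	/* one bit per address that responded */
	int nr_probing;			/* probes still in flight */
	bool signalled;			/* is_intr && signal_pending(current) */
	unsigned long timo;		/* remaining timeout, 0 once expired */
};

/* Mirrors the loop's break condition. */
static bool probe_wait_done(const struct probe_wait_snapshot *s,
			    unsigned long exclude)
{
	assert(s != NULL);
	return s->timo == 0 || s->superseded ||
	       (s->responsive_set & ~exclude) != 0 ||
	       s->nr_probing == 0 || s->signalled;
}

/* Mirrors the checks after the dont_wait: label, in the same order. */
static int probe_wait_result(const struct probe_wait_snapshot *s,
			     unsigned long exclude)
{
	assert(s != NULL);
	if (s->superseded)
		return 0;
	if (s->responsive_set & ~exclude)
		return 1;		/* an address not in exclude answered */
	if (s->signalled)
		return -ERESTARTSYS;
	if (s->timo == 0)
		return -ETIME;		/* the 2s budget ran out */
	return -EDESTADDRREQ;		/* probing ended with no usable address */
}
```

For instance, suppose addresses 0 and 2 have responded (`responsive_set == 0x5`). If the caller passes `exclude == 0x1`, address 2 still counts and the result is 1. With `exclude == 0x5` neither address counts, so the wait continues until probing finishes, a signal arrives or the timeout expires.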
2020-06-19 22:39:36 +00:00
/*
 * Clean up the probing when the namespace is killed off.
 */
void afs_fs_probe_cleanup(struct afs_net *net)
{
	if (del_timer_sync(&net->fs_probe_timer))
		afs_dec_servers_outstanding(net);
}
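Note (5) in the commit message above says the timer and the work item must each be handed a count on `net->servers_outstanding`, which they either pass on or release. `afs_fs_probe_cleanup()` is the release side for the timer. The following is a minimal user-space model of that handoff. It is a hypothetical sketch, not kernel code: `struct probe_refs` and the `model_*()`/`hand_to_*()` helpers are illustrative. They only mimic the return conventions the kernel code relies on: `timer_reduce()` returns nonzero if the timer was already pending, `queue_work()` returns false if the work item was already queued, and `del_timer_sync()` returns nonzero if it deactivated a pending timer.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model (not kernel code) of how one reference on
 * servers_outstanding is handed to, or dropped by, a timer and a work item.
 */
struct probe_refs {
	int outstanding;	/* stands in for net->servers_outstanding */
	bool timer_pending;	/* models net->fs_probe_timer being armed */
	bool work_queued;	/* models net->fs_prober being queued */
};

/* Like timer_reduce(): arm the timer; true if it was already pending. */
static bool model_timer_reduce(struct probe_refs *r)
{
	bool was_pending = r->timer_pending;

	r->timer_pending = true;
	return was_pending;
}

/* Like queue_work(): false if the item was already queued. */
static bool model_queue_work(struct probe_refs *r)
{
	bool was_queued = r->work_queued;

	r->work_queued = true;
	return !was_queued;
}

/* Like del_timer_sync(): true if a pending timer was deactivated. */
static bool model_del_timer_sync(struct probe_refs *r)
{
	bool was_pending = r->timer_pending;

	r->timer_pending = false;
	return was_pending;
}

/* "[timer]" path: give our count to the timer unless it already owns one. */
static void hand_to_timer(struct probe_refs *r)
{
	assert(r->outstanding > 0);
	if (model_timer_reduce(r))
		r->outstanding--;
}

/* "[requeue]" path: the same rule applied to the work item. */
static void hand_to_work(struct probe_refs *r)
{
	assert(r->outstanding > 0);
	if (!model_queue_work(r))
		r->outstanding--;
}

/* Cleanup: release the count that a still-pending timer owns. */
static void model_cleanup(struct probe_refs *r)
{
	if (model_del_timer_sync(r))
		r->outstanding--;
}
```

The design point is that a pending timer or queued work item owns exactly one count. A caller that finds it already pending must therefore drop its own count, or `afs_purge_servers()` will wait forever for `net->servers_outstanding` to reach zero.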