To avoid the warning:
warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reducing the number of network roundtrips improves the performance
of query xattrs
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Technically 3.02 is not the dialect name although that is more familiar to
many, so we should also accept the official dialect name (3.0.2 vs. 3.02)
in vers=
Signed-off-by: Kenneth D'souza <kdsouza@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This is not actually a bug but as Coverity points out we shouldn't
be doing an "|=" on a value which hasn't been set (although technically
it was memset to zero so isn't a bug) and so might as well change
"|=" to "=" in this line
Detected by CoverityScan, CID#728535 ("Unitialized scalar variable")
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
As Coverity points out le16_to_cpu(midEntry->Command) can not be
less than zero.
Detected by CoverityScan, CID#1438650 ("Macro compares unsigned to 0")
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Improve performance by reducing number of network round trips
for set xattr.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Trivial fix to clean up indentation, replace spaces with tab
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull vfs fixes from Al Viro:
"A couple of fixes - no common topic ;-)"
[ The aio spectre patch also came in from Jens, so now we have that
doubly fixed .. ]
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
aio: fix spectre gadget in lookup_ioctx
This reverts commit 55956b59df336f6738da916dbb520b6e37df9fbd.
commit 55956b59df33 ("vfs: Allow userns root to call mknod on owned filesystems.")
enabled mknod() in user namespaces for userns root if CAP_MKNOD is
available. However, these device nodes are useless since any filesystem
mounted from a non-initial user namespace will set the SB_I_NODEV flag on
the filesystem. Now, when a device node s created in a non-initial user
namespace a call to open() on said device node will fail due to:
bool may_open_dev(const struct path *path)
{
return !(path->mnt->mnt_flags & MNT_NODEV) &&
!(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
}
The problem with this is that as of the aforementioned commit mknod()
creates partially functional device nodes in non-initial user namespaces.
In particular, it has the consequence that as of the aforementioned commit
open() will be more privileged with respect to device nodes than mknod().
Before it was the other way around. Specifically, if mknod() succeeded
then it was transparent for any userspace application that a fatal error
must have occured when open() failed.
All of this breaks multiple userspace workloads and a widespread assumption
about how to handle mknod(). Basically, all container runtimes and systemd
live by the slogan "ask for forgiveness not permission" when running user
namespace workloads. For mknod() the assumption is that if the syscall
succeeds the device nodes are useable irrespective of whether it succeeds
in a non-initial user namespace or not. This logic was chosen explicitly
to allow for the glorious day when mknod() will actually be able to create
fully functional device nodes in user namespaces.
A specific problem people are already running into when running 4.18 rc
kernels are failing systemd services. For any distro that is run in a
container systemd services started with the PrivateDevices= property set
will fail to start since the device nodes in question cannot be
opened (cf. the arguments in [1]).
Full disclosure, Seth made the very sound argument that it is already
possible to end up with partially functional device nodes. Any filesystem
mounted with MS_NODEV set will allow mknod() to succeed but will not allow
open() to succeed. The difference to the case here is that the MS_NODEV
case is transparent to userspace since it is an explicitly set mount option
while the SB_I_NODEV case is an implicit property enforced by the kernel
and hence opaque to userspace.
[1]: https://github.com/systemd/systemd/pull/9483
Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At mount time, we allocate m_rsum_cache with the number of realtime
bitmap blocks. However, xfs_growfs_rt() can increase the number of
realtime bitmap blocks. Using the cache after this happens may access
out of the bounds of the cache. Fix it by reallocating the cache in this
case.
Fixes: 355e3532132b ("xfs: cache minimum realtime summary level")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
get_unlocked_entry() uses an exclusive wait because it is guaranteed to
eventually obtain the lock and follow on with an unlock+wakeup cycle.
The wait_entry_unlocked() path does not have the same guarantee. Rather
than open-code an extra wakeup, just switch to a non-exclusive wait.
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch removes the check from nfs_compare_mount_options to see if a
`sec' option was passed for the current mount before comparing auth
flavors and instead just always compares auth flavors.
Consider the following scenario:
You have a server with the address 192.168.1.1 and two exports /export/a
and /export/b. The first export supports `sys' and `krb5' security, the
second just `sys'.
Assume you start with no mounts from the server.
The following results in EIOs being returned as the kernel nfs client
incorrectly thinks it can share the underlying `struct nfs_server's:
$ mkdir /tmp/{a,b}
$ sudo mount -t nfs -o vers=3,sec=krb5 192.168.1.1:/export/a /tmp/a
$ sudo mount -t nfs -o vers=3 192.168.1.1:/export/b /tmp/b
$ df >/dev/null
df: ‘/tmp/b’: Input/output error
Signed-off-by: Chris Perl <cperl@janestreet.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlwcW2IACgkQiiy9cAdy
T1FaLAv/Vs0QhYATaJHkvmjk7EdFoXTZST2wYxPcPSrOfWbCaMjsKREwPVUjOCiW
W//zL/Qbi0FSVMWdoskJP0KQC5rAhUSxlvtDxrpzqszfeJprIs0oL5IRlOwxNdtT
F9/i8+s/AuYq0TTn15UtqZHS6wZdWcerdttWV1V/97hEwcO5Xg0pEtyCmLPf7k8W
wNdxCAQFZ9j2pDVyuJO3a0+Tas34dc2t/cac12h0qeXbrE6e88bQWa6bpEKSvCKr
cMU94pkajCKeelZOhq+ga7cCmlBJs6gt4sgsKEsoDn72tQCsWVH6p1N4+AxmLsZU
bR65XodusR1WHMesSth7QraUk0pIQ4ZzMRPZJCkh9bSjaa+fxX1Up/sjn74q1prf
DHJ/52rQrWK3hvETUZD2B6N9AEDN0swbqeCJRlUYlzG5OEdfit9qSgfTaRzYxVnX
+tct7j+8mjzk+rsGuTXrQupPXUndPTcpUrKFp5db9Sejcx/Gw/atKhW6mVgL3LiR
8kVIraSV
=+8yd
-----END PGP SIGNATURE-----
Merge tag '4.20-rc7-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb3 fix from Steve French:
"An important smb3 fix for an regression to some servers introduced by
compounding optimization to rmdir.
This fix has been tested by multiple developers (including me) with
the usual private xfstesting, but also by the new cifs/smb3 "buildbot"
xfstest VMs (thank you Ronnie and Aurelien for good work on this
automation). The automated testing has been updated so that it will
catch problems like this in the future.
Note that Pavel discovered (very recently) some unrelated but
extremely important bugs in credit handling (smb3 flow control problem
that can lead to disconnects/reconnects) when compounding, that I
would have liked to send in ASAP but the complete testing of those two
fixes may not be done in time and have to wait for 4.21"
* tag '4.20-rc7-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: Fix rmdir compounding regression to strict servers
Adding options to growing mnt_opts. NFS kludge with passing
context= down into non-text-options mount switched to it, and
with that the last use of ->sb_parse_opts_str() is gone.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Keep void * instead, allocate on demand (in parse_str_opts, at the
moment). Eventually both selinux and smack will be better off
with private structures with several strings in those, rather than
this "counter and two pointers to dynamically allocated arrays"
ugliness. This commit allows to do that at leisure, without
disrupting anything outside of given module.
Changes:
* instead of struct security_mnt_opt use an opaque pointer
initialized to NULL.
* security_sb_eat_lsm_opts(), security_sb_parse_opts_str() and
security_free_mnt_opts() take it as var argument (i.e. as void **);
call sites are unchanged.
* security_sb_set_mnt_opts() and security_sb_remount() take
it by value (i.e. as void *).
* new method: ->sb_free_mnt_opts(). Takes void *, does
whatever freeing that needs to be done.
* ->sb_set_mnt_opts() and ->sb_remount() might get NULL as
mnt_opts argument, meaning "empty".
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* if mount(2) passes something like "context=foo" with MS_REMOUNT
in flags (/sbin/mount.nfs will _not_ do that - you need to issue
the syscall manually), you'll get leaked copies for LSM options.
The reason is that instead of nfs_{alloc,free}_parsed_mount_data()
nfs_remount() uses kzalloc/kfree, which lacks the needed cleanup.
* selinux options are not changed on remount (as for any other
fs), but in case of NFS the failure is quiet - they are not compared
to what we used to have, with complaint in case of attempted changes.
Trivially fixed by converting to use of security_sb_remount().
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1) keeping a copy in btrfs_fs_info is completely pointless - we never
use it for anything. Getting rid of that allows for simpler calling
conventions for setup_security_options() (caller is responsible for
freeing mnt_opts in all cases).
2) on remount we want to use ->sb_remount(), not ->sb_set_mnt_opts(),
same as we would if not for FS_BINARY_MOUNTDATA. Behaviours *are*
close (in fact, selinux sb_set_mnt_opts() ought to punt to
sb_remount() in "already initialized" case), but let's handle
that uniformly. And the only reason why the original btrfs changes
didn't go for security_sb_remount() in btrfs_remount() case is that
it hadn't been exported. Let's export it for a while - it'll be
going away soon anyway.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... leaving the "is it kernel-internal" logics in the caller.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
combination of alloc_secdata(), security_sb_copy_data(),
security_sb_parse_opt_str() and free_secdata().
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This paves the way for retaining the LSM options from a common filesystem
mount context during a mount parameter parsing phase to be instituted prior
to actual mount/reconfiguration actions.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This paves the way for retaining the LSM options from a common filesystem
mount context during a mount parameter parsing phase to be instituted prior
to actual mount/reconfiguration actions.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
iomap_is_partially_uptodate() is intended to check wither blocks within
the selected range of a not-uptodate page are uptodate; if the range we
care about is up to date, it's an optimization.
However, the iomap implementation continues to check all blocks up to
from+count, which is beyond the page, and can even be well beyond the
iop->uptodate bitmap.
I think the worst that will happen is that we may eventually find a zero
bit and return "not partially uptodate" when it would have otherwise
returned true, and skip the optimization. Still, it's clearly an invalid
memory access that must be fixed.
So: fix this by limiting the search to within the page as is done in the
non-iomap variant, block_is_partially_uptodate().
Zorro noticed thiswhen KASAN went off for 512 byte blocks on a 64k
page system:
BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
Read of size 8 at addr ffff800120c3a318 by task fsstress/22337
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
- Kconfig dependency fixes for our new auth feature
- Fix for selecting the right compressor when creating a fs
- Bugfix for a bug in UBIFS's O_TMPFILE implementation
- Refcounting fixes for UBI
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlwb+MAWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wSIjD/9zyAQUaMXJzI/ct2SXXikyTgRi
hmLbUhuIEZD/c3tDQMgb3XqL7yYkOt/QnfeMXpi/Fy9wAdKcOU9QCcL6TPfR9B+B
v22VdDcsU5pp2Hcinm2Yz6v2OyYwchb2YShfMkiZSf0CKAhcsKZhaeqhug/Zq+w5
xZ6fMPvu8HhinBavq9OgSpPx892vlF1ypXHNeQNxwslz/gWKso47jtmXpz4LDX7+
s4+p+lagVJ0MHQ3fVhBvEplWr9zPKHvX8uGohADLsDp5bAqRyGNxBwqgGm6J6H7B
UHuqUjUf19hUZEauwKugmPTM3Y0TkZjV472qsp9omx8w/XozVqfb4YgMBBE3Qwuy
ZS7GcaGxf54WJbOJFQv7pS3DUmfOjQBnU7YAR7phMBDeOUfxRXWkuzIB+RkED+0d
7JjTGWQItdbTfXl2QWfSr/1rj45LNKmVPMAI2pKUBnC9dD9pktCcB90vTcrg2vjE
3U9GbqOVORZvHh2bcR0Y8uDUBOp4v7ybCc+OOTNVWuWQpckp79E/7dkVLbpoI12R
xEgMEaCSXlwPUxQ/FrxKBWhOZprPXNMo4xw547SDBxMq0IKUniA+YoUxtN3FxhQ8
OiLkfcQ+5bove9qYfARaRC/hk5lpgMWebAGou/CkfTIPJsw3ZdlmL6HH6ag5i6fS
fBHsVBdtv0ADSLXylg==
=nZML
-----END PGP SIGNATURE-----
Merge tag 'upstream-4.20-rc7' of git://git.infradead.org/linux-ubifs
Pull UBI/UBIFS fixes from Richard Weinberger:
- Kconfig dependency fixes for our new auth feature
- Fix for selecting the right compressor when creating a fs
- Bugfix for a bug in UBIFS's O_TMPFILE implementation
- Refcounting fixes for UBI
* tag 'upstream-4.20-rc7' of git://git.infradead.org/linux-ubifs:
ubifs: Handle re-linking of inodes correctly while recovery
ubi: Do not drop UBI device reference before using
ubi: Put MTD device after it is not used
ubifs: Fix default compression selection in ubifs
ubifs: Fix memory leak on error condition
ubifs: auth: Add CONFIG_KEYS dependency
ubifs: CONFIG_UBIFS_FS_AUTHENTICATION should depend on UBIFS_FS
ubifs: replay: Fix high stack usage
Separate just the changing of mount flags (MS_REMOUNT|MS_BIND) from full
remount because the mount data will get parsed with the new fs_context
stuff prior to doing a remount - and this causes the syscall to fail under
some circumstances.
To quote Eric's explanation:
[...] mount(..., MS_REMOUNT|MS_BIND, ...) now validates the mount options
string, which breaks systemd unit files with ProtectControlGroups=yes
(e.g. systemd-networkd.service) when systemd does the following to
change a cgroup (v1) mount to read-only:
mount(NULL, "/run/systemd/unit-root/sys/fs/cgroup/systemd", NULL,
MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL)
... when the kernel has CONFIG_CGROUPS=y but no cgroup subsystems
enabled, since in that case the error "cgroup1: Need name or subsystem
set" is hit when the mount options string is empty.
Probably it doesn't make sense to validate the mount options string at
all in the MS_REMOUNT|MS_BIND case, though maybe you had something else
in mind.
This is also worthwhile doing because we will need to add a mount_setattr()
syscall to take over the remount-bind function.
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
Only the mount namespace code that implements mount(2) should be using the
MS_* flags. Suppress them inside the kernel unless uapi/linux/mount.h is
included.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
has since picking up this commit 2 days ago has failed with bad page
state errors such as:
# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
....
SECTION -- xfs
FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 test1 4.20.0-rc6-dgc+
MKFS_OPTIONS -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
generic/038 454s ...
run fstests generic/038 at 2018-12-20 18:43:05
XFS (sdc): Unmounting Filesystem
XFS (sdc): Mounting V5 Filesystem
XFS (sdc): Ending clean mount
BUG: Bad page state in process kswapd0 pfn:3a7fa
page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
flags: 0xfffffc0000000()
raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
raw: 0000000000000001 0000000000000000 00000000ffffffff
page dumped because: non-NULL mapping
CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
Call Trace:
dump_stack+0x67/0x90
bad_page.cold.116+0x8a/0xbd
free_pcppages_bulk+0x4bf/0x6a0
free_unref_page_list+0x10f/0x1f0
shrink_page_list+0x49d/0xf50
shrink_inactive_list+0x19d/0x3b0
shrink_node_memcg.constprop.77+0x398/0x690
? shrink_slab.constprop.81+0x278/0x3f0
shrink_node+0x7a/0x2f0
kswapd+0x34b/0x6d0
? node_reclaim+0x240/0x240
kthread+0x11f/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x24/0x30
Disabling lock debugging due to kernel taint
....
The failures are from anyway that frees pages and empties the
per-cpu page magazines, so it's not a predictable failure or an easy
to debug failure.
generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failure on other
machines have been at random points in fstests runs but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable failure.
It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current tot 4.20-rc7+ kernel across
multiple machines running sub-page block size XFs filesystems and
none of the bad page state failures have been seen.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use __print_symbolic to print the scrub type in ftrace output.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Use __print_symbolic to print the btree type in ftrace output.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Move XFS_INODE_FORMAT_STR to libxfs so that we don't forget to keep it
updated, and add necessary TRACE_DEFINE_ENUM.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Move XFS_AG_BTREE_CMP_FORMAT_STR to libxfs so that we don't forget to
keep it updated, and TRACE_DEFINE_ENUM the values while we're at it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
ftrace's __print_symbolic() has a (very poorly documented) requirement
that any enum values used in the symbol to string translation table be
wrapped in a TRACE_DEFINE_ENUM so that the enum value can be encoded in
the ftrace ring buffer. Fix this unsatisfied requirement.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Use %pS instead of %pF in ftrace strings so that we record the actual
function address instead of the function descriptor.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
If the file system has been shut down or is read-only, then
ext4_write_inode() needs to bail out early.
Also use jbd2_complete_transaction() instead of ext4_force_commit() so
we only force a commit if it is needed.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Some time back, nfsd switched from calling vfs_fsync() to using a new
commit_metadata() hook in export_operations(). If the file system did
not provide a commit_metadata() hook, it fell back to using
sync_inode_metadata(). Unfortunately doesn't work on all file
systems. In particular, it doesn't work on ext4 due to how the inode
gets journalled --- the VFS writeback code will not always call
ext4_write_inode().
So we need to provide our own ext4_nfs_commit_metdata() method which
calls ext4_write_inode() directly.
Google-Bug-Id: 121195940
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wires.
This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.
For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning. A look-up of a low-level cred will
map this to a machine credential.
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Use the common 'struct cred' to pass credentials for readdir.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Rather than keying the access cache with 'struct rpc_cred',
use 'struct cred'. Then use cred_fscmp() to compare
credentials rather than comparing the raw pointer.
A benefit of this approach is that in the common case we avoid the
rpc_lookup_cred_nonblock() call which can be slow when the cred cache is large.
This also keeps many fewer items pinned in the rpc cred cache, so the
cred cache is less likely to get large.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFS needs to know when a credential is about to expire so that
it can modify write-back behaviour to finish the write inside the
expiry time.
It currently uses functions in SUNRPC code which make use of a
fairly complex callback scheme and flags in the generic credientials.
As I am working to discard the generic credentials, this has to change.
This patch moves the logic into NFS, in part by finding and caching
the low-level credential in the open_context. We then make direct
cred-api calls on that.
This makes the code much simpler and removes a dependency on generic
rpc credentials.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When NFS creates a machine credential, it is a "generic" credential,
not tied to any auth protocol, and is really just a container for
the princpal name.
This doesn't get linked to a genuine credential until rpcauth_bindcred()
is called.
The lookup always succeeds, so various places that test if the machine
credential is NULL, are pointless.
As a step towards getting rid of generic credentials, this patch gets
rid of generic machine credentials. The nfs_client and rpc_client
just hold a pointer to a constant principal name.
When a machine credential is wanted, a special static 'struct rpc_cred'
pointer is used. rpcauth_bindcred() recognizes this, finds the
principal from the client, and binds the correct credential.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This lock is no longer necessary.
If nfs4_get_renew_cred() needs to hunt through the open-state
creds for a user cred, it still takes the lock to stablize
the rbtree, but otherwise there are no races.
Note that this completely removes the lock from nfs4_renew_state().
It appears that the original need for the locking here was removed
long ago, and there is no longer anything to protect.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFSv4 state management tries a root credential when no machine
credential is available, as can happen with kerberos.
It does this by replacing the cl_machine_cred with a root credential.
This means that any user of the machine credential needs to take
a lock while getting a reference to the machine credential, which is
a little cumbersome.
So introduce an explicit cl_root_cred, and never free either
credential until client shutdown. This means that no locking
is needed to reference these credentials. Future patches
will make use of this.
This is only a temporary addition. both cl_machine_cred and
cl_root_cred will disappear later in the series.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
We can use cred->groupinfo (from the 'struct cred') instead.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The SUNRPC credential framework was put together before
Linux has 'struct cred'. Now that we have it, it makes sense to
use it.
This first step just includes a suitable 'struct cred *' pointer
in every 'struct auth_cred' and almost every 'struct rpc_cred'.
The rpc_cred used for auth_null has a NULL 'struct cred *' as nothing
else really makes sense.
For rpc_cred, the pointer is reference counted.
For auth_cred it isn't. struct auth_cred are either allocated on
the stack, in which case the thread owns a reference to the auth,
or are part of 'struct generic_cred' in which case gc_base owns the
reference, and "acred" shares it.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Please see comment to filelayout_pg_test for reference.
To: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
commit e8f25e6d6d19 "NFS: Remove the NFS v4 xdev mount function"
removed the last use of this.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If we receive a file handle, either from NFS or open_by_handle_at(2),
and it points at an inode which has not been initialized, and the file
system has metadata checksums enabled, we shouldn't try to get the
inode, discover the checksum is invalid, and then declare the file
system as being inconsistent.
This can be reproduced by creating a test file system via "mke2fs -t
ext4 -O metadata_csum /tmp/foo.img 8M", mounting it, cd'ing into that
directory, and then running the following program.
#define _GNU_SOURCE
#include <fcntl.h>
struct handle {
struct file_handle fh;
unsigned char fid[MAX_HANDLE_SZ];
};
int main(int argc, char **argv)
{
struct handle h = {{8, 1 }, { 12, }};
open_by_handle_at(AT_FDCWD, &h.fh, O_RDONLY);
return 0;
}
Google-Bug-Id: 120690101
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
In ext4_expand_extra_isize_ea(), we calculate the total size of the
xattr header, plus the xattr entries so we know how much of the
beginning part of the xattrs to move when expanding the inode extra
size. We need to include the terminating u32 at the end of the xattr
entries, or else if there is uninitialized, non-zero bytes after the
xattr entries and before the xattr values, the list of xattr entries
won't be properly terminated.
Reported-by: Steve Graham <stgraham2000@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org