41948 Commits

Author SHA1 Message Date
Trond Myklebust
81d6dc8b34 NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid
Unlike read layouts, the writeable layout cannot fall back to using only
one of the mirrors. It need to write to all of them.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01 15:12:11 -07:00
Trond Myklebust
388ef16640 NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid()
Unlike the files layout, flexfiles does not test for the NFS_DEVICEID_INVALID
flag. Instead it relies on NFS_DEVICEID_UNAVAILABLE.
Fix is to replace with nfs4_mark_deviceid_unavailable().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01 15:12:11 -07:00
Yunlei He
01a5ad827a f2fs: upset segment_info repair
upset segment_info like this:

276000|161 0|0   4|70  3|0   3|0   0|0   0|91  4|0   4|232 4|39
276104|0   4|0   4|1   4|0   4|0   4|280 4|0   4|42  4|262 4|38
276204|179 4|89  4|39  4|24  4|0   4|96  4|3   4|428 4|0   4|118
276304|112 4|97  4|0   4|0   4|0   4|68  4|0   4|0   4|86  4|138
276404|0   4|0   0|166 5|39  4|101 0|111

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-09-01 14:45:27 -07:00
Trond Myklebust
972398fa0a NFSv4.1/flexfiles: Fix freeing of mirrors
Mirrors are now shared objects, so we should not be freeing them directly
inside ff_layout_free_lseg(). We should already be doing the right thing
in _ff_layout_free_lseg(), so just let it handle things.

Also ensure that ff_layout_free_mirror() frees the RPC credential if it
is set.

Fixes: 28a0d72c6867 ("Add refcounting to struct nfs4_ff_layout_mirror")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01 12:18:57 -07:00
J. Bruce Fields
f984a7ce58 nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case
Somebody with a Solaris client was hitting this case.  We haven't
figured out why yet, and don't have a reproducer.  Meanwhile Frank
noticed that RFC 7530 actually recommends CLID_INUSE for this case.
Unlikely to help the original reporter, but may as well fix it.

Reported-by: Frank Filz <ffilzlnx@mindspring.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-09-01 13:53:40 -04:00
Dave Chinner
5d54b8cdea Merge branch 'xfs-misc-fixes-for-4.3-4' into for-next 2015-09-01 10:30:11 +10:00
Jeff Layton
3fcbbd244e nfsd: ensure that delegation stateid hash references are only put once
It's possible that a DELEGRETURN could race with (e.g.) client expiry,
in which case we could end up putting the delegation hash reference more
than once.

Have unhash_delegation_locked return a bool that indicates whether it
was already unhashed. In the case of destroy_delegation we only
conditionally put the hash reference if that returns true.

The other callers of unhash_delegation_locked call it while walking
list_heads that shouldn't yet be detached. If we find that it doesn't
return true in those cases, then throw a WARN_ON as that indicates that
we have a partially hashed delegation, and that something is likely very
wrong.

Tested-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:32:16 -04:00
Jeff Layton
e85687393f nfsd: ensure that the ol stateid hash reference is only put once
When an open or lock stateid is hashed, we take an extra reference to
it. When we unhash it, we drop that reference. The code however does
not properly account for the case where we have two callers concurrently
trying to unhash the stateid. This can lead to list corruption and the
hash reference being put more than once.

Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
list_head, and then testing to see if that list_head is empty before
releasing the hash reference. This means that some of the unhashing
wrappers now become bool return functions so we can test to see whether
the stateid was unhashed before we put the reference.

Reported-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Andrew W Elble <aweits@rit.edu>
Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:32:15 -04:00
Jeff Layton
51a5456859 nfsd: allow more than one laundry job to run at a time
We can potentially have several nfs4_laundromat jobs running if there
are multiple namespaces running nfsd on the box. Those are effectively
separated from one another though, so I don't see any reason to
serialize them.

Also, create_singlethread_workqueue automatically adds the
WQ_MEM_RECLAIM flag. Since we run this job on a timer, it's not really
involved in any reclaim paths. I see no need for a rescuer thread.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:32:14 -04:00
Paul Gortmaker
46cc8ba304 nfsd: don't WARN/backtrace for invalid container deployment.
These messages, combined with the backtrace they trigger, makes it seem
like a serious problem, though a quick search shows distros marking
it as a "won't fix" non-issue when the problem is reported by users.

The backtrace is overkill, and only really manages to show that if
you follow the code path, you can't really avoid it with bootargs
or configuration settings in the container.

Given that, lets tone it down a bit and get rid of the WARN severity,
and the associated backtrace, so people aren't needlessly alarmed.

Also, lets drop the split printk line, since they are grep unfriendly.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:32:08 -04:00
Randy Dunlap
7fadc59cc8 fs: fix fs/locks.c kernel-doc warning
Fix kernel-doc warnings in fs/locks.c:

Warning(..//fs/locks.c:1577): No description found for parameter 'flags'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-08-31 16:27:25 -04:00
Kinglong Mee
75976de655 NFSD: Return word2 bitmask if setting security label in OPEN/CREATE
Security label can be set in OPEN/CREATE request, nfsd should set
the bitmask in word2 if setting success.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:16:40 -04:00
Kinglong Mee
ead8fb8c24 NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1
According to rfc5661 18.16.4,
"If EXCLUSIVE4_1 was used, the client determines the attributes
 used for the verifier by comparing attrset with cva_attrs.attrmask;"

So, EXCLUSIVE4_1 also needs those bitmask used to store the verifier.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:16:39 -04:00
Kinglong Mee
7d580722c9 nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL.
The encode order should be as the bitmask defined order.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:16:39 -04:00
Kinglong Mee
6896f15aab nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug
Currently we'll respond correctly to a request for either
FS_LAYOUT_TYPES or LAYOUT_TYPES, but not to a request for both
attributes simultaneously.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:12:39 -04:00
Kinglong Mee
0a2050d744 NFSD: Store parent's stat in a separate value
After commit ae7095a7c4 (nfsd4: helper function for getting mounted_on
ino) we ignore the return value from get_parent_attributes().

Also, the following FATTR4_WORD2_LAYOUT_BLKSIZE uses stat.blksize, so to
avoid overwriting that, use an independent value for the parent's
attributes.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 15:11:05 -04:00
Tsutomu Itoh
527afb4493 Btrfs: cleanup: remove unnecessary check before btrfs_free_path is called
We need not check path before btrfs_free_path() is called because
path is checked in btrfs_free_path().

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31 11:46:41 -07:00
Qu Wenruo
c6dd6ea557 btrfs: async_thread: Fix workqueue 'max_active' value when initializing
At initializing time, for threshold-able workqueue, it's max_active
of kernel workqueue should be 1 and grow if it hits threshold.

But due to the bad naming, there is both 'max_active' for kernel
workqueue and btrfs workqueue.
So wrong value is given at workqueue initialization.

This patch fixes it, and to avoid further misunderstanding, change the
member name of btrfs_workqueue to 'current_active' and 'limit_active'.

Also corresponding comment is added for readability.

Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31 11:46:40 -07:00
Zhao Lei
943c6e9925 btrfs: Add raid56 support for updating
num_tolerated_disk_barrier_failures in btrfs_balance

Code for updating fs_info->num_tolerated_disk_barrier_failures in
btrfs_balance() lacks raid56 support.

Reason:
 Above code was wroten in 2012-08-01, together with
 btrfs_calc_num_tolerated_disk_barrier_failures()'s first version.

 Then, btrfs_calc_num_tolerated_disk_barrier_failures() got updated
 later to support raid56, but code in btrfs_balance() was not
 updated together.

Fix:
 Merge above similar code to a common function:
 btrfs_get_num_tolerated_disk_barrier_failures()
 and make it support both case.

 It can fix this bug with a bonus of cleanup, and make these code
 never in above no-sync state from now on.

Suggested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31 11:45:48 -07:00
Zhao Lei
2c4580454f btrfs: Cleanup for btrfs_calc_num_tolerated_disk_barrier_failures
1: Use ARRAY_SIZE(types) to replace a static-value variant:
   int num_types = 4;

2: Use 'continue' on condition to reduce one level tab
   if (!XXX) {
       code;
       ...
   }
   ->
   if (XXX)
       continue;
   code;
   ...

3: Put setting 'num_tolerated_disk_barrier_failures = 2' to
   (num_tolerated_disk_barrier_failures > 2) condition to make
   make logic neat.
   if (num_tolerated_disk_barrier_failures > 0 && XXX)
       num_tolerated_disk_barrier_failures = 0;
   else if (num_tolerated_disk_barrier_failures > 1) {
       if (XXX)
           num_tolerated_disk_barrier_failures = 1;
       else if (XXX)
           num_tolerated_disk_barrier_failures = 2;
   ->
   if (num_tolerated_disk_barrier_failures > 0 && XXX)
       num_tolerated_disk_barrier_failures = 0;
   if (num_tolerated_disk_barrier_failures > 1 && XXX)
       num_tolerated_disk_barrier_failures = ;
   if (num_tolerated_disk_barrier_failures > 2 && XXX)
       num_tolerated_disk_barrier_failures = 2;

4: Remove comment of:
   num_mirrors - 1: if RAID1 or RAID10 is configured and more
   than 2 mirrors are used.
   which is not fit with code.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31 11:45:47 -07:00
Zhao Lei
8c204c9657 btrfs: Remove noused chunk_tree and chunk_objectid from scrub_enumerate_chunks and scrub_chunk
These variables are not used from introduced version, remove them.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31 11:45:46 -07:00
Zhao Lei
7955323bdc btrfs: Update out-of-date "skip parity stripe" comment
Because btrfs support scrub raid56 parity stripe now.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31 11:45:45 -07:00
Linus Torvalds
1c00038c76 Char/Misc driver patches for 4.3-rc1
Here's the "big" char/misc driver update for 4.3-rc1.
 
 Not much really interesting here, just a number of little changes all
 over the place, and some nice consolidation of the nvmem drivers to a
 common framework.  As usual, the mei drivers stand out as the largest
 "churn" to handle new devices and features in their hardware.
 
 All have been in linux-next for a while with no issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlXV844ACgkQMUfUDdst+ymYfQCgmDKjq3fsVHCxNZPxnukFYzvb
 xZkAnRb8fuub5gVQFP29A+rhyiuWD13v
 =Bq9K
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver patches from Greg KH:
 "Here's the "big" char/misc driver update for 4.3-rc1.

  Not much really interesting here, just a number of little changes all
  over the place, and some nice consolidation of the nvmem drivers to a
  common framework.  As usual, the mei drivers stand out as the largest
  "churn" to handle new devices and features in their hardware.

  All have been in linux-next for a while with no issues"

* tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
  auxdisplay: ks0108: initialize local parport variable
  extcon: palmas: Fix build break due to devm_gpiod_get_optional API change
  extcon: palmas: Support GPIO based USB ID detection
  extcon: Fix signedness bugs about break error handling
  extcon: Drop owner assignment from i2c_driver
  extcon: arizona: Simplify pdata symantics for micd_dbtime
  extcon: arizona: Declare 3-pole jack if we detect open circuit on mic
  extcon: Add exception handling to prevent the NULL pointer access
  extcon: arizona: Ensure variables are set for headphone detection
  extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio
  extcon: arizona: Add basic microphone detection DT/ACPI bindings
  extcon: arizona: Update to use the new device properties API
  extcon: palmas: Remove the mutually_exclusive array
  extcon: Remove optional print_state() function pointer of struct extcon_dev
  extcon: Remove duplicate header file in extcon.h
  extcon: max77843: Clear IRQ bits state before request IRQ
  toshiba laptop: replace ioremap_cache with ioremap
  misc: eeprom: max6875: clean up max6875_read()
  misc: eeprom: clean up eeprom_read()
  misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write
  ...
2015-08-31 08:34:13 -07:00
Trond Myklebust
2d89a1d3c9 NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file
If we have a read layout, then sanity check the minimal layout length
so that it does not extend beyond the end of file.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31 02:05:47 -07:00
Trond Myklebust
21b874c873 NFSv4.1/pnfs: Handle LAYOUTGET return values correctly
According to RFC5661 section 18.43.3, if the server cannot satisfy
the loga_minlength argument to LAYOUTGET, there are 2 cases:
1) If loga_minlength == 0, it returns NFS4ERR_LAYOUTTRYLATER
2) If loga_minlength != 0, it returns NFS4ERR_BADLAYOUT

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31 01:33:12 -07:00
Trond Myklebust
4ae93560b1 NFSv4.1/pnfs: Don't ask for a read layout for an empty file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31 01:33:12 -07:00
Trond Myklebust
4a1e2feb9d NFSv4.1: Fix a protocol issue with CLOSE stateids
According to RFC5661 Section 18.2.4, CLOSE is supposed to return
the zero stateid. This means that nfs_clear_open_stateid_locked()
cannot assume that the result stateid will always match the 'other'
field of the existing open stateid when trying to determine a race
with a parallel OPEN.

Instead, we look at the argument, and check for matches.

Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-30 18:45:04 -07:00
Trond Myklebust
90816d1dda NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors
If the file was fenced and/or has been deleted on the DS, then we want
to retry pNFS after a layoutreturn with error report. If the server
cannot fix the problem, then we rely on it to tell us so in the
response to the LAYOUTGET.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-30 13:10:27 -07:00
Chao Yu
54d7185642 f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent
If extent cache is disable, we will encounter oops when triggering direct
IO as below:

BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs]
*pdpt = 000000002bb9a001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: f2fs(O) fuse bnep rfcomm bluetooth nfsd dm_crypt nfs_acl auth_rpcgss oid_registry nfs binfmt_misc fscache lockd
sunrpc grace snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd soundcore joydev psmouse hid_generic i2c_piix4 serio_raw ppdev mac_hid parport_pc lp parport ext4 jbd2 mbcache
usbhid hid e1000
CPU: 3 PID: 3608 Comm: dd Tainted: G           O    4.2.0-rc4 #12
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ef161600 ti: ebd5e000 task.ti: ebd5e000
EIP: 0060:[<f0b9c61e>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_drop_largest_extent+0xe/0x30 [f2fs]
EAX: 00000000 EBX: ddebc000 ECX: 00000000 EDX: 00000000
ESI: ebd5fdf8 EDI: 00000000 EBP: ebd5fd58 ESP: ebd5fd58
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 0000000c CR3: 2c24ee40 CR4: 000006f0
Stack:
 ebd5fda4 f0b8c005 00000000 00000001 00000000 f0b8c430 c816cd68 ddebc000
 ddebc088 00001000 00000555 00000555 ffffffff c160bb00 00055501 00000000
 00000000 00000100 00000000 ebd5fe20 f0b8c430 00000046 ef161600 00001000
Call Trace:
 [<f0b8c005>] __allocate_data_block+0x1a5/0x260 [f2fs]
 [<f0b8c430>] ? f2fs_direct_IO+0x370/0x440 [f2fs]
 [<c160bb00>] ? down_read+0x30/0x50
 [<f0b8c430>] f2fs_direct_IO+0x370/0x440 [f2fs]
 [<c113e115>] generic_file_direct_write+0xa5/0x260
 [<c10b53f8>] ? current_fs_time+0x18/0x50
 [<c113e38b>] __generic_file_write_iter+0xbb/0x210
 [<c113e50f>] ? generic_file_write_iter+0x2f/0x320
 [<c113e63c>] generic_file_write_iter+0x15c/0x320
 [<f0b77f29>] f2fs_file_write_iter+0x39/0x80 [f2fs]
 [<c11984d9>] __vfs_write+0xa9/0xe0
 [<c1199227>] vfs_write+0x97/0x180
 [<c119955b>] SyS_write+0x5b/0xd0
 [<c160dcd0>] sysenter_do_call+0x12/0x12
Code: 10 8b 50 1c 89 53 14 eb ca 8d 74 26 00 85 f6 74 86 eb a6 0f 0b 90 8d b4 26 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 d4 02 00
00 <8b> 48 0c 39 d1 77 0e 03 48 14 39 ca 73 07 c7 40 14 00 00 00 00
EIP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs] SS:ESP 0068:ebd5fd58
CR2: 000000000000000c
---[ end trace a38c07026a1afffd ]---

This is because when extent cache is disable, extent_tree pointer in struct
f2fs_inode_info should be NULL, but in f2fs_drop_largest_extent we access
this NULL pointer directly without checking state of extent cache, then,
the oops occurs. Let's fix it by checking state of extent cache before
accessing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-28 10:14:26 -07:00
Eric Sandeen
1a7ccad88d xfs: fix error gotos in xfs_setattr_nonsize
As the code stands today, if xfs_trans_reserve() fails, we
goto out_dqrele, which does not free the allocated transaction.

Fix up the goto targets to undo everything properly.

Addresses-Coverity-Id: 145571
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-28 14:51:10 +10:00
Lucas Stach
8774cf8bac xfs: add mssing inode cache attempts counter increment
Increasing the inode cache attempt counter was apparently dropped while
refactoring the cache code and so stayed at the initial 0 value. Add the
increment back to make the runtime stats more useful.

Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-28 14:50:56 +10:00
David Jeffery
c9eb256eda xfs: return errors from partial I/O failures to files
There is an issue with xfs's error reporting in some cases of I/O partially
failing and partially succeeding. Calls like fsync() can report success even
though not all I/O was successful in partial-failure cases such as one disk of
a RAID0 array being offline.

The issue can occur when there are more than one bio per xfs_ioend struct.
Each call to xfs_end_bio() for a bio completing will write a value to
ioend->io_error.  If a successful bio completes after any failed bio, no
error is reported do to it writing 0 over the error code set by any failed bio.
The I/O error information is now lost and when the ioend is completed
only success is reported back up the filesystem stack.

xfs_end_bio() should only set ioend->io_error in the case of BIO_UPTODATE
being clear.  ioend->io_error is initialized to 0 at allocation so only needs
to be updated by a failed bio. Also check that ioend->io_error is 0 so that
the first error reported will be the error code returned.

Cc: stable@vger.kernel.org
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-28 14:50:45 +10:00
Darrick J. Wong
dfdd4ac66c libxfs: bad magic number should set da block buffer error
If xfs_da3_node_read_verify() doesn't recognize the magic number of a
buffer it's just read, set the buffer error to -EFSCORRUPTED so that
the error can be sent up to userspace.  Without this patch we'll
notice the bad magic eventually while trying to traverse or change
the block, but we really ought to fail early in the verifier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-28 14:50:03 +10:00
Trond Myklebust
6669cb8bed NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload
The "FIXME" is outdated. Flexfiles does add a payload.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27 20:43:20 -04:00
Trond Myklebust
d13549074c NFSv4.1/flexfiles: Fix a protocol error in layoutreturn
According to the flexfiles protocol, the layoutreturn should specify an
array of errors in the following format:

struct ff_ioerr4 {
	offset4        ffie_offset;
	length4        ffie_length;
	stateid4       ffie_stateid;
	device_error4  ffie_errors<>;
};

This patch fixes up the code to ensure that our ffie_errors is indeed
encoded as an array (albeit with only a single entry).

Reported-by: Tom Haynes <thomas.haynes@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27 20:42:20 -04:00
Kinglong Mee
5334c5bdac NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1
Client sends a SETATTR request after OPEN for updating attributes.
For create file with S_ISGID is set, the S_ISGID in SETATTR will be
ignored at nfs server as chmod of no PERMISSION.

v3, same as v2.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27 19:47:07 -04:00
Kinglong Mee
8c61282ff6 NFS: Get suppattr_exclcreat when getting server capabilities
Create file with attributs as NFS4_CREATE_EXCLUSIVE4_1 mode
depends on suppattr_exclcreat attribut.

v3, same as v2.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27 19:45:27 -04:00
Kinglong Mee
c5c3fb5f97 NFS: Make opened as optional argument in _nfs4_do_open
Check opened, only update it when non-NULL.
It's not needs define an unused value for the opened
when calling _nfs4_do_open.

v3, same as v2.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27 19:44:38 -04:00
Kinglong Mee
ae57ca0f4f NFS: Check size by inode_newsize_ok in nfs_setattr
Set rlimit for NFS's files is useless right now.
For local process's rlimit, it should be checked by nfs client.

The same, CIFS also call inode_change_ok checking rlimit at its client
in cifs_setattr_nounix() and cifs_setattr_unix().

v3, fix bad using of error

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27 19:44:21 -04:00
Dan Williams
cb389b9c0e dax: drop size parameter to ->direct_access()
None of the implementations currently use it.  The common
bdev_direct_access() entry point handles all the size checks before
calling ->direct_access().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27 19:40:58 -04:00
Trond Myklebust
0bdb8fa6ec NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return must notify of layout return
It's not sufficient to just mark the layout segment for layout return. We
also need to set the NFS_LAYOUT_RETURN_BEFORE_CLOSE flag in the layout header.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27 19:17:33 -04:00
Bob Peterson
b3a5bbfd78 dlm: print error from kernel_sendpage
Print a dlm-specific error when a socket error occurs
when sending a dlm message.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-27 09:34:47 -05:00
Chao Yu
19b2c30d3c f2fs: update extent tree in batches
This patch introduce a new helper f2fs_update_extent_tree_range which can
do extent mapping update at a specified range.

The main idea is:
1) punch all mapping info in extent node(s) which are at a specified range;
2) try to merge new extent mapping with adjacent node, or failing that,
   insert the mapping into extent tree as a new node.

In order to see the benefit, I add a function for stating time stamping
count as below:

uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
	return (uint64_t)hi << 32 | lo;
}

My test environment is: ubuntu, intel i7-3770, 16G memory, 256g micron ssd.

truncation path:	update extent cache from truncate_data_blocks_range
non-truncataion path:	update extent cache from other paths
total:			all update paths

a) Removing 128MB file which has one extent node mapping whole range of
file:
1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128
2. sync
3. rm /mnt/f2fs/128M

Before:
		total		count		average
truncation:	7651022		32768		233.49

Patched:
		total		count		average
truncation:	3321		33		100.64

b) fsstress:
fsstress -d /mnt/f2fs -l 5 -n 100 -p 20
Test times:		5 times.

Before:
		total		count		average
truncation:	5812480.6	20911.6		277.95
non-truncation:	7783845.6	13440.8		579.12
total:		13596326.2	34352.4		395.79

Patched:
		total		count		average
truncation:	1281283.0	3041.6		421.25
non-truncation:	7355844.4	13662.8		538.38
total:		8637127.4	16704.4		517.06

1) For the updates in truncation path:
 - we can see updating in batches leads total tsc and update count reducing
   explicitly;
 - besides, for a single batched updating, punching multiple extent nodes
   in a loop, result in executing more operations, so our average tsc
   increase intensively.
2) For the updates in non-truncation path:
 - there is a little improvement, that is because for the scenario that we
   just need to update in the head or tail of extent node, new interface
   optimize to update info in extent node directly, rather than removing
   original extent node for updating and then inserting that updated one
   into cache as new node.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-26 11:50:35 -07:00
Peng Tao
1090c3bf81 nfs42: remove unused declaration
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 20:06:56 -04:00
Peng Tao
19cf633513 nfs42: decode_layoutstats does not need res parameter
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 20:06:56 -04:00
Trond Myklebust
0762ed2ced NFSv4.1/flexfiles: Allow coalescing of new layout segments and existing ones
In order to ensure atomicity of updates, we merge the old layout segments
into the new ones, and then invalidate the old ones.

Also ensure that we order the list of layout segments so that
RO segments are preferred over RW.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 19:42:43 -04:00
Trond Myklebust
03772d2f00 NFSv4.1/pnfs: Allow pNFS device drivers to customise layout segment insertion
This is needed in order to allow merging of contiguous layout segments,
and also to correct the ordering of layouts for those device drivers that
don't necessarily want to place the read-write layouts first.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 19:42:43 -04:00
Tejun Heo
006a0973ed writeback: sync_inodes_sb() must write out I_DIRTY_TIME inodes and always call wait_sb_inodes()
e79729123f63 ("writeback: don't issue wb_writeback_work if clean")
updated writeback path to avoid kicking writeback work items if there
are no inodes to be written out; unfortunately, the avoidance logic
was too aggressive and broke sync_inodes_sb().

* sync_inodes_sb() must write out I_DIRTY_TIME inodes but I_DIRTY_TIME
  inodes dont't contribute to bdi/wb_has_dirty_io() tests and were
  being skipped over.

* inodes are taken off wb->b_dirty/io/more_io lists after writeback
  starts on them.  sync_inodes_sb() skipping wait_sb_inodes() when
  bdi_has_dirty_io() breaks it by making it return while writebacks
  are in-flight.

This patch fixes the breakages by

* Removing bdi_has_dirty_io() shortcut from bdi_split_work_to_wbs().
  The callers are already testing the condition.

* Removing bdi_has_dirty_io() shortcut from sync_inodes_sb() so that
  it always calls into bdi_split_work_to_wbs() and wait_sb_inodes().

* Making bdi_split_work_to_wbs() consider the b_dirty_time list for
  WB_SYNC_ALL writebacks.

Kudos to Eryu, Dave and Jan for tracking down the issue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e79729123f63 ("writeback: don't issue wb_writeback_work if clean")
Link: http://lkml.kernel.org/g/20150812101204.GE17933@dhcp-13-216.nay.redhat.com
Reported-and-bisected-by: Eryu Guan <eguan@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.com>
Cc: Ted Ts'o <tytso@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-25 14:35:09 -06:00
David Teigland
b96f465035 dlm: fix lvb copy for user locks
For a userland lock request, the previous and current
lock modes are used to decide when the lvb should be
copied back to the user.  The wrong previous value was
used, so that it always matched the current value.
This caused the lvb to be copied back to the user in
the wrong cases.

Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-25 14:41:50 -05:00
Trond Myklebust
540d9864e1 NFSv4.1/pnfs: Add sanity check for the layout range returned by the server
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 14:40:10 -04:00