We need not check path before btrfs_free_path() is called because
path is checked in btrfs_free_path().
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
At initializing time, for threshold-able workqueue, it's max_active
of kernel workqueue should be 1 and grow if it hits threshold.
But due to the bad naming, there is both 'max_active' for kernel
workqueue and btrfs workqueue.
So wrong value is given at workqueue initialization.
This patch fixes it, and to avoid further misunderstanding, change the
member name of btrfs_workqueue to 'current_active' and 'limit_active'.
Also corresponding comment is added for readability.
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
num_tolerated_disk_barrier_failures in btrfs_balance
Code for updating fs_info->num_tolerated_disk_barrier_failures in
btrfs_balance() lacks raid56 support.
Reason:
Above code was wroten in 2012-08-01, together with
btrfs_calc_num_tolerated_disk_barrier_failures()'s first version.
Then, btrfs_calc_num_tolerated_disk_barrier_failures() got updated
later to support raid56, but code in btrfs_balance() was not
updated together.
Fix:
Merge above similar code to a common function:
btrfs_get_num_tolerated_disk_barrier_failures()
and make it support both case.
It can fix this bug with a bonus of cleanup, and make these code
never in above no-sync state from now on.
Suggested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
1: Use ARRAY_SIZE(types) to replace a static-value variant:
int num_types = 4;
2: Use 'continue' on condition to reduce one level tab
if (!XXX) {
code;
...
}
->
if (XXX)
continue;
code;
...
3: Put setting 'num_tolerated_disk_barrier_failures = 2' to
(num_tolerated_disk_barrier_failures > 2) condition to make
make logic neat.
if (num_tolerated_disk_barrier_failures > 0 && XXX)
num_tolerated_disk_barrier_failures = 0;
else if (num_tolerated_disk_barrier_failures > 1) {
if (XXX)
num_tolerated_disk_barrier_failures = 1;
else if (XXX)
num_tolerated_disk_barrier_failures = 2;
->
if (num_tolerated_disk_barrier_failures > 0 && XXX)
num_tolerated_disk_barrier_failures = 0;
if (num_tolerated_disk_barrier_failures > 1 && XXX)
num_tolerated_disk_barrier_failures = ;
if (num_tolerated_disk_barrier_failures > 2 && XXX)
num_tolerated_disk_barrier_failures = 2;
4: Remove comment of:
num_mirrors - 1: if RAID1 or RAID10 is configured and more
than 2 mirrors are used.
which is not fit with code.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
These variables are not used from introduced version, remove them.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Here's the "big" char/misc driver update for 4.3-rc1.
Not much really interesting here, just a number of little changes all
over the place, and some nice consolidation of the nvmem drivers to a
common framework. As usual, the mei drivers stand out as the largest
"churn" to handle new devices and features in their hardware.
All have been in linux-next for a while with no issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlXV844ACgkQMUfUDdst+ymYfQCgmDKjq3fsVHCxNZPxnukFYzvb
xZkAnRb8fuub5gVQFP29A+rhyiuWD13v
=Bq9K
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver patches from Greg KH:
"Here's the "big" char/misc driver update for 4.3-rc1.
Not much really interesting here, just a number of little changes all
over the place, and some nice consolidation of the nvmem drivers to a
common framework. As usual, the mei drivers stand out as the largest
"churn" to handle new devices and features in their hardware.
All have been in linux-next for a while with no issues"
* tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
auxdisplay: ks0108: initialize local parport variable
extcon: palmas: Fix build break due to devm_gpiod_get_optional API change
extcon: palmas: Support GPIO based USB ID detection
extcon: Fix signedness bugs about break error handling
extcon: Drop owner assignment from i2c_driver
extcon: arizona: Simplify pdata symantics for micd_dbtime
extcon: arizona: Declare 3-pole jack if we detect open circuit on mic
extcon: Add exception handling to prevent the NULL pointer access
extcon: arizona: Ensure variables are set for headphone detection
extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio
extcon: arizona: Add basic microphone detection DT/ACPI bindings
extcon: arizona: Update to use the new device properties API
extcon: palmas: Remove the mutually_exclusive array
extcon: Remove optional print_state() function pointer of struct extcon_dev
extcon: Remove duplicate header file in extcon.h
extcon: max77843: Clear IRQ bits state before request IRQ
toshiba laptop: replace ioremap_cache with ioremap
misc: eeprom: max6875: clean up max6875_read()
misc: eeprom: clean up eeprom_read()
misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write
...
If we have a read layout, then sanity check the minimal layout length
so that it does not extend beyond the end of file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
According to RFC5661 section 18.43.3, if the server cannot satisfy
the loga_minlength argument to LAYOUTGET, there are 2 cases:
1) If loga_minlength == 0, it returns NFS4ERR_LAYOUTTRYLATER
2) If loga_minlength != 0, it returns NFS4ERR_BADLAYOUT
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
According to RFC5661 Section 18.2.4, CLOSE is supposed to return
the zero stateid. This means that nfs_clear_open_stateid_locked()
cannot assume that the result stateid will always match the 'other'
field of the existing open stateid when trying to determine a race
with a parallel OPEN.
Instead, we look at the argument, and check for matches.
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the file was fenced and/or has been deleted on the DS, then we want
to retry pNFS after a layoutreturn with error report. If the server
cannot fix the problem, then we rely on it to tell us so in the
response to the LAYOUTGET.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
As the code stands today, if xfs_trans_reserve() fails, we
goto out_dqrele, which does not free the allocated transaction.
Fix up the goto targets to undo everything properly.
Addresses-Coverity-Id: 145571
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Increasing the inode cache attempt counter was apparently dropped while
refactoring the cache code and so stayed at the initial 0 value. Add the
increment back to make the runtime stats more useful.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
There is an issue with xfs's error reporting in some cases of I/O partially
failing and partially succeeding. Calls like fsync() can report success even
though not all I/O was successful in partial-failure cases such as one disk of
a RAID0 array being offline.
The issue can occur when there are more than one bio per xfs_ioend struct.
Each call to xfs_end_bio() for a bio completing will write a value to
ioend->io_error. If a successful bio completes after any failed bio, no
error is reported do to it writing 0 over the error code set by any failed bio.
The I/O error information is now lost and when the ioend is completed
only success is reported back up the filesystem stack.
xfs_end_bio() should only set ioend->io_error in the case of BIO_UPTODATE
being clear. ioend->io_error is initialized to 0 at allocation so only needs
to be updated by a failed bio. Also check that ioend->io_error is 0 so that
the first error reported will be the error code returned.
Cc: stable@vger.kernel.org
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
If xfs_da3_node_read_verify() doesn't recognize the magic number of a
buffer it's just read, set the buffer error to -EFSCORRUPTED so that
the error can be sent up to userspace. Without this patch we'll
notice the bad magic eventually while trying to traverse or change
the block, but we really ought to fail early in the verifier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
According to the flexfiles protocol, the layoutreturn should specify an
array of errors in the following format:
struct ff_ioerr4 {
offset4 ffie_offset;
length4 ffie_length;
stateid4 ffie_stateid;
device_error4 ffie_errors<>;
};
This patch fixes up the code to ensure that our ffie_errors is indeed
encoded as an array (albeit with only a single entry).
Reported-by: Tom Haynes <thomas.haynes@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Client sends a SETATTR request after OPEN for updating attributes.
For create file with S_ISGID is set, the S_ISGID in SETATTR will be
ignored at nfs server as chmod of no PERMISSION.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Create file with attributs as NFS4_CREATE_EXCLUSIVE4_1 mode
depends on suppattr_exclcreat attribut.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Check opened, only update it when non-NULL.
It's not needs define an unused value for the opened
when calling _nfs4_do_open.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Set rlimit for NFS's files is useless right now.
For local process's rlimit, it should be checked by nfs client.
The same, CIFS also call inode_change_ok checking rlimit at its client
in cifs_setattr_nounix() and cifs_setattr_unix().
v3, fix bad using of error
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
None of the implementations currently use it. The common
bdev_direct_access() entry point handles all the size checks before
calling ->direct_access().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
It's not sufficient to just mark the layout segment for layout return. We
also need to set the NFS_LAYOUT_RETURN_BEFORE_CLOSE flag in the layout header.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Print a dlm-specific error when a socket error occurs
when sending a dlm message.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch introduce a new helper f2fs_update_extent_tree_range which can
do extent mapping update at a specified range.
The main idea is:
1) punch all mapping info in extent node(s) which are at a specified range;
2) try to merge new extent mapping with adjacent node, or failing that,
insert the mapping into extent tree as a new node.
In order to see the benefit, I add a function for stating time stamping
count as below:
uint64_t rdtsc(void)
{
uint32_t lo, hi;
__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
return (uint64_t)hi << 32 | lo;
}
My test environment is: ubuntu, intel i7-3770, 16G memory, 256g micron ssd.
truncation path: update extent cache from truncate_data_blocks_range
non-truncataion path: update extent cache from other paths
total: all update paths
a) Removing 128MB file which has one extent node mapping whole range of
file:
1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128
2. sync
3. rm /mnt/f2fs/128M
Before:
total count average
truncation: 7651022 32768 233.49
Patched:
total count average
truncation: 3321 33 100.64
b) fsstress:
fsstress -d /mnt/f2fs -l 5 -n 100 -p 20
Test times: 5 times.
Before:
total count average
truncation: 5812480.6 20911.6 277.95
non-truncation: 7783845.6 13440.8 579.12
total: 13596326.2 34352.4 395.79
Patched:
total count average
truncation: 1281283.0 3041.6 421.25
non-truncation: 7355844.4 13662.8 538.38
total: 8637127.4 16704.4 517.06
1) For the updates in truncation path:
- we can see updating in batches leads total tsc and update count reducing
explicitly;
- besides, for a single batched updating, punching multiple extent nodes
in a loop, result in executing more operations, so our average tsc
increase intensively.
2) For the updates in non-truncation path:
- there is a little improvement, that is because for the scenario that we
just need to update in the head or tail of extent node, new interface
optimize to update info in extent node directly, rather than removing
original extent node for updating and then inserting that updated one
into cache as new node.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In order to ensure atomicity of updates, we merge the old layout segments
into the new ones, and then invalidate the old ones.
Also ensure that we order the list of layout segments so that
RO segments are preferred over RW.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This is needed in order to allow merging of contiguous layout segments,
and also to correct the ordering of layouts for those device drivers that
don't necessarily want to place the read-write layouts first.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
e79729123f63 ("writeback: don't issue wb_writeback_work if clean")
updated writeback path to avoid kicking writeback work items if there
are no inodes to be written out; unfortunately, the avoidance logic
was too aggressive and broke sync_inodes_sb().
* sync_inodes_sb() must write out I_DIRTY_TIME inodes but I_DIRTY_TIME
inodes dont't contribute to bdi/wb_has_dirty_io() tests and were
being skipped over.
* inodes are taken off wb->b_dirty/io/more_io lists after writeback
starts on them. sync_inodes_sb() skipping wait_sb_inodes() when
bdi_has_dirty_io() breaks it by making it return while writebacks
are in-flight.
This patch fixes the breakages by
* Removing bdi_has_dirty_io() shortcut from bdi_split_work_to_wbs().
The callers are already testing the condition.
* Removing bdi_has_dirty_io() shortcut from sync_inodes_sb() so that
it always calls into bdi_split_work_to_wbs() and wait_sb_inodes().
* Making bdi_split_work_to_wbs() consider the b_dirty_time list for
WB_SYNC_ALL writebacks.
Kudos to Eryu, Dave and Jan for tracking down the issue.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e79729123f63 ("writeback: don't issue wb_writeback_work if clean")
Link: http://lkml.kernel.org/g/20150812101204.GE17933@dhcp-13-216.nay.redhat.com
Reported-and-bisected-by: Eryu Guan <eguan@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.com>
Cc: Ted Ts'o <tytso@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
For a userland lock request, the previous and current
lock modes are used to decide when the lvb should be
copied back to the user. The wrong previous value was
used, so that it always matched the current value.
This caused the lvb to be copied back to the user in
the wrong cases.
Signed-off-by: David Teigland <teigland@redhat.com>
Eliminate a couple of holes in the structure, and move the 2 atomics
into the same cacheline.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Allow advanced users to set the layoutstats timer in order to lengthen
or shorten the period between layoutstat transmissions to the server.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Keep the full list of mirrors in the struct nfs4_ff_layout_mirror so that
they can be shared among the layout segments that use them.
Also ensure that we send out only one copy of the layoutstats per mirror.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We do want to share mirrors between layout segments, so add a refcount
to enable that.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We do not want to update inode attributes with DS values.
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If we return delegation before closing, we fail to do roc check
during close because NFS_LAYOUT_ROC is cleared by delegreturn
and it causes layouts to be still hanging around after delegreturn
+ close, which is a voilation against protocol.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the ctime or mtime or change attribute have changed because
of an operation we initiated, we should make sure that we force
an attribute update. However we do not want to mark the page cache
for revalidation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
There seem to be a couple of new set-but-unused build warnings
that gcc 4.9.3 is now warning about. These are not regressions, just
the compiler being more picky.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The allocsize and biosize mount options are handled identically,
other than allocsize accepting suffixes. suffix_kstrtoint handles
bare numbers just fine too, so these can be collapsed.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Users have occasionally reported that file type for some directory
entries is wrong. This mostly happened after updating libraries some
libraries. After some debugging the problem was traced down to
xfs_dir2_node_replace(). The function uses args->filetype as a file type
to store in the replaced directory entry however it also calls
xfs_da3_node_lookup_int() which will store file type of the current
directory entry in args->filetype. Thus we fail to change file type of a
directory entry to a proper type.
Fix the problem by storing new file type in a local variable before
calling xfs_da3_node_lookup_int().
cc: <stable@vger.kernel.org> # 3.16 - 4.x
Reported-by: Giacomo Comes <comes@naic.edu>
Signed-off-by: Jan Kara <jack@suse.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>