38206 Commits

Author SHA1 Message Date
Eric Sandeen
04dd1a0d4b xfs: fix crc field handling in xfs_sb_to/from_disk
I discovered this in userspace, but the same change applies
to the kernel.

If we xfs_mdrestore an image from a non-crc filesystem, lo
and behold the restored image has gained a CRC:

# db/xfs_metadump.sh -o /dev/sdc1 - | xfs_mdrestore - test.img
# xfs_db -c "sb 0" -c "p crc" /dev/sdc1
crc = 0 (correct)
# xfs_db -c "sb 0" -c "p crc" test.img
crc = 0xb6f8d6a0 (correct)

This is because xfs_sb_from_disk doesn't fill in sb_crc,
but xfs_sb_to_disk(XFS_SB_ALL_BITS) does write the in-memory
CRC to disk - so we get uninitialized memory on disk.

Fix this by always initializing sb_crc to 0 when we read
the superblock, and masking out the CRC bit from ALL_BITS
when we write it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:24:11 +10:00
Eric Sandeen
6ee49a20c1 xfs: don't send null bp to xfs_trans_brelse()
In this case, if bp is NULL, error is set, and we send a
NULL bp to xfs_trans_brelse, which will try to dereference it.

Test whether we actually have a buffer before we try to
free it.

Coverity spotted this.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:23:49 +10:00
Brian Foster
ce57bcf6b8 xfs: check for inode size overflow in xfs_new_eof()
If we write to the maximum file offset (2^63-2), XFS fails to log the
inode size update when the page is flushed. For example:

$ xfs_io -fc "pwrite `echo "2^63-1-1" | bc` 1" /mnt/file
wrote 1/1 bytes at offset 9223372036854775806
1.000000 bytes, 1 ops; 0.0000 sec (22.711 KiB/sec and 23255.8140 ops/sec)
$ stat -c %s /mnt/file
9223372036854775807
$ umount /mnt ; mount <dev> /mnt/
$ stat -c %s /mnt/file
0

This occurs because XFS calculates the new file size as io_offset +
io_size, I/O occurs in block sized requests, and the maximum supported
file size is not block aligned. Therefore, a write to the max allowable
offset on a 4k blocksize fs results in a write of size 4k to offset
2^63-4096 (e.g., equivalent to round_down(2^63-1, 4096), or IOW the
offset of the block that contains the max file size). The offset plus
size calculation (2^63 - 4096 + 4096 == 2^63) overflows the signed
64-bit variable which goes negative and causes the > comparison to the
on-disk inode size to fail. This returns 0 from xfs_new_eof() and
results in no change to the inode on-disk.

Update xfs_new_eof() to explicitly detect overflow of the local
calculation and use the VFS inode size in this scenario. The VFS inode
size is capped to the maximum and thus XFS writes the correct inode size
to disk.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:21:53 +10:00
Dave Chinner
a872703f34 xfs: only set extent size hint when asked
Currently the extent size hint is set unconditionally in
xfs_ioctl_setattr() when the FSX_EXTSIZE flag is set. Hence we can
set hints when the inode flags indicating the hint should be used
are not set.  Hence only set the extent size hint from userspace
when the inode has the XFS_DIFLAG_EXTSIZE flag set to indicate that
we should have an extent size hint set on the inode.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:20:30 +10:00
Dave Chinner
9336e3a765 xfs: project id inheritance is a directory only flag
xfs_set_diflags() allows it to be set on non-directory inodes, and
this flags errors in xfs_repair. Further, inode allocation allows
the same directory-only flag to be inherited to non-directories.
Make sure directory inode flags don't appear on other types of
inodes.

This fixes several xfstests scratch fileystem corruption reports
(e.g. xfs/050) now that xfstests checks scratch filesystems after
test completion.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:18:40 +10:00
Dave Chinner
e076b0f3a5 xfs: kill time.h
The typedef for timespecs and nanotime() are completely unnecessary,
and delay() can be moved to fs/xfs/linux.h, which means this file
can go away.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:18:13 +10:00
Dave Chinner
b1d6cc02f2 xfs: compat_xfs_bstat does not have forkoff
struct compat_xfs_bstat is missing the di_forkoff field and so does
not fully translate the structure correctly. Fix it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:17:58 +10:00
Dave Chinner
75e58ce4c8 Merge branch 'xfs-buf-iosubmit' into for-next 2014-10-02 09:11:14 +10:00
Christoph Hellwig
8c15612546 xfs: simplify xfs_zero_remaining_bytes
xfs_zero_remaining_bytes() open codes a log of buffer manupulations
to do a read forllowed by a write. It can simply be replaced by an
uncached read followed by a xfs_bwrite() call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:05:44 +10:00
Dave Chinner
ba3726742c xfs: check xfs_buf_read_uncached returns correctly
xfs_buf_read_uncached() has two failure modes. If can either return
NULL or bp->b_error != 0 depending on the type of failure, and not
all callers check for both. Fix it so that xfs_buf_read_uncached()
always returns the error status, and the buffer is returned as a
function parameter. The buffer will only be returned on success.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:05:32 +10:00
Dave Chinner
595bff75dc xfs: introduce xfs_buf_submit[_wait]
There is a lot of cookie-cutter code that looks like:

	if (shutdown)
		handle buffer error
	xfs_buf_iorequest(bp)
	error = xfs_buf_iowait(bp)
	if (error)
		handle buffer error

spread through XFS. There's significant complexity now in
xfs_buf_iorequest() to specifically handle this sort of synchronous
IO pattern, but there's all sorts of nasty surprises in different
error handling code dependent on who owns the buffer references and
the locks.

Pull this pattern into a single helper, where we can hide all the
synchronous IO warts and hence make the error handling for all the
callers much saner. This removes the need for a special extra
reference to protect IO completion processing, as we can now hold a
single reference across dispatch and waiting, simplifying the sync
IO smeantics and error handling.

In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
make it explicitly handle on asynchronous IO. This forces all users
to be switched specifically to one interface or the other and
removes any ambiguity between how the interfaces are to be used. It
also means that xfs_buf_iowait() goes away.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:05:14 +10:00
Dave Chinner
8b131973d1 xfs: kill xfs_bioerror_relse
There is only one caller now - xfs_trans_read_buf_map() - and it has
very well defined call semantics - read, synchronous, and b_iodone
is NULL. Hence it's pretty clear what error handling is necessary
for this case. The bigger problem of untangling
xfs_trans_read_buf_map error handling is left to a future patch.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:05:05 +10:00
Dave Chinner
2718775469 xfs: xfs_bioerror can die.
Internal buffer write error handling is a mess due to the unnatural
split between xfs_bioerror and xfs_bioerror_relse().

xfs_bwrite() only does sync IO and determines the handler to
call based on b_iodone, so for this caller the only difference
between xfs_bioerror() and xfs_bioerror_release() is the XBF_DONE
flag. We don't care what the XBF_DONE flag state is because we stale
the buffer in both paths - the next buffer lookup will clear
XBF_DONE because XBF_STALE is set. Hence we can use common
error handling for xfs_bwrite().

__xfs_buf_delwri_submit() is a similar - it's only ever called
on writes - all sync or async - and again there's no reason to
handle them any differently at all.

Clean up the nasty error handling and remove xfs_bioerror().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:04:56 +10:00
Dave Chinner
8dac392198 xfs: kill xfs_bdstrat_cb
Only has two callers, and is just a shutdown check and error handler
around xfs_buf_iorequest. However, the error handling is a mess of
read and write semantics, and both internal callers only call it for
writes. Hence kill the wrapper, and follow up with a patch to
sanitise the error handling.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:04:40 +10:00
Dave Chinner
61be9c529a xfs: rework xfs_buf_bio_endio error handling
Currently the report of a bio error from completion
immediately marks the buffer with an error. The issue is that this
is racy w.r.t. synchronous IO - the submitter can see b_error being
set before the IO is complete, and hence we cannot differentiate
between submission failures and completion failures.

Add an internal b_io_error field protected by the b_lock to catch IO
completion errors, and only propagate that to the buffer during
final IO completion handling. Hence we can tell in xfs_buf_iorequest
if we've had a submission failure bey checking bp->b_error before
dropping our b_io_remaining reference - that reference will prevent
b_io_error values from being propagated to b_error in the event that
completion races with submission.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:04:31 +10:00
Dave Chinner
e8aaba9a78 xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality
We do some work in xfs_buf_ioend, and some work in
xfs_buf_iodone_work, but much of that functionality is the same.
This work can all be done in a single function, leaving
xfs_buf_iodone just a wrapper to determine if we should execute it
by workqueue or directly. hence rename xfs_buf_iodone_work to
xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
need async processing.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:04:22 +10:00
Dave Chinner
e11bb8052c xfs: synchronous buffer IO needs a reference
When synchronous IO runs IO completion work, it does so without an
IO reference or a hold reference on the buffer. The IO "hold
reference" is owned by the submitter, and released when the
submission is complete. The IO reference is released when both the
submitter and the bio end_io processing is run, and so if the io
completion work is run from IO completion context, it is run without
an IO reference.

Hence we can get the situation where the submitter can submit the
IO, see an error on the buffer and unlock and free the buffer while
there is still IO in progress. This leads to use-after-free and
memory corruption.

Fix this by taking a "sync IO hold" reference that is owned by the
IO and not released until after the buffer completion calls are run
to wake up synchronous waiters. This means that the buffer will not
be freed in any circumstance until all IO processing is completed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:04:11 +10:00
Dave Chinner
cf53e99d19 xfs: Don't use xfs_buf_iowait in the delwri buffer code
For the special case of delwri buffer submission and waiting, we
don't need to issue IO synchronously at all. The second pass to call
xfs_buf_iowait() can be replaced with  blocking on xfs_buf_lock() -
the buffer will be unlocked when the async IO is complete.

This formalises a sane the method of waiting for async IO - take an
extra reference, submit the IO, call xfs_buf_lock() when you want to
wait for IO completion. i.e.:

	bp = xfs_buf_find();
	xfs_buf_hold(bp);
	bp->b_flags |= XBF_ASYNC;
	xfs_buf_iosubmit(bp);
	xfs_buf_lock(bp)
	error = bp->b_error;
	....
	xfs_buf_relse(bp);

While this is somewhat racy for gathering IO errors, none of the
code that calls xfs_buf_delwri_submit() will race against other
users of the buffers being submitted. Even if they do, we don't
really care if the error is detected by the delwri code or the user
we raced against. Either way, the error will be detected and
handled.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:04:01 +10:00
Dave Chinner
a870fe6dfa xfs: force the log before shutting down
When we have marked the filesystem for shutdown, we want to prevent
any further buffer IO from being submitted. However, we currently
force the log after marking the filesystem as shut down, hence
allowing IO to the log *after* we have marked both the filesystem
and the log as in an error state.

Clean this up by forcing the log before we mark the filesytem with
an error. This replaces the pure CIL flush that we currently have
which works around this same issue (i.e the CIL can't be flushed
once the shutdown flags are set) and hence enables us to clean up
the logic substantially.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-02 09:02:28 +10:00
David Sterba
143f363618 btrfs: remove unused variable from btrfs_parse_options
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-01 19:31:33 +02:00
David Sterba
aab110abcb btrfs: defrag, use unsigned type for extent thresh
Signed type mismatches the ioctl structure, all extent calculations are
done on unsigned types.

Signed-off-by: David Sterba <dsterba@suse.cz>
2014-10-01 19:30:52 +02:00
Jeff Layton
34549ab09e nfsd: eliminate "to_delegation" define
We now have cb_to_delegation and to_delegation, which do the same thing
and are defined separately in different .c files. Move the
cb_to_delegation definition into a header file and eliminate the
redundant to_delegation definition.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-10-01 12:28:01 -04:00
Bob Peterson
19aeb5a65f GFS2: Make rename not save dirent location
This patch fixes a regression in the patch "GFS2: Remember directory
insert point", commit 2b47dad866d04f14c328f888ba5406057b8c7d33.
The problem had to do with the rename function: The function found
space for the new dirent, and remembered that location. But then the
old dirent was removed, which often moved the eligible location for
the renamed dirent. Putting the new dirent at the saved location
caused file system corruption.

This patch adds a new "save_loc" variable to struct gfs2_diradd.
If 1, the dirent location is saved. If 0, the dirent location is not
saved and the buffer_head is released as per previous behavior.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-10-01 14:06:15 +01:00
Jaegeuk Kim
52656e6cf7 f2fs: clean up f2fs_ioctl functions
This patch cleans up f2fs_ioctl functions for better readability.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:56 -07:00
Dan Carpenter
8a21984d5d f2fs: potential shift wrapping buf in f2fs_trim_fs()
My static checker complains that segment is a u64 but only the lower 31
bits can be used before we hit a shift wrapping bug.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:56 -07:00
Jaegeuk Kim
44c1615651 f2fs: call f2fs_unlock_op after error was handled
This patch relocates f2fs_unlock_op in every directory operations to be called
after any error was processed.
Otherwise, the checkpoint can be entered with valid node ids without its
dentry when -ENOSPC is occurred.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:55 -07:00
Jaegeuk Kim
7cd8558baa f2fs: check the use of macros on block counts and addresses
This patch cleans up the existing and new macros for readability.

Rule is like this.

         ,-----------------------------------------> MAX_BLKADDR -,
         |  ,------------- TOTAL_BLKS ----------------------------,
         |  |                                                     |
         |  ,- seg0_blkaddr   ,----- sit/nat/ssa/main blkaddress  |
block    |  | (SEG0_BLKADDR)  | | | |   (e.g., MAIN_BLKADDR)      |
address  0..x................ a b c d .............................
            |                                                     |
global seg# 0...................... m .............................
            |                       |                             |
            |                       `------- MAIN_SEGS -----------'
            `-------------- TOTAL_SEGS ---------------------------'
                                    |                             |
 seg#                               0..........xx..................

= Note =
 o GET_SEGNO_FROM_SEG0 : blk address -> global segno
 o GET_SEGNO           : blk address -> segno
 o START_BLOCK         : segno -> starting block address

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:47 -07:00
Jaegeuk Kim
309cc2b6e7 f2fs: refactor flush_nat_entries to remove costly reorganizing ops
Previously, f2fs tries to reorganize the dirty nat entries into multiple sets
according to its nid ranges. This can improve the flushing nat pages, however,
if there are a lot of cached nat entries, it becomes a bottleneck.

This patch introduces a new set management flow by removing dirty nat list and
adding a series of set operations when the nat entry becomes dirty.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:30:41 -07:00
Jaegeuk Kim
4b2fecc846 f2fs: introduce FITRIM in f2fs_ioctl
This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs will issue small discards and prefree discards as many as
possible for the given area.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:06:09 -07:00
Jaegeuk Kim
75ab4cb830 f2fs: introduce cp_control structure
This patch add a new data structure to control checkpoint parameters.
Currently, it presents the reason of checkpoint such as is_umount and normal
sync.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:01:28 -07:00
Trond Myklebust
72c23f0819 Merge branch 'bugfixes' into linux-next
* bugfixes:
  NFSv4.1: Fix an NFSv4.1 state renewal regression
  NFSv4: fix open/lock state recovery error handling
  NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
  NFS: Fabricate fscache server index key correctly
  SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
  nfs: fix duplicate proc entries
2014-09-30 17:21:41 -04:00
Andy Adamson
d1f456b0b9 NFSv4.1: Fix an NFSv4.1 state renewal regression
Commit 2f60ea6b8ced ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
call, on the wire to renew the NFSv4.1 state if the flag was not set.

The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
(cl_last_renewal) plus the lease time divided by 3. This is arbitrary and
sometimes does the following:

In normal operation, the only way a future state renewal call is put on the
wire is via a call to nfs4_schedule_state_renewal, which schedules a
nfs4_renew_state workqueue task. nfs4_renew_state determines if the
NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence,
which only gets sent if the NFS4_RENEW_TIMEOUT flag is set.
Then the nfs41_proc_async_sequence rpc_release function schedules
another state remewal via nfs4_schedule_state_renewal.

Without this change we can get into a state where an application stops
accessing the NFSv4.1 share, state renewal calls stop due to the
NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover
from this situation is with a clientid re-establishment, once the application
resumes and the server has timed out the lease and so returns
NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.

An example application:
open, lock, write a file.

sleep for 6 * lease (could be less)

ulock, close.

In the above example with NFSv4.1 delegations enabled, without this change,
there are no OP_SEQUENCE state renewal calls during the sleep, and the
clientid is recovered due to lease expiration on the close.

This issue does not occur with NFSv4.1 delegations disabled, nor with
NFSv4.0, with or without delegations enabled.

Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com
Fixes: 2f60ea6b8ced (NFSv4: The NFSv4.0 client must send RENEW calls...)
Cc: stable@vger.kernel.org # 3.2.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-30 17:18:42 -04:00
J. Bruce Fields
15b23ef5d3 nfsd4: fix corruption of NFSv4 read data
The calculation of page_ptr here is wrong in the case the read doesn't
start at an offset that is a multiple of a page.

The result is that nfs4svc_encode_compoundres sets rq_next_page to a
value one too small, and then the loop in svc_free_res_pages may
incorrectly fail to clear a page pointer in rq_respages[].

Pages left in rq_respages[] are available for the next rpc request to
use, so xdr data may be written to that page, which may hold data still
waiting to be transmitted to the client or data in the page cache.

The observed result was silent data corruption seen on an NFSv4 client.

We tag this as "fixing" 05638dc73af2 because that commit exposed this
bug, though the incorrect calculation predates it.

Particular thanks to Andrea Arcangeli and David Gilbert for analysis and
testing.

Fixes: 05638dc73af2 "nfsd4: simplify server xdr->next_page use"
Cc: stable@vger.kernel.org
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-30 15:57:04 -04:00
Jan Kara
c53f755d33 ocfs2: Back out change to use OCFS2_MAXQUOTAS in ocfs2_setattr()
ocfs2_setattr() actually needs to really use MAXQUOTAS and not
OCFS2_MAXQUOTAS since it will pass the array over to VFS. Currently
this isn't a problem since MAXQUOTAS == OCFS2_MAXQUOTAS but it would
be once we introduce project quotas.

CC: Mark Fasheh <mfasheh@suse.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: ocfs2-devel@oss.oracle.com
Signed-off-by: Jan Kara <jack@suse.cz>
2014-09-30 18:09:59 +02:00
James Morris
6c8ff877cd Merge commit 'v3.16' into next 2014-10-01 00:44:04 +10:00
David Howells
a3b7c00484 CacheFiles: Handle object being killed before being set up
If a cache object gets killed whilst in the process of being set up - for
instance if the netfs relinquishes the cookie that the object is associated
with - then the object's state machine will transit to the DROP_OBJECT state
without necessarily going through the LOOKUP_OBJECT or CREATE_OBJECT states.

This is a problem for CacheFiles because cachefiles_drop_object() assumes that
object->dentry will be set upon reaching the DROP_OBJECT state and has an
ASSERT() to that effect (see the oops below) - but object->dentry doesn't get
set until the LOOKUP_OBJECT or CREATE_OBJECT states (and not always then if
they fail).

To fix this, just make the dentry cleanup in cachefiles_drop_object()
conditional on the dentry actually being set and remove the assertion.

	CacheFiles: Assertion failed
	------------[ cut here ]------------
	kernel BUG at .../fs/cachefiles/namei.c:425!
	...
	Workqueue: fscache_object fscache_object_work_func [fscache]
	...
	RIP: ... cachefiles_delete_object+0xcd/0x110 [cachefiles]
	...
	Call Trace:
	 [<ffffffffa043280f>] ? cachefiles_drop_object+0xff/0x130 [cachefiles]
	 [<ffffffffa02ac511>] ? fscache_drop_object+0xd1/0x1d0 [fscache]
	 [<ffffffffa02ac697>] ? fscache_object_work_func+0x87/0x210 [fscache]
	 [<ffffffff81080635>] ? process_one_work+0x155/0x450
	 [<ffffffff81081c44>] ? worker_thread+0x114/0x370
	 [<ffffffff81081b30>] ? manage_workers.isra.21+0x2c0/0x2c0
	 [<ffffffff81087fcc>] ? kthread+0xbc/0xe0
	 [<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0
	 [<ffffffff8150638c>] ? ret_from_fork+0x7c/0xb0
	 [<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0

Reported-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
2014-09-30 14:50:28 +01:00
Richard Weinberger
443b39cdd5 UBIFS: Fix trivial typo in power_cut_emulated()
s/withing/within/

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-09-30 09:29:44 +03:00
Al Viro
6d13f69444 missing data dependency barrier in prepend_name()
AFAICS, prepend_name() is broken on SMP alpha.  Disclaimer: I don't have
SMP alpha boxen to reproduce it on.  However, it really looks like the race
is real.

CPU1: d_path() on /mnt/ramfs/<255-character>/foo
CPU2: mv /mnt/ramfs/<255-character> /mnt/ramfs/<63-character>

CPU2 does d_alloc(), which allocates an external name, stores the name there
including terminating NUL, does smp_wmb() and stores its address in
dentry->d_name.name.  It proceeds to d_add(dentry, NULL) and d_move()
old dentry over to that.  ->d_name.name value ends up in that dentry.

In the meanwhile, CPU1 gets to prepend_name() for that dentry.  It fetches
->d_name.name and ->d_name.len; the former ends up pointing to new name
(64-byte kmalloc'ed array), the latter - 255 (length of the old name).
Nothing to force the ordering there, and normally that would be OK, since we'd
run into the terminating NUL and stop.  Except that it's alpha, and we'd need
a data dependency barrier to guarantee that we see that store of NUL
__d_alloc() has done.  In a similar situation dentry_cmp() would survive; it
does explicit smp_read_barrier_depends() after fetching ->d_name.name.
prepend_name() doesn't and it risks walking past the end of kmalloc'ed object
and possibly oops due to taking a page fault in kernel mode.

Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-29 14:46:30 -04:00
Anna Schumaker
24bab49122 NFSD: Implement SEEK
This patch adds server support for the NFS v4.2 operation SEEK, which
returns the position of the next hole or data segment in a file.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-29 14:35:20 -04:00
Anna Schumaker
87a15a8090 NFSD: Add generic v4.2 infrastructure
It's cleaner to introduce everything at once and have the server reply
with "not supported" than it would be to introduce extra operations when
implementing a specific one in the middle of the list.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-29 14:35:19 -04:00
Fabian Frederick
37993271cf udf: remove redundant sys_tz declaration
sys_tz is already declared in include/linux/time.h

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-09-29 13:45:12 +02:00
Dave Chinner
bd438f825f Merge branch 'xfs-sparse-fixes' into for-next 2014-09-29 10:52:44 +10:00
Dave Chinner
b972d07971 xfs: annotate user variables passed as void
Some argument callbacks can contain user buffers, and sparse warns
about passing them as void pointers. Cast appropriately to remove
the sparse warnings.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-29 10:46:22 +10:00
Dave Chinner
e3aed1a081 xfs: xfs_kset should be static
As it is accessed through the struct xfs_mount and can be set up
entirely from fs/xfs/xfs_super.c

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-29 10:46:08 +10:00
Dave Chinner
bf1ed38330 xfs: xfs_qm_dquot_isolate needs locking annotations for sparse
To remove noise from the build.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-29 10:43:40 +10:00
Dave Chinner
e68ed77521 xfs: fix use of agi_newino in finobt lookup
Sparse warns that we are passing the big-endian valueo f agi_newino
to the initial btree lookup function when trying to find a new
inode. This is wrong - we need to pass the host order value, not the
disk order value. This will adversely affect the next inode
allocated, but given that the free inode btree is usually much
smaller than the allocated inode btree it is much less likely to be
a performance issue if we start the search in the wrong place.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-29 10:43:15 +10:00
Dave Chinner
2f43bbd96e Merge branch 'xfs-trans-recover-cleanup' into for-next 2014-09-29 10:00:24 +10:00
Dave Chinner
b818cca197 xfs: refactor recovery transaction start handling
Rework the transaction lookup and allocation code in
xlog_recovery_process_ophdr() to fold two related call-once
helper functions into a single helper. Then fold in all the
XLOG_START_TRANS logic to that helper to clean up the remaining
logic in xlog_recovery_process_ophdr().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-29 09:45:54 +10:00
Dave Chinner
7656066986 xfs: reorganise transaction recovery item code
The code for managing transactions anf the items for recovery is
spread across 3 different locations in the file. Move them all
together so that it is easy to read the code without needing to jump
long distances in the file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-29 09:45:42 +10:00
Dave Chinner
88b863db97 xfs: fix double free in xlog_recover_commit_trans
When an error occurs during buffer submission in
xlog_recover_commit_trans(), we free the trans structure twice. Fix
it by only freeing the structure in the caller regardless of the
success or failure of the function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-29 09:45:32 +10:00