46965 Commits

Author SHA1 Message Date
Seth Forshee
f97df70b1c xenfs: Use proc_create_mount_point() to create /proc/xen
Mounting proc in user namespace containers fails if the xenbus
filesystem is mounted on /proc/xen because this directory fails
the "permanently empty" test. proc_create_mount_point() exists
specifically to create such mountpoints in proc but is currently
proc-internal. Export this interface to modules, then use it in
xenbus when creating /proc/xen.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2016-11-17 13:52:18 +01:00
Andreas Gruenbacher
4a59015372 xattr: Fix setting security xattrs on sockfs
The IOP_XATTR flag is set on sockfs because sockfs supports getting the
"system.sockprotoname" xattr.  Since commit 6c6ef9f2, this flag is checked for
setxattr support as well.  This is wrong on sockfs because security xattr
support there is supposed to be provided by security_inode_setsecurity.  The
smack security module relies on socket labels (xattrs).

Fix this by adding a security xattr handler on sockfs that returns
-EAGAIN, and by checking for -EAGAIN in setxattr.

We cannot simply check for -EOPNOTSUPP in setxattr because there are
filesystems that neither have direct security xattr support nor support
via security_inode_setsecurity.  A more proper fix might be to move the
call to security_inode_setsecurity into sockfs, but it's not clear to me
if that is safe: we would end up calling security_inode_post_setxattr after
that as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-17 00:00:23 -05:00
Linus Torvalds
984573abf8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "A regression fix and bug fix bound for stable"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix fuse_write_end() if zero bytes were copied
  fuse: fix root dentry initialization
2016-11-16 09:20:10 -08:00
Mike Marshall
19ff7fcc76 orangefs: add .owner to debugfs file_operations
Without ".owner = THIS_MODULE" it is possible to crash the kernel
by unloading the Orangefs module while someone is reading debugfs
files.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-11-16 11:52:19 -05:00
Nicolas Pitre
baa73d9e47 posix-timers: Make them configurable
Some embedded systems have no use for them.  This removes about
25KB from the kernel binary size when configured out.

Corresponding syscalls are routed to a stub logging the attempt to
use those syscalls which should be enough of a clue if they were
disabled without proper consideration. They are: timer_create,
timer_gettime: timer_getoverrun, timer_settime, timer_delete,
clock_adjtime, setitimer, getitimer, alarm.

The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
majority of use cases with very little code.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Michal Marek <mmarek@suse.com>
Cc: Edward Cree <ecree@solarflare.com>
Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:26:35 +01:00
Kees Cook
fc46d4e453 ramoops: add pdata NULL check to ramoops_probe
This adds a check for a NULL platform data, which should only be possible
if a driver incorrectly sets up a probe request without also having defined
the platform_data structure. This is based on a patch from Geliang Tang.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:32 -08:00
Namhyung Kim
70ad35db33 pstore: Convert console write to use ->write_buf
Maybe I'm missing something, but I don't know why it needs to copy the
input buffer to psinfo->buf and then write.  Instead we can write the
input buffer directly.  The only implementation that supports console
message (i.e. ramoops) already does it for ftrace messages.

For the upcoming virtio backend driver, it needs to protect psinfo->buf
overwritten from console messages.  If it could use ->write_buf method
instead of ->write, the problem will be solved easily.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:32 -08:00
Namhyung Kim
e9e360b08a pstore: Protect unlink with read_mutex
When update_ms is set, pstore_get_records() will be called when there's
a new entry.  But unlink can be called at the same time and might
contend with the open-read-close loop.  Depending on the implementation
of platform driver, it may be safe or not.  But I think it'd be better
to protect those race in the first place.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:31 -08:00
Joel Fernandes
7a0032f504 pstore: Use global ftrace filters for function trace filtering
Currently, pstore doesn't have any filters setup for function tracing.
This has the associated overhead and may not be useful for users looking
for tracing specific set of functions.

ftrace's regular function trace filtering is done writing to
tracing/set_ftrace_filter however this is not available if not requested.
In order to be able to use this feature, the support to request global
filtering introduced earlier in the series should be requested before
registering the ftrace ops. Here we do the same.

Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:30 -08:00
Kees Cook
a5d23b956c pstore: Clarify context field przs as dprzs
Since "przs" (persistent ram zones) is a general name in the code now, so
rename the Oops-dump zones to dprzs from przs.

Based on a patch from Nobuhiro Iwamatsu.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:29 -08:00
Kees Cook
c443a5f3f1 pstore: improve error report for failed setup
When setting ramoops record sizes, sometimes it's not clear which
parameters contributed to the allocation failure. This adds a per-zone
name and expands the failure reports.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:28 -08:00
Joel Fernandes
2fbea82bbb pstore: Merge per-CPU ftrace records into one
Up until this patch, each of the per CPU ftrace buffers appear as a
separate ftrace-ramoops-N file. In this patch we merge all the zones into
one and populate a single ftrace-ramoops-0 file.

Signed-off-by: Joel Fernandes <joelaf@google.com>
[kees: clarified variables names, added -ENOMEM handling]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:28 -08:00
Joel Fernandes
fbccdeb8d7 pstore: Add ftrace timestamp counter
In preparation for merging the per CPU buffers into one buffer when
we retrieve the pstore ftrace data, we store the timestamp as a
counter in the ftrace pstore record.  We store the CPU number as well
if !PSTORE_CPU_IN_IP, in this case we shift the counter and may lose
ordering there but we preserve the same record size. The timestamp counter
is also racy, and not doing any locking or synchronization here results
in the benefit of lower overhead. Since we don't care much here for exact
ordering of function traces across CPUs, we don't synchronize and may lose
some counter updates but I'm ok with that.

Using trace_clock() results in much lower performance so avoid using it
since we don't want accuracy in timestamp and need a rough ordering to
perform merge.

Signed-off-by: Joel Fernandes <joelaf@google.com>
[kees: updated commit message, added comments]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:27 -08:00
Joel Fernandes
a1cf53ac6d ramoops: Split ftrace buffer space into per-CPU zones
If the RAMOOPS_FLAG_FTRACE_PER_CPU flag is passed to ramoops pdata, split
the ftrace space into multiple zones depending on the number of CPUs.

This speeds up the performance of function tracing by about 280% in my
tests as we avoid the locking. The trade off being lesser space available
per CPU. Let the ramoops user decide which option they want based on pdata
flag.

Signed-off-by: Joel Fernandes <joelaf@google.com>
[kees: added max_ftrace_cnt to track size, added DT logic and docs]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:26 -08:00
Kees Cook
de83209249 pstore: Make ramoops_init_przs generic for other prz arrays
Currently ramoops_init_przs() is hard wired only for panic dump zone
array. In preparation for the ftrace zone array (one zone per-cpu) and pmsg
zone array, make the function more generic to be able to handle this case.

Heavily based on similar work from Joel Fernandes.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:25 -08:00
Joel Fernandes
663deb4788 pstore: Allow prz to control need for locking
In preparation of not locking at all for certain buffers depending on if
there's contention, make locking optional depending on the initialization
of the prz.

Signed-off-by: Joel Fernandes <joelaf@google.com>
[kees: moved locking flag into prz instead of via caller arguments]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-15 16:34:25 -08:00
David S. Miller
bb598c1b8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of bug fixes in 'net' overlapping other changes in
'net-next-.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 10:54:36 -05:00
Miklos Szeredi
59c3b76cc6 fuse: fix fuse_write_end() if zero bytes were copied
If pos is at the beginning of a page and copied is zero then page is not
zeroed but is marked uptodate.

Fix by skipping everything except unlock/put of page if zero bytes were
copied.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 6b12c1b37e55 ("fuse: Implement write_begin/write_end callbacks")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-11-15 12:34:21 +01:00
Eric Whitney
d5c8dab6a8 ext4: remove parameter from ext4_xattr_ibody_set()
The parameter "handle" isn't used.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-14 21:56:48 -05:00
Eric Whitney
88e0387769 ext4: allow inode expansion for nojournal file systems
Runs of xfstest ext4/022 on nojournal file systems result in failures
because the inodes of some of its test files do not expand as expected.
The cause is a conditional in ext4_mark_inode_dirty() that prevents inode
expansion unless the test file system has a journal.  Remove this
unnecessary restriction.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-14 21:48:35 -05:00
Deepa Dinamani
eeca7ea1ba ext4: use current_time() for inode timestamps
CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe.
current_time() will be transitioned to be y2038 safe
along with vfs.

current_time() returns timestamps according to the
granularities set in the super_block.
The granularity check in ext4_current_time() to call
current_time() or CURRENT_TIME_SEC is not required.
Use current_time() directly to obtain timestamps
unconditionally, and remove ext4_current_time().

Quota files are assumed to be on the same filesystem.
Hence, use current_time() for these files as well.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2016-11-14 21:40:10 -05:00
Chandan Rajendra
30a9d7afe7 ext4: fix stack memory corruption with 64k block size
The number of 'counters' elements needed in 'struct sg' is
super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
elements in the array. This is insufficient for block sizes >= 32k. In
such cases the memcpy operation performed in ext4_mb_seq_groups_show()
would cause stack memory corruption.

Fixes: c9de560ded61f
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
2016-11-14 21:26:26 -05:00
Chandan Rajendra
69e43e8cc9 ext4: fix mballoc breakage with 64k block size
'border' variable is set to a value of 2 times the block size of the
underlying filesystem. With 64k block size, the resulting value won't
fit into a 16-bit variable. Hence this commit changes the data type of
'border' to 'unsigned int'.

Fixes: c9de560ded61f
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@vger.kernel.org
2016-11-14 21:04:37 -05:00
Theodore Ts'o
1566a48aaa ext4: don't lock buffer in ext4_commit_super if holding spinlock
If there is an error reported in mballoc via ext4_grp_locked_error(),
the code is holding a spinlock, so ext4_commit_super() must not try to
lock the buffer head, or else it will trigger a BUG:

  BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358
  in_atomic(): 1, irqs_disabled(): 0, pid: 993, name: mount
  CPU: 0 PID: 993 Comm: mount Not tainted 4.9.0-rc1-clouder1 #62
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
   ffff880006423548 ffffffff81318c89 ffffffff819ecdd0 0000000000000166
   ffff880006423558 ffffffff810810b0 ffff880006423580 ffffffff81081153
   ffff880006e5a1a0 ffff88000690e400 0000000000000000 ffff8800064235c0
  Call Trace:
    [<ffffffff81318c89>] dump_stack+0x67/0x9e
    [<ffffffff810810b0>] ___might_sleep+0xf0/0x140
    [<ffffffff81081153>] __might_sleep+0x53/0xb0
    [<ffffffff8126c1dc>] ext4_commit_super+0x19c/0x290
    [<ffffffff8126e61a>] __ext4_grp_locked_error+0x14a/0x230
    [<ffffffff81081153>] ? __might_sleep+0x53/0xb0
    [<ffffffff812822be>] ext4_mb_generate_buddy+0x1de/0x320

Since ext4_grp_locked_error() calls ext4_commit_super with sync == 0
(and it is the only caller which does so), avoid locking and unlocking
the buffer in this case.

This can result in races with ext4_commit_super() if there are other
problems (which is what commit 4743f83990614 was trying to address),
but a Warning is better than BUG.

Fixes: 4743f83990614
Cc: stable@vger.kernel.org # 4.9
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-13 22:02:29 -05:00
Theodore Ts'o
d0abb36db4 ext4: allow ext4_ext_truncate() to return an error
Return errors to the caller instead of declaring the file system
corrupted.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-13 22:02:28 -05:00
Theodore Ts'o
2c98eb5ea2 ext4: allow ext4_truncate() to return an error
This allows us to properly propagate errors back up to
ext4_truncate()'s callers.  This also means we no longer have to
silently ignore some errors (e.g., when trying to add the inode to the
orphan inode list).

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-13 22:02:26 -05:00
Theodore Ts'o
6da22013bb Merge branch 'fscrypt' into origin 2016-11-13 22:02:22 -05:00
Theodore Ts'o
a2f6d9c4c0 Merge branch 'dax-4.10-iomap-pmd' into origin 2016-11-13 22:02:15 -05:00
Eric Biggers
a6e0891286 fscrypto: don't use on-stack buffer for key derivation
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page.  get_crypt_info() was using a stack buffer to hold the
output from the encryption operation used to derive the per-file key.
Fix it by using a heap buffer.

This bug could most easily be observed in a CONFIG_DEBUG_SG kernel
because this allowed the BUG in sg_set_buf() to be triggered.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 21:56:25 -05:00
Eric Biggers
08ae877f4e fscrypto: don't use on-stack buffer for filename encryption
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page.  For short filenames, fname_encrypt() was encrypting a
stack buffer holding the padded filename.  Fix it by encrypting the
filename in-place in the output buffer, thereby making the temporary
buffer unnecessary.

This bug could most easily be observed in a CONFIG_DEBUG_SG kernel
because this allowed the BUG in sg_set_buf() to be triggered.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 21:56:19 -05:00
David Gstir
9c4bb8a3a9 fscrypt: Let fs select encryption index/tweak
Avoid re-use of page index as tweak for AES-XTS when multiple parts of
same page are encrypted. This will happen on multiple (partial) calls of
fscrypt_encrypt_page on same page.
page->index is only valid for writeback pages.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 20:18:16 -05:00
David Gstir
0b93e1b94b fscrypt: Constify struct inode pointer
Some filesystems, such as UBIFS, maintain a const pointer for struct
inode.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 20:18:01 -05:00
David Gstir
7821d4dd45 fscrypt: Enable partial page encryption
Not all filesystems work on full pages, thus we should allow them to
hand partial pages to fscrypt for en/decryption.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 18:55:21 -05:00
David Gstir
b50f7b268b fscrypt: Allow fscrypt_decrypt_page() to function with non-writeback pages
Some filesystem might pass pages which do not have page->mapping->host
set to the encrypted inode. We want the caller to explicitly pass the
corresponding inode.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 18:53:10 -05:00
David Gstir
1c7dcf69ee fscrypt: Add in-place encryption mode
ext4 and f2fs require a bounce page when encrypting pages. However, not
all filesystems will need that (eg. UBIFS). This is handled via a
flag on fscrypt_operations where a fs implementation can select in-place
encryption over using a bounce page (which is the default).

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 18:47:04 -05:00
Deepa Dinamani
362ad5d58e fs: jfs: Replace CURRENT_TIME_SEC by current_time()
jfs uses nanosecond granularity for filesystem timestamps.
Only this assignment is not using nanosecond granularity.
Use current_time() to get the right granularity.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2016-11-11 15:51:39 -06:00
Jens Axboe
bbd7bb7017 block: move poll code to blk-mq
The poll code is blk-mq specific, let's move it to blk-mq.c. This
is a prep patch for improving the polling code.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-11-11 13:40:25 -07:00
Joel Fernandes
d8991f51e5 pstore: Warn on PSTORE_TYPE_PMSG using deprecated function
PMSG now uses ramoops_pstore_write_buf_user() instead of ...write_buf().
Print a ratelimited warning if gets accidentally called.

Signed-off-by: Joel Fernandes <joelaf@google.com>
[kees: adjusted commit log and added -EINVAL return]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-11 10:36:46 -08:00
Joel Fernandes
109704492e pstore: Make spinlock per zone instead of global
Currently pstore has a global spinlock for all zones. Since the zones
are independent and modify different areas of memory, there's no need
to have a global lock, so we should use a per-zone lock as introduced
here. Also, when ramoops's ftrace use-case has a FTRACE_PER_CPU flag
introduced later, which splits the ftrace memory area into a single zone
per CPU, it will eliminate the need for locking. In preparation for this,
make the locking optional.

Signed-off-by: Joel Fernandes <joelaf@google.com>
[kees: updated commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-11 10:35:37 -08:00
Linus Torvalds
968ef8de55 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  lib/stackdepot: export save/fetch stack for drivers
  mm: kmemleak: scan .data.ro_after_init
  memcg: prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB
  coredump: fix unfreezable coredumping task
  mm/filemap: don't allow partially uptodate page for pipes
  mm/hugetlb: fix huge page reservation leak in private mapping error paths
  ocfs2: fix not enough credit panic
  Revert "console: don't prefer first registered if DT specifies stdout-path"
  mm: hwpoison: fix thp split handling in memory_failure()
  swapfile: fix memory corruption via malformed swapfile
  mm/cma.c: check the max limit for cma allocation
  scripts/bloat-o-meter: fix SIGPIPE
  shmem: fix pageflags after swapping DMA32 object
  mm, frontswap: make sure allocated frontswap map is assigned
  mm: remove extra newline from allocation stall warning
2016-11-11 09:44:23 -08:00
Linus Torvalds
c5e4ca6da9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
 "Christoph's and Jan's aio fixes, fixup for generic_file_splice_read
  (removal of pointless detritus that actually breaks it when used for
  gfs2 ->splice_read()) and fixup for generic_file_read_iter()
  interaction with ITER_PIPE destinations."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  splice: remove detritus from generic_file_splice_read()
  mm/filemap: don't allow partially uptodate page for pipes
  aio: fix freeze protection of aio writes
  fs: remove aio_run_iocb
  fs: remove the never implemented aio_fsync file operation
  aio: hold an extra file reference over AIO read/write operations
2016-11-11 09:19:01 -08:00
Linus Torvalds
ef091b3cef Ceph's ->read_iter() implementation is incompatible with the new
generic_file_splice_read() code that went into -rc1.  Switch to the
 less efficient default_file_splice_read() for now; the proper fix is
 being held for 4.10.
 
 We also have a fix for a 4.8 regression and a trival libceph fixup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJYJdjPAAoJEEp/3jgCEfOLzEoH/A3B1qqiqs2WoMn0O4pnEdcM
 TxaU46VOkYcK2wh/xbYAns2kZEXKgcCySv+kXn4l3Gh6/lXVxv4WexNqWdO1o6yN
 GqEufIH7yQM6QOE/hkwtUciBXmPfQMPxF14vvprYQuyu5Bs96mrphiAa7vX6Vbk5
 VhfE/j0shb8Q2QQj/Om0mWqM6JtOAlr5aFtEcJcodbCk1k8CptUcBsSoQ31PXMC7
 UcaBHh1VHGLvx9WeG1Rw1g9tc2LiUyu+UK0csolp51+amB7HezgfmzDQzHtXzBmm
 n90SQwonf0DrdWUGuQlOpHnREwxLSgN19s68FCjLc0jeMTP4b6TFEIUgFxiqWc4=
 =Ws5s
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.9-rc5' of git://github.com/ceph/ceph-client

Pull Ceph fixes from Ilya Dryomov:
 "Ceph's ->read_iter() implementation is incompatible with the new
  generic_file_splice_read() code that went into -rc1.  Switch to the
  less efficient default_file_splice_read() for now; the proper fix is
  being held for 4.10.

  We also have a fix for a 4.8 regression and a trival libceph fixup"

* tag 'ceph-for-4.9-rc5' of git://github.com/ceph/ceph-client:
  libceph: initialize last_linger_id with a large integer
  libceph: fix legacy layout decode with pool 0
  ceph: use default file splice read callback
2016-11-11 09:17:10 -08:00
Linus Torvalds
ef5beed998 NFS client bugfixes for Linux 4.9
Bugfixes:
 - Trim extra slashes in v4 nfs_paths to fix tools that use this
 - Fix a -Wmaybe-uninitialized warnings
 - Fix suspicious RCU usages
 - Fix Oops when mounting multiple servers at once
 - Suppress a false-positive pNFS error
 - Fix a DMAR failure in NFS over RDMA
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYJOCbAAoJENfLVL+wpUDrbO0QAIkcxdUu2iQeOrk07VP48kDE
 UEfJTal8vbW/KtKyL9bIeRa1qCvYpSJXnnKcR/Uo5VHE5nMz/5omoJofWf5Zg0UM
 iEHyZfOsuGFieBbl1NBaLjEd6MCoJYpmWFUj+3drZ8zqSdqDTL+JgrP7k3XEU2Mx
 glKb7U0AKoclm3h1MKyCyo5TgDVeI5TOhi+i3VVw2IN79VY2CUp4lHWMY4vloghp
 h+GuJWeVFS1nBpfCF9PpTU6LdHDfg4o/J5+DrP+IjIffD1XGzGEjfFR0BX5HyDcN
 PgOSF3fc7uVOOUIBEAqHUHY/7XiKlv6TEMRPdM8ALVoCXZ6hPSSFxq8JBJSWoVEp
 r11ts66VgYxdQgHbs51Y5AaKudLBwU60KosWuddbdZVb4YPM0cn5WQzVezrpoQYu
 k4rfrpt+LFv23NGfIJa6JaTSFBzM+YXmggEGUI8TI/YUFSN+wEp4uzLB4r19nqAP
 ff32iunzV9Z5edpPQFDCf3/1HAhzrL5KWo7E8EvijpdQKZl5k5CnUJxbG22lh4ct
 QIyYg51LjhCayzbRH8Mu+TKUFT29ORlcSp851BotLjT8ZdUetWXcFab93nAkQI7g
 sMREml4DvcXWy8qFAOzi8mX1ddTBumxBfOD0m3skPg+odxwsl/KiwjLCRwfTrgwS
 jfSXsXmrwTniPCDWgKg3
 =hFod
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Most of these fix regressions in 4.9, and none are going to stable
  this time around.

  Bugfixes:
   - Trim extra slashes in v4 nfs_paths to fix tools that use this
   - Fix a -Wmaybe-uninitialized warnings
   - Fix suspicious RCU usages
   - Fix Oops when mounting multiple servers at once
   - Suppress a false-positive pNFS error
   - Fix a DMAR failure in NFS over RDMA"

* tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect
  fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()
  NFS: Don't print a pNFS error if we aren't using pNFS
  NFS: Ignore connections that have cl_rpcclient uninitialized
  SUNRPC: Fix suspicious RCU usage
  NFSv4.1: work around -Wmaybe-uninitialized warning
  NFS: Trim extra slash in v4 nfs_path
2016-11-11 09:15:30 -08:00
Linus Torvalds
a4fac3b5d1 xfs: update for 4.9-rc5
In this update:
 o fix for aborting deferred transactions on filesystem shutdown.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYI/hIAAoJEK3oKUf0dfodjtIP/28PmMnwkSwC3IfNzMEuIlvM
 Q96IlnIXiOlqXSIlO69VM2cS8YCW2oVUYD6pjbhC2YcOaogYGo4TlFmQMrFL4dgG
 /q9aH+Hf6jGgGfwpC2MKq6WIYFU8oYYzG2FIm3jyEnnDCIFMH4H+FYGgzrNVIraQ
 31nWn3ye/xHD44EWvRbEUAE0ROxbZfgm7+QST+R2+lAhLitTqZLNKkenlN8v5P9h
 WInRYBHxq5beIbbhAgm50wKfvxalrYChDeujorGsGAjJLliWpgJnbah6TwPNvRXH
 SwSkJfzI53AFXqH/45P2X5Ib34P7mIz6hTl3zednfU1GBeB3PxUGay6QU0rxRKAL
 4vv/RB6pd5EzmU+hvnLJbM7lpRWpnV2FoP9YwRoqIR8vUvlXTYLpecEw1mtJwFmd
 VZ99F+8mtz+jGKK01hZDAtP7tHt1JSFhOnOQ506UsmnqhbgWaYuDEOOktX08iTik
 +zwTETuoh4frQDVLjrQ507RWJl3XblWXEgD4Dw+N19w7oTSAbuF6zIU7sTlcLJSx
 APqD1uJUzvfYV11eyY7bosuCx7QtTRwUQWQSkKu0f9FXAuILf2KdDI62lZCnKi5M
 6wTUKFNZfqED+ergGyzPuVuQr/PhbQPEC61EehYMpQPSjeVEay2e30ouM6A+8VPp
 ZQal03fQzJvbxm/+8m3v
 =62jA
 -----END PGP SIGNATURE-----

Merge tag 'xfs-fixes-for-linus-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fix from Dave Chinner:
 "This is a fix for an unmount hang (regression) when the filesystem is
  shutdown.  It was supposed to go to you for -rc3, but I accidentally
  tagged the commit prior to it in that pullreq.

  Summary:

   - fix for aborting deferred transactions on filesystem shutdown"

* tag 'xfs-fixes-for-linus-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: defer should abort intent items if the trans roll fails
2016-11-11 09:13:48 -08:00
Andrey Ryabinin
70d78fe7c8 coredump: fix unfreezable coredumping task
It could be not possible to freeze coredumping task when it waits for
'core_state->startup' completion, because threads are frozen in
get_signal() before they got a chance to complete 'core_state->startup'.

Inability to freeze a task during suspend will cause suspend to fail.
Also CRIU uses cgroup freezer during dump operation.  So with an
unfreezable task the CRIU dump will fail because it waits for a
transition from 'FREEZING' to 'FROZEN' state which will never happen.

Use freezer_do_not_count() to tell freezer to ignore coredumping task
while it waits for core_state->startup completion.

Link: http://lkml.kernel.org/r/1475225434-3753-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11 08:12:37 -08:00
Junxiao Bi
d006c71f8a ocfs2: fix not enough credit panic
The following panic was caught when run ocfs2 disconfig single test
(block size 512 and cluster size 8192).  ocfs2_journal_dirty() return
-ENOSPC, that means credits were used up.

The total credit should include 3 times of "num_dx_leaves" from
ocfs2_dx_dir_rebalance(), because 2 times will be consumed in
ocfs2_dx_dir_transfer_leaf() and 1 time will be consumed in
ocfs2_dx_dir_new_cluster() -> __ocfs2_dx_dir_new_cluster() ->
ocfs2_dx_dir_format_cluster().  But only two times is included in
ocfs2_dx_dir_rebalance_credits(), fix it.

This can cause read-only fs(v4.1+) or panic for mainline linux depending
on mount option.

  ------------[ cut here ]------------
  kernel BUG at fs/ocfs2/journal.c:775!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport acpi_cpufreq i2c_piix4 i2c_core pcspkr ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
  CPU: 2 PID: 10601 Comm: dd Not tainted 4.1.12-71.el6uek.bug24939243.x86_64 #2
  Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
  task: ffff8800b6de6200 ti: ffff8800a7d48000 task.ti: ffff8800a7d48000
  RIP: ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
  RSP: 0018:ffff8800a7d4b6d8  EFLAGS: 00010286
  RAX: 00000000ffffffe4 RBX: 00000000814d0a9c RCX: 00000000000004f9
  RDX: ffffffffa008e990 RSI: ffffffffa008f1ee RDI: ffff8800622b6460
  RBP: ffff8800a7d4b6f8 R08: ffffffffa008f288 R09: ffff8800622b6460
  R10: 0000000000000000 R11: 0000000000000282 R12: 0000000002c8421e
  R13: ffff88006d0cad00 R14: ffff880092beef60 R15: 0000000000000070
  FS:  00007f9b83e92700(0000) GS:ffff8800be880000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fb2c0d1a000 CR3: 0000000008f80000 CR4: 00000000000406e0
  Call Trace:
    ocfs2_dx_dir_transfer_leaf+0x159/0x1a0 [ocfs2]
    ocfs2_dx_dir_rebalance+0xd9b/0xea0 [ocfs2]
    ocfs2_find_dir_space_dx+0xd3/0x300 [ocfs2]
    ocfs2_prepare_dx_dir_for_insert+0x219/0x450 [ocfs2]
    ocfs2_prepare_dir_for_insert+0x1d6/0x580 [ocfs2]
    ocfs2_mknod+0x5a2/0x1400 [ocfs2]
    ocfs2_create+0x73/0x180 [ocfs2]
    vfs_create+0xd8/0x100
    lookup_open+0x185/0x1c0
    do_last+0x36d/0x780
    path_openat+0x92/0x470
    do_filp_open+0x4a/0xa0
    do_sys_open+0x11a/0x230
    SyS_open+0x1e/0x20
    system_call_fastpath+0x12/0x71
  Code: 1d 3f 29 09 00 48 85 db 74 1f 48 8b 03 0f 1f 80 00 00 00 00 48 8b 7b 08 48 83 c3 10 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 eb eb 90 <0f> 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54
  RIP  ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
  ---[ end trace 91ac5312a6ee1288 ]---
  Kernel panic - not syncing: Fatal exception
  Kernel Offset: disabled

Link: http://lkml.kernel.org/r/1478248135-31963-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11 08:12:37 -08:00
Al Viro
e519e77747 splice: remove detritus from generic_file_splice_read()
i_size check is a leftover from the horrors that used to play with
the page cache in that function.  With the switch to ->read_iter(),
it's neither needed nor correct - for gfs2 it ends up being buggy,
since i_size is not guaranteed to be correct until later (inside
->read_iter()).

Spotted-by: Abhi Das <adas@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-10 18:32:13 -05:00
Yan, Zheng
8a8d561766 ceph: use default file splice read callback
Splice read/write implementation changed recently. When using
generic_file_splice_read(), iov_iter with type == ITER_PIPE is
passed to filesystem's read_iter callback. But ceph_sync_read()
can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter
expects pages from page cache).

Fixing ceph_sync_read() requires a big patch. So use default
splice read callback for now.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-11-10 20:13:04 +01:00
Jan Kara
9484ab1bf4 dax: Introduce IOMAP_FAULT flag
Introduce a flag telling iomap operations whether they are handling a
fault or other IO. That may influence behavior wrt inode size and
similar things.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-10 10:26:50 +11:00
Sebastian Andrzej Siewior
fc4d24c9b4 fs/buffer: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161103145021.28528-2-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 23:45:25 +01:00