linux-stable/fs
Dave Chinner a034676d36 xfs: log vector rounding leaks log space
commit 110dc24ad2 upstream.

The addition of direct formatting of log items into the CIL
linear buffer added alignment restrictions that the start of each
vector needed to be 64 bit aligned. Hence padding was added in
xlog_finish_iovec() to round up the vector length to ensure the next
vector started with the correct alignment.

This adds a small number of bytes to the size of
the linear buffer that is otherwise unused. The issue is that we
then use the linear buffer size to determine the log space used by
the log item, and this includes the unused space. Hence when we
account for space used by the log item, it's more than is actually
written into the iclogs, and hence we slowly leak this space.

This results on log hangs when reserving space, with threads getting
stuck with these stack traces:

Call Trace:
[<ffffffff81d15989>] schedule+0x29/0x70
[<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
[<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
[<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
[<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310
.....

The 4 bytes is significant. Brain Foster did all the hard work in
tracking down a reproducable leak to inode chunk allocation (it went
away with the ikeep mount option). His rough numbers were that
creating 50,000 inodes leaked 11 log blocks. This turns out to be
roughly 800 inode chunks or 1600 inode cluster buffers. That
works out at roughly 4 bytes per cluster buffer logged, and at that
I started looking for a 4 byte leak in the buffer logging code.

What I found was that a struct xfs_buf_log_format structure for an
inode cluster buffer is 28 bytes in length. This gets rounded up to
32 bytes, but the vector length remains 28 bytes. Hence the CIL
ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
for that vector rather than 28 bytes which are written into the log.

The fix for this problem is to separately track the bytes used by
the log vectors in the item and use that instead of the buffer
length when accounting for the log space that will be used by the
formatted log item.

Again, thanks to Brian Foster for doing all the hard work and long
hours to isolate this leak and make finding the bug relatively
simple.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Bill <billstuff2001@sbcglobal.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14 09:51:50 +08:00
..
9p mm: implement ->map_pages for page cache 2014-04-07 16:35:53 -07:00
adfs fs/adfs/super.c: add __init to init_inodecache() 2014-04-07 16:36:08 -07:00
affs fs/affs/super.c: bugfix / double free 2014-05-06 13:05:00 -07:00
afs AFS: Pass an afs_call* to call->async_workfn() instead of a work_struct* 2014-05-23 13:05:22 +01:00
autofs4 autofs: fix lockref lookup 2014-05-06 13:04:59 -07:00
befs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
bfs fs/bfs/inode.c: add __init to init_inodecache() 2014-04-07 16:36:08 -07:00
btrfs btrfs: only unlock block in verify_parent_transid if we locked it 2014-07-09 11:21:31 -07:00
cachefiles Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-05-05 15:17:02 -07:00
cifs CIFS: fix mount failure with broken pathnames when smb3 mount with mapchars option 2014-07-09 11:21:29 -07:00
coda Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
configfs configfs: fix race between dentry put and lookup 2013-11-21 16:42:27 -08:00
cramfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
debugfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
devpts fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
dlm net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
ecryptfs Merge branch 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2014-04-04 14:03:05 -07:00
efivarfs efivarfs: 'efivarfs_file_write' function reorganization 2014-03-04 16:16:16 +00:00
efs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
exofs Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2014-04-10 14:33:02 -07:00
exportfs exportfs: fix quadratic behavior in filehandle lookup 2013-11-09 00:16:38 -05:00
ext2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-04-07 17:59:17 -07:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-04-07 17:59:17 -07:00
ext4 ext4: fix a potential deadlock in __ext4_es_shrink() 2014-07-17 16:23:16 -07:00
f2fs f2fs: check bdi->dirty_exceeded when trying to skip data writes 2014-07-17 16:23:19 -07:00
fat Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
freevxfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
fscache FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree 2014-02-17 13:47:35 -08:00
fuse fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT 2014-07-31 12:44:08 -07:00
gfs2 mm: implement ->map_pages for page cache 2014-04-07 16:35:53 -07:00
hfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
hfsplus Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
hostfs mm + fs: store shadow entries in page cache 2014-04-03 16:21:01 -07:00
hpfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
hppfs clean up scary strncpy(dst, src, strlen(src)) uses 2013-07-03 16:07:41 -07:00
hugetlbfs hugetlb: ensure hugepage access is denied if hugepages are not supported 2014-05-06 13:04:58 -07:00
isofs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-04-07 17:59:17 -07:00
jbd jbd: Revise KERN_EMERG error messages 2013-12-04 12:27:46 +01:00
jbd2 ext4: disable synchronous transaction batching if max_batch_time==0 2014-07-17 16:23:16 -07:00
jffs2 MTD updates for 3.15: 2014-04-07 10:17:30 -07:00
jfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
kernfs kernfs: introduce kernfs_pin_sb() 2014-07-17 16:23:19 -07:00
lockd lockd: ensure we tear down any live sockets when socket creation fails during lockd_up 2014-03-28 10:43:08 -04:00
logfs mm + fs: store shadow entries in page cache 2014-04-03 16:21:01 -07:00
minix Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
ncpfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-04-12 17:31:22 -07:00
nfs nfs: only show Posix ACLs in listxattr if actually present 2014-07-31 12:44:06 -07:00
nfs_common
nfsd nfsd: fix rare symlink decoding bug 2014-07-09 11:21:30 -07:00
nilfs2 mm: implement ->map_pages for page cache 2014-04-07 16:35:53 -07:00
nls nls: have register_nls() set ->owner 2014-01-25 03:14:05 -05:00
notify fanotify: fix -EOVERFLOW with large files on 64-bit 2014-05-06 13:04:59 -07:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
ocfs2 ocfs2: fix double kmem_cache_destroy in dlm_init 2014-05-23 09:37:30 -07:00
omfs mm + fs: store shadow entries in page cache 2014-04-03 16:21:01 -07:00
openpromfs fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
proc mm: add !pte_present() check on existing hugetlb_entry callbacks 2014-06-06 13:21:16 -07:00
pstore Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
qnx4 fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
qnx6 fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
quota quota: missing lock in dqcache_shrink_scan() 2014-07-28 08:08:21 -07:00
ramfs fs/ramfs: move ramfs_aops to inode.c 2014-01-23 16:36:58 -08:00
reiserfs reiserfs: call truncate_setsize under tailpack mutex 2014-07-06 18:59:12 -07:00
romfs fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
squashfs fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
sysfs kernfs: move the last knowledge of sysfs out from kernfs 2014-06-03 08:11:18 -07:00
sysv Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
ubifs UBIFS: Remove incorrect assertion in shrink_tnc() 2014-07-06 18:59:09 -07:00
udf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
ufs fs/ufs: remove unused ufs_super_block_third pointer 2014-04-07 16:36:16 -07:00
xfs xfs: log vector rounding leaks log space 2014-08-14 09:51:50 +08:00
aio.c aio: protect reqs_available updates from changes in interrupt handlers 2014-07-28 08:08:28 -07:00
anon_inodes.c vfs: Allocate anon_inode_inode in anon_inode_init() 2014-03-27 09:52:54 -07:00
attr.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-16 13:44:09 -07:00
bad_inode.c [readdir] ->readdir() is gone 2013-06-29 12:57:04 +04:00
binfmt_aout.c dump_skip(): dump_seek() replacement taking coredump_params 2013-11-09 00:16:26 -05:00
binfmt_elf_fdpic.c elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo) 2013-11-09 00:16:30 -05:00
binfmt_elf.c exec: kill the unnecessary mm->def_flags setting in load_elf_binary() 2014-04-07 16:35:52 -07:00
binfmt_em86.c file->f_op is never NULL... 2013-10-24 23:34:54 -04:00
binfmt_flat.c
binfmt_misc.c binfmt_misc: add missing 'break' statement 2014-04-03 16:21:16 -07:00
binfmt_script.c
binfmt_som.c
bio-integrity.c block: Ensure we only enable integrity metadata for reads and writes 2014-04-09 08:00:06 -06:00
bio.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
block_dev.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
buffer.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
char_dev.c Merge branch 'for-3.13/core' of git://git.kernel.dk/linux-block 2013-11-14 12:08:14 +09:00
compat_binfmt_elf.c binfmt_elf: add ELF_HWCAP2 to compat auxv entries 2014-03-04 08:05:21 +00:00
compat_ioctl.c fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types 2014-03-06 16:30:44 +01:00
compat.c locks: rename file-private locks to "open file description locks" 2014-04-22 08:23:58 -04:00
coredump.c coredump: fix the setting of PF_DUMPCORE 2014-07-31 12:44:08 -07:00
dcache.c lock_parent: don't step on stale ->d_parent of all-but-freed one 2014-06-16 13:44:09 -07:00
dcookies.c fs/compat: fix lookup_dcookie() parameter handling 2014-01-29 16:22:40 -08:00
direct-io.c xfs: update for 3.15-rc1 2014-04-04 15:50:08 -07:00
drop_caches.c drop_caches: add some documentation and info message 2014-04-03 16:21:04 -07:00
eventfd.c eventfd_ctx_fdget(): use fdget() instead of fget() 2014-01-25 03:13:04 -05:00
eventpoll.c epoll: fix use-after-free in eventpoll_release_file 2014-06-30 20:14:04 -07:00
exec.c metag: Reduce maximum stack size to 256MB 2014-05-15 00:00:35 +01:00
fcntl.c locks: rename file-private locks to "open file description locks" 2014-04-22 08:23:58 -04:00
fhandle.c
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
filesystems.c sys_sysfs: Add CONFIG_SYSFS_SYSCALL 2014-04-03 16:21:05 -07:00
fs_struct.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
fs-writeback.c One of the main highlights this time, is not the patches themselves 2014-04-04 14:49:16 -07:00
inode.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-16 13:44:09 -07:00
internal.h get rid of s_files and files_lock 2013-11-09 00:16:20 -05:00
ioctl.c file->f_op is never NULL... 2013-10-24 23:34:54 -04:00
ioprio.c
Kconfig kernfs: add CONFIG_KERNFS 2014-02-07 16:08:57 -08:00
Kconfig.binfmt
libfs.c consolidate simple ->d_delete() instances 2013-11-15 22:04:17 -05:00
locks.c locks: only validate the lock vs. f_mode in F_SETLK codepaths 2014-05-09 11:41:54 -04:00
Makefile kernfs: add CONFIG_KERNFS 2014-02-07 16:08:57 -08:00
mbcache.c ext4: each filesystem creates and uses its own mb_cache 2014-03-18 19:24:49 -04:00
mount.h reduce m_start() cost... 2014-04-01 23:19:09 -04:00
mpage.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
namei.c fs: umount on symlink leaks mnt count 2014-07-31 12:44:08 -07:00
namespace.c VFS: Make delayed_free() call free_vfsmnt() 2014-04-01 23:19:18 -04:00
no-block.c
open.c vfs: fix check for fallocate on active swapfile 2014-08-07 16:53:53 -07:00
pipe.c switch pipe_read() to copy_page_to_iter() 2014-04-01 23:19:22 -04:00
pnode.c smarter propagate_mnt() 2014-04-01 23:19:08 -04:00
pnode.h smarter propagate_mnt() 2014-04-01 23:19:08 -04:00
posix_acl.c posix_acl: handle NULL ACL in posix_acl_equiv_mode 2014-05-06 13:58:42 -04:00
proc_namespace.c reduce m_start() cost... 2014-04-01 23:19:09 -04:00
read_write.c Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2014-03-31 14:32:17 -07:00
readdir.c file->f_op is never NULL... 2013-10-24 23:34:54 -04:00
select.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-11-13 15:34:18 +09:00
seq_file.c seq_file: always clear m->count when we free m->buf 2013-11-18 19:07:53 -08:00
signalfd.c
splice.c vfs: fix vmplice_to_user() 2014-05-28 01:54:52 -04:00
stack.c
stat.c vfs: split out vfs_getattr_nosec 2013-11-09 00:16:31 -05:00
statfs.c vfs: allow O_PATH file descriptors for fstatfs() 2013-10-12 13:12:31 -07:00
super.c fs: Don't return 0 from get_anon_bdev 2014-04-16 11:53:08 -07:00
sync.c Revert "writeback: do not sync data dirtied after sync start" 2014-02-22 02:02:28 +01:00
timerfd.c timerfd: support CLOCK_BOOTTIME clock 2014-01-23 16:57:40 -08:00
utimes.c locks: break delegations on any attribute modification 2013-11-09 00:16:44 -05:00
xattr.c