linux-stable/mm
KAMEZAWA Hiroyuki 8d0120df91 memcg: add mem_cgroup_replace_page_cache() to fix LRU issue
commit ab936cbcd0 upstream.

Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a
function replace_page_cache_page().  This function replaces a page in the
radix-tree with a new page.  WHen doing this, memory cgroup needs to fix
up the accounting information.  memcg need to check PCG_USED bit etc.

In some(many?) cases, 'newpage' is on LRU before calling
replace_page_cache().  So, memcg's LRU accounting information should be
fixed, too.

This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
 In that function, old pages will be unaccounted without touching
res_counter and new page will be accounted to the memcg (of old page).
WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid
races with LRU handling.

Background:
  replace_page_cache_page() is called by FUSE code in its splice() handling.
  Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated
  page and may be on LRU. LRU mis-accounting will be critical for memory cgroup
  because rmdir() checks the whole LRU is empty and there is no account leak.
  If a page is on the other LRU than it should be, rmdir() will fail.

This bug was added in March 2011, but no bug report yet.  I guess there
are not many people who use memcg and FUSE at the same time with upstream
kernels.

The result of this bug is that admin cannot destroy a memcg because of
account leak.  So, no panic, no deadlock.  And, even if an active cgroup
exist, umount can succseed.  So no problem at shutdown.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-18 07:31:58 -08:00
..
backing-dev.c backing-dev: ensure wakeup_timer is deleted 2011-11-21 14:35:28 -08:00
bootmem.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
bounce.c bounce: call flush_dcache_page() after bounce_copy_vec() 2010-09-09 18:57:25 -07:00
cleancache.c mm: cleancache core ops functions and config 2011-05-26 10:01:36 -06:00
compaction.c mm: compaction: abort compaction if too many pages are isolated and caller is asynchronous V2 2011-06-15 20:04:02 -07:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c devres: fix possible use after free 2011-07-25 20:57:14 -07:00
fadvise.c readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM 2010-03-06 11:26:25 -08:00
failslab.c fault-injection: add ability to export fault_attr in arbitrary directory 2011-08-03 14:25:20 -10:00
filemap_xip.c mm: Convert i_mmap_lock to a mutex 2011-05-25 08:39:18 -07:00
filemap.c memcg: add mem_cgroup_replace_page_cache() to fix LRU issue 2012-01-18 07:31:58 -08:00
fremap.c mm: don't access vm_flags as 'int' 2011-05-26 09:20:31 -07:00
highmem.c mm: make HASHED_PAGE_VIRTUAL page_address' struct page argument const. 2011-08-17 13:00:20 -07:00
huge_memory.c mm: thp: tail page refcounting fix 2011-11-11 09:43:41 -08:00
hugetlb.c mm: hugetlb: fix non-atomic enqueue of huge page 2012-01-06 14:17:22 -08:00
hwpoison-inject.c Fix common misspellings 2011-03-31 11:26:23 -03:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm: thp: tail page refcounting fix 2011-11-11 09:43:41 -08:00
Kconfig mm Kconfig typo: cleancacne -> cleancache 2011-06-10 14:47:52 +02:00
Kconfig.debug mm: debug-pagealloc: fix kconfig dependency warning 2011-03-22 17:44:02 -07:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
ksm.c ksm: fix NULL pointer dereference in scan_get_next_rmap_item() 2011-06-15 20:04:02 -07:00
maccess.c maccess,probe_kernel: Make write/read src const void * 2011-05-25 19:56:23 -04:00
madvise.c fs: kill i_alloc_sem 2011-07-20 20:47:46 -04:00
Makefile mm: cleancache core ops functions and config 2011-05-26 10:01:36 -06:00
memblock.c mm/memblock.c: avoid abuse of RED_INACTIVE 2011-07-25 20:57:09 -07:00
memcontrol.c memcg: add mem_cgroup_replace_page_cache() to fix LRU issue 2012-01-18 07:31:58 -08:00
memory_hotplug.c mm: extend memory hotplug API to allow memory hotplug in virtual machines 2011-07-25 20:57:08 -07:00
memory-failure.c HWPoison: add memory_failure_queue() 2011-08-03 11:15:58 -04:00
memory.c mm: thp: tail page refcounting fix 2011-11-11 09:43:41 -08:00
mempolicy.c mm/mempolicy.c: refix mbind_range() vma issue 2012-01-06 14:17:23 -08:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c mm: fix race between mremap and removing migration entry 2011-10-19 23:42:58 -07:00
mincore.c mm: clarify the radix_tree exceptional cases 2011-08-03 14:25:24 -10:00
mlock.c mm: don't access vm_flags as 'int' 2011-05-26 09:20:31 -07:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mmap: fix and tidy up overcommit page arithmetic 2011-07-25 20:57:09 -07:00
mmu_context.c exit: fix oops in sync_mm_rss 2010-03-24 16:31:21 -07:00
mmu_notifier.c thp: mmu_notifier_test_young 2011-01-13 17:32:46 -08:00
mmzone.c mm: page allocator: adjust the per-cpu counter threshold when memory is low 2011-01-13 17:32:31 -08:00
mprotect.c thp: mprotect: transparent huge page support 2011-01-13 17:32:44 -08:00
mremap.c mm: Convert i_mmap_lock to a mutex 2011-05-25 08:39:18 -07:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nobootmem.c memblock/nobootmem: remove unneeded code from alloc_bootmem_node_high() 2011-05-25 08:39:31 -07:00
nommu.c mmap: fix and tidy up overcommit page arithmetic 2011-07-25 20:57:09 -07:00
oom_kill.c oom: fix integer overflow of points in oom_badness 2012-01-06 14:17:03 -08:00
page_alloc.c mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks 2011-12-21 12:58:23 -08:00
page_cgroup.c mm/page_cgroup.c: simplify code by using SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macros 2011-07-25 20:57:09 -07:00
page_io.c block: kill off REQ_UNPLUG 2011-03-10 08:52:27 +01:00
page_isolation.c mm: page_isolation: codeclean fix comment and rm unneeded val init 2010-10-26 16:52:11 -07:00
page-writeback.c squeeze max-pause area and drop pass-good area 2011-08-19 22:42:07 +08:00
pagewalk.c pagewalk: fix code comment for THP 2011-07-25 20:57:09 -07:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c percpu: fix chunk range calculation 2011-12-21 12:58:30 -08:00
percpu.c percpu: fix per_cpu_ptr_to_phys() handling of non-page-aligned addresses 2012-01-06 14:17:00 -08:00
pgtable-generic.c mm/pgtable-generic.c: fix CONFIG_SWAP=n build 2011-01-26 10:49:58 +10:00
prio_tree.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
quicklist.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
readahead.c readahead: readahead page allocations are OK to fail 2011-05-25 08:39:25 -07:00
rmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback 2011-07-26 10:39:54 -07:00
shmem.c mm: clarify the radix_tree exceptional cases 2011-08-03 14:25:24 -10:00
slab.c slab, lockdep: Fix silly bug 2011-12-09 08:55:48 -08:00
slob.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
slub.c slub: fix a possible memleak in __slab_alloc() 2012-01-18 07:31:57 -08:00
sparse-vmemmap.c tree-wide: fix comment/printk typos 2010-11-01 15:38:34 -04:00
sparse.c mm: make some struct page's const 2011-07-25 20:57:07 -07:00
swap_state.c block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
swap.c mm: thp: tail page refcounting fix 2011-11-11 09:43:41 -08:00
swapfile.c mm: let swap use exceptional entries 2011-08-03 14:25:22 -10:00
thrash.c mm: swap-token: add a comment for priority aging 2011-07-25 20:57:08 -07:00
truncate.c mm: a few small updates for radix-swap 2011-08-03 14:25:24 -10:00
util.c mm: nommu: sort mm->mmap list properly 2011-05-25 08:39:05 -07:00
vmalloc.c mm: vmalloc: check for page allocation failure before vmlist insertion 2011-12-21 12:58:24 -08:00
vmscan.c memcg: Revert "memcg: add memory.vmscan_stat" 2011-09-14 18:09:38 -07:00
vmstat.c numa: fix NUMA compile error when sysfs and procfs are disabled 2011-09-14 18:09:37 -07:00