linux-stable/kernel
Zhang Yi b3fbfad0b7 futex: Take hugepages into account when generating futex_key
commit 13d60f4b6a upstream.

The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in an unique identifier for each futex.

Though this is not true when futexes are located in different subpages
of an hugepage. The reason is, that the mapping index for all those
futexes evaluates to the index of the base page of the hugetlbfs
mapping. So a futex at offset 0 of the hugepage mapping and another
one at offset PAGE_SIZE of the same hugepage mapping have identical
futex_keys. This happens because the futex code blindly uses
page->index.

Steps to reproduce the bug:

1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
   and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
   mapping.

   The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
   PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
   their keys solely depend on the user space address.

2. Lock mutex1 and mutex2

3. Create thread1 and in the thread function lock mutex1, which
   results in thread1 blocking on the locked mutex1.

4. Create thread2 and in the thread function lock mutex2, which
   results in thread2 blocking on the locked mutex2.

5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
   still blocks on mutex2 because the futex_key points to mutex1.

To solve this issue we need to take the normal page index of the page
which contains the futex into account, if the futex is in an hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.

Mappings which are not based on hugetlbfs are not affected and still
use page->index.

Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.

[ tglx: Massaged changelog ]

Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: 'Mel Gorman' <mgorman@suse.de>
Acked-by: 'Darren Hart' <dvhart@linux.intel.com>
Cc: 'Peter Zijlstra' <peterz@infradead.org>
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-13 11:39:17 -07:00
..
debug KGDB/KDB fixes and cleanups 2013-03-02 08:31:39 -08:00
events hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot() 2013-07-03 10:55:21 -07:00
gcov kernel/gcov: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:39:33 -08:00
irq Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
power Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
sched sched: Avoid prev->stime underflow 2013-05-19 11:38:19 -07:00
time timekeeping: Correct run-time detection of persistent_clock. 2013-06-13 10:49:52 -07:00
trace tracing: Fix leaks of filter preds 2013-05-19 11:38:25 -07:00
.gitignore kernel/hz.bc: ignore. 2013-04-22 07:09:06 -07:00
acct.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
async.c Merge branch 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2013-02-19 22:01:33 -08:00
audit_tree.c kernel/audit_tree.c: tree will leak memory when failure occurs in audit_trim_trees() 2013-05-11 07:18:39 -07:00
audit_watch.c audit: catch possible NULL audit buffers 2013-01-11 14:54:55 -08:00
audit.c audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE 2013-06-20 12:01:27 -07:00
audit.h audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
auditfilter.c audit: Make testing for a valid loginuid explicit. 2013-05-19 11:38:47 -07:00
auditsc.c audit: Make testing for a valid loginuid explicit. 2013-05-19 11:38:47 -07:00
backtracetest.c
bounds.c
capability.c Add file_ns_capable() helper function for open-time capability checking 2013-04-14 10:06:31 -07:00
cgroup_freezer.c cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() 2012-11-19 08:13:38 -08:00
cgroup.c cgroup: fix a subtle bug in descendant pre-order walk 2013-06-07 12:53:07 -07:00
compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-02-23 18:50:11 -08:00
configs.c kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n 2011-07-25 20:57:15 -07:00
context_tracking.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-19 18:19:48 -08:00
cpu_pm.c kernel/cpu_pm.c: fix various typos 2012-05-31 17:49:27 -07:00
cpu.c CPU hotplug: provide a generic helper to disable/enable CPU hotplug 2013-06-20 12:01:29 -07:00
cpuset.c Merge branch 'for-3.9-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2013-02-20 09:18:31 -08:00
crash_dump.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-12-18 10:55:28 -08:00
delayacct.c cputime: Use accessors to read task cputime stats 2013-01-27 19:23:31 +01:00
dma.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
elfcore.c
exec_domain.c
exit.c Revert "lockdep: check that no locks held at freeze time" 2013-03-31 11:38:33 -07:00
extable.c extable: Skip sorting if sorted at build time. 2012-04-19 15:06:55 -07:00
fork.c userns: Don't allow CLONE_NEWUSER | CLONE_FS 2013-03-13 15:00:20 -07:00
freezer.c freezer: change ptrace_stop/do_signal_stop to use freezable_schedule() 2012-10-26 14:27:49 -07:00
futex_compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-02-23 18:50:11 -08:00
futex.c futex: Take hugepages into account when generating futex_key 2013-07-13 11:39:17 -07:00
groups.c userns: Convert in_group_p and in_egroup_p to use kgid_t 2012-05-03 03:29:33 -07:00
hrtimer.c hrtimer: Add expiry time overflow check in hrtimer_interrupt 2013-05-07 20:33:10 -07:00
hung_task.c hung task debugging: Inject NMI when hung and going to panic 2012-04-25 12:39:25 +02:00
irq_work.c Merge branch 'nohz/printk-v8' into irq/core 2013-02-05 00:48:46 +01:00
itimer.c itimer: Use printk_once instead of WARN_ONCE 2012-04-10 11:00:30 +02:00
jump_label.c jump_label: Export jump_label_rate_limit() 2012-08-06 19:00:35 +03:00
kallsyms.c vsprintf: fix %ps on non symbols when using kallsyms 2012-05-29 16:22:32 -07:00
kcmp.c kcmp: include linux/ptrace.h 2012-12-20 17:40:19 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking: Adjust spin lock inlining Kconfig options 2012-09-13 17:56:13 +02:00
Kconfig.preempt locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
kexec.c kexec: use Crash kernel for Crash kernel low 2013-04-17 12:35:34 -07:00
kmod.c usermodehelper: check subprocess_info->path != NULL 2013-05-19 11:38:45 -07:00
kprobes.c kprobes: Fix a double lock bug of kprobe_mutex 2013-04-18 08:58:38 -07:00
ksysfs.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-11 18:10:49 -08:00
kthread.c kthread: Prevent unpark race which puts threads on the wrong cpu 2013-04-12 14:18:43 +02:00
latencytop.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lglock.c brlocks/lglocks: turn into functions 2012-05-29 23:28:41 -04:00
lockdep_internals.h
lockdep_proc.c lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name() 2012-10-24 12:39:09 +02:00
lockdep_states.h
lockdep.c Revert "lockdep: check that no locks held at freeze time" 2013-03-31 11:38:33 -07:00
Makefile MODSIGN: do not send garbage to stderr when enabling modules signature 2013-05-11 07:18:23 -07:00
modsign_certificate.S MODSIGN: Avoid using .incbin in C source 2012-12-14 13:06:44 +10:30
modsign_pubkey.c keys: use keyring_alloc() to create module signing keyring 2012-12-20 17:40:21 -08:00
module_signing.c MODSIGN: Don't use enum-type bitfields in module signature info block 2012-12-05 11:27:24 +10:30
module-internal.h MODSIGN: Move the magic string to the end of a module and eliminate the search 2012-10-19 17:30:40 -07:00
module.c module: do percpu allocation after uniqueness check. No, really! 2013-07-13 11:39:17 -07:00
mutex-debug.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mutex-debug.h
mutex.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
mutex.h
notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
nsproxy.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
padata.c padata: use __this_cpu_read per-cpu helper 2012-12-06 17:16:23 +08:00
panic.c taint: add explicit flag to show whether lock dep is still OK. 2013-01-21 17:17:57 +10:30
params.c params: replace printk(KERN_<LVL>...) with pr_<lvl>(...) 2012-05-04 17:28:18 -07:00
pid_namespace.c pid: Handle the exit of a multi-threaded init. 2013-03-26 03:41:23 -07:00
pid.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
posix-cpu-timers.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-19 19:05:45 -08:00
posix-timers.c posix-timers: convert to idr_alloc() 2013-02-27 19:10:19 -08:00
printk.c kmsg: honor dmesg_restrict sysctl on /dev/kmsg 2013-06-20 12:01:29 -07:00
profile.c profiling: Remove unused timer hook 2013-01-24 15:37:26 +01:00
ptrace.c uprobes: Add exports for module use 2013-02-08 17:47:13 +01:00
range.c range: Do not add new blank slot with add_range_with_merge 2013-06-27 10:38:46 -07:00
rcu.h rcu: Provide RCU CPU stall warnings for tiny RCU 2013-01-28 22:06:21 -08:00
rcupdate.c Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
rcutiny_plugin.h rcu: Provide RCU CPU stall warnings for tiny RCU 2013-01-28 22:06:21 -08:00
rcutiny.c Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
rcutorture.c rcu: Allow rcutorture to be built at low optimization levels 2013-02-04 12:18:20 -08:00
rcutree_plugin.h rcu: Make rcu_nocb_poll an early_param instead of module_param 2013-01-08 14:12:19 -08:00
rcutree_trace.c rcutrace: single_open() leaks 2013-05-07 20:33:15 -07:00
rcutree.c Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
rcutree.h Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
relay.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
res_counter.c res_counter: return amount of charges after res_counter_uncharge() 2012-12-18 15:02:12 -08:00
resource.c kernel/resource.c: fix stack overflow in __reserve_region_with_split() 2012-10-06 03:05:31 +09:00
rtmutex_common.h
rtmutex-debug.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
rtmutex-debug.h
rtmutex-tester.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
rtmutex.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
rtmutex.h
rwsem.c lockdep, rwsem: provide down_write_nest_lock() 2013-01-11 14:54:55 -08:00
seccomp.c seccomp: Make syscall skipping and nr changes more consistent 2012-10-02 21:14:29 +10:00
semaphore.c semaphore: fix improper comment reference to mutex 2012-04-05 17:15:55 -07:00
signal.c kernel/signal.c: stop info leak via the tkill and the tgkill syscalls 2013-04-17 16:10:45 -07:00
smp.c smp: make smp_call_function_many() use logic similar to smp_call_function_single() 2013-02-21 17:22:20 -08:00
smpboot.c kthread: Prevent unpark race which puts threads on the wrong cpu 2013-04-12 14:18:43 +02:00
smpboot.h smpboot: Provide infrastructure for percpu hotplug threads 2012-08-13 17:01:07 +02:00
softirq.c Fix lockup related to stop_machine being stuck in __do_softirq. 2013-06-13 10:49:50 -07:00
spinlock.c locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
srcu.c srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock() 2013-02-07 15:19:36 -08:00
stacktrace.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
stop_machine.c stop_machine: Mark per cpu stopper enabled early 2013-02-26 22:25:17 +01:00
sys_ni.c module: add syscall to load module from fd 2012-12-14 13:05:22 +10:30
sys.c reboot: rigrate shutdown/reboot to boot cpu 2013-06-20 12:01:29 -07:00
sysctl_binary.c sysctl: fix null checking in bin_dn_node_address() 2013-02-27 19:10:21 -08:00
sysctl.c Initial ARC Linux port with some fixes on top for 3.9-rc1 2013-03-02 07:58:56 -08:00
task_work.c task_work: task_work_add() should not succeed after exit_task_work() 2012-09-13 16:47:34 +02:00
taskstats.c taskstats: cgroupstats_user_cmd() may leak on error 2012-10-06 03:05:31 +09:00
test_kprobes.c
time.c time: don't inline EXPORT_SYMBOL functions 2013-02-21 17:22:19 -08:00
timeconst.bc kernel: Replace timeconst.pl with a bc script 2013-02-16 23:17:25 +01:00
timer.c timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE 2013-05-19 11:38:24 -07:00
tracepoint.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
tsacct.c cputime: Use accessors to read task cputime stats 2013-01-27 19:23:31 +01:00
uid16.c userns: Convert setting and getting uid and gid system calls to use kuid and kgid 2012-05-03 03:28:41 -07:00
up.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user_namespace.c userns: Changing any namespace id mappings should require privileges 2013-04-14 18:11:32 -07:00
user-return-notifier.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
user.c userns: Restrict when proc and sysfs can be mounted 2013-03-27 07:50:08 -07:00
utsname_sysctl.c kernel/utsname_sysctl.c: put get/get_uts() into CONFIG_PROC_SYSCTL code block 2013-02-27 19:10:22 -08:00
utsname.c kernel/utsname.c: fix wrong comment about clone_uts_ns() 2013-02-27 19:10:22 -08:00
wait.c propagate name change to comments in kernel source 2012-12-06 10:39:54 +01:00
watchdog.c Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-22 19:25:09 -08:00
workqueue_internal.h workqueue: rename cpu_workqueue to pool_workqueue 2013-02-13 19:29:12 -08:00
workqueue.c Merge branch 'for-3.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2013-03-18 18:47:07 -07:00