linux-next/arch/mips
Michal Hocko a4ff8e8620 mm: introduce MAP_FIXED_NOREPLACE
Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2.

This has started as a follow up discussion [3][4] resulting in the
runtime failure caused by hardening patch [5] which removes MAP_FIXED
from the elf loader because MAP_FIXED is inherently dangerous as it
might silently clobber an existing underlying mapping (e.g.  stack).
The reason for the failure is that some architectures enforce an
alignment for the given address hint without MAP_FIXED used (e.g.  for
shared or file backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [6].  The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution.  We basically want a non-destructive MAP_FIXED.

The first patch introduced MAP_FIXED_NOREPLACE which enforces the given
address but unlike MAP_FIXED it fails with EEXIST if the given range
conflicts with an existing one.  The flag is introduced as a completely
new one rather than a MAP_FIXED extension because of the backward
compatibility.  We really want a never-clobber semantic even on older
kernels which do not recognize the flag.  Unfortunately mmap sucks
wrt flags evaluation because we do not EINVAL on unknown flags.  On
those kernels we would simply use the traditional hint based semantic so
the caller can still get a different address (which sucks) but at least
not silently corrupt an existing mapping.  I do not see a good way
around that.  Except we won't export expose the new semantic to the
userspace at all.

It seems there are users who would like to have something like that.
Jemalloc has been mentioned by Michael Ellerman [7]

Florian Weimer has mentioned the following:
: glibc ld.so currently maps DSOs without hints.  This means that the kernel
: will map right next to each other, and the offsets between them a completely
: predictable.  We would like to change that and supply a random address in a
: window of the address space.  If there is a conflict, we do not want the
: kernel to pick a non-random address. Instead, we would try again with a
: random address.

John Hubbard has mentioned CUDA example
: a) Searches /proc/<pid>/maps for a "suitable" region of available
: VA space.  "Suitable" generally means it has to have a base address
: within a certain limited range (a particular device model might
: have odd limitations, for example), it has to be large enough, and
: alignment has to be large enough (again, various devices may have
: constraints that lead us to do this).
:
: This is of course subject to races with other threads in the process.
:
: Let's say it finds a region starting at va.
:
: b) Next it does:
:     p = mmap(va, ...)
:
: *without* setting MAP_FIXED, of course (so va is just a hint), to
: attempt to safely reserve that region. If p != va, then in most cases,
: this is a failure (almost certainly due to another thread getting a
: mapping from that region before we did), and so this layer now has to
: call munmap(), before returning a "failure: retry" to upper layers.
:
:     IMPROVEMENT: --> if instead, we could call this:
:
:             p = mmap(va, ... MAP_FIXED_NOREPLACE ...)
:
:         , then we could skip the munmap() call upon failure. This
:         is a small thing, but it is useful here. (Thanks to Piotr
:         Jaroszynski and Mark Hairgrove for helping me get that detail
:         exactly right, btw.)
:
: c) After that, CUDA suballocates from p, via:
:
:      q = mmap(sub_region_start, ... MAP_FIXED ...)
:
: Interestingly enough, "freeing" is also done via MAP_FIXED, and
: setting PROT_NONE to the subregion. Anyway, I just included (c) for
: general interest.

Atomic address range probing in the multithreaded programs in general
sounds like an interesting thing to me.

The second patch simply replaces MAP_FIXED use in elf loader by
MAP_FIXED_NOREPLACE.  I believe other places which rely on MAP_FIXED
should follow.  Actually real MAP_FIXED usages should be docummented
properly and they should be more of an exception.

[1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
[3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

This patch (of 2):

MAP_FIXED is used quite often to enforce mapping at the particular range.
The main problem of this flag is, however, that it is inherently dangerous
because it unmaps existing mappings covered by the requested range.  This
can cause silent memory corruptions.  Some of them even with serious
security implications.  While the current semantic might be really
desiderable in many cases there are others which would want to enforce the
given range but rather see a failure than a silent memory corruption on a
clashing range.  Please note that there is no guarantee that a given range
is obeyed by the mmap even when it is free - e.g.  arch specific code is
allowed to apply an alignment.

Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this
behavior.  It has the same semantic as MAP_FIXED wrt.  the given address
request with a single exception that it fails with EEXIST if the requested
address is already covered by an existing mapping.  We still do rely on
get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
check for a conflicting vma after it returns.

The flag is introduced as a completely new one rather than a MAP_FIXED
extension because of the backward compatibility.  We really want a
never-clobber semantic even on older kernels which do not recognize the
flag.  Unfortunately mmap sucks wrt.  flags evaluation because we do not
EINVAL on unknown flags.  On those kernels we would simply use the
traditional hint based semantic so the caller can still get a different
address (which sucks) but at least not silently corrupt an existing
mapping.  I do not see a good way around that.

[mpe@ellerman.id.au: fix whitespace]
[fail on clashing range with EEXIST as per Florian Weimer]
[set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Evans <jasone@google.com>
Cc: David Goldblatt <davidtgoldblatt@gmail.com>
Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
..
alchemy MIPS changes for 4.17 2018-04-10 11:39:22 -07:00
ar7 MIPS: AR7: Constify gpio_led 2018-02-19 10:55:36 +00:00
ath25 MIPS: ath25: Check for kzalloc allocation failure 2018-02-23 09:35:54 +00:00
ath79 Merge branch 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2017-11-14 17:52:21 -08:00
bcm47xx MIPS: BCM47XX: Use standard reset button for Luxul XWR-1750 2018-04-07 00:10:48 +01:00
bcm63xx MIPS: BCM63XX: kconfig: Remove blank help text 2018-02-02 23:53:10 +09:00
bmips License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
boot MIPS changes for 4.17 2018-04-10 11:39:22 -07:00
cavium-octeon MIPS changes for 4.17 2018-04-10 11:39:22 -07:00
cobalt MIPS: Cobalt: Fix typo 2016-08-03 08:16:30 +02:00
configs MIPS: generic: Add support for Microsemi Ocelot 2018-03-21 23:33:10 +00:00
crypto MIPS: crypto: Add crc32 and crc32c hw accelerated module 2018-02-19 20:50:36 +00:00
dec License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
emma MIPS: Avoid old-style declaration 2017-01-25 02:51:11 +01:00
fw License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
generic MIPS: generic: Add support for Microsemi Ocelot 2018-03-21 23:33:10 +00:00
include mm: introduce MAP_FIXED_NOREPLACE 2018-04-11 10:28:38 -07:00
jazz MIPS: Set I/O port resource types correctly 2017-12-18 23:07:45 -06:00
jz4740 spi: spi-gpio: Rewrite to use GPIO descriptors 2018-02-14 16:02:41 +00:00
kernel MIPS changes for 4.17 2018-04-10 11:39:22 -07:00
kvm KVM: introduce kvm_arch_vcpu_async_ioctl 2017-12-14 09:26:59 +01:00
lantiq MIPS: lantiq: ase: Enable MFD_SYSCON 2018-03-21 21:57:35 +00:00
lasat treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
lib MIPS: Implement __multi3 for GCC7 MIPS64r6 builds 2018-01-11 14:40:31 +01:00
loongson32 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
loongson64 dma/direct: Handle the memory encryption bit in common code 2018-03-20 10:01:59 +01:00
math-emu MIPS: math-emu: Mark fall throughs in switch statements with a comment 2017-12-12 17:20:20 +01:00
mm exec: pass stack rlimit into mm layout functions 2018-04-11 10:28:37 -07:00
mti-malta MIPS: Set I/O port resource types correctly 2017-12-18 23:07:45 -06:00
net MIPS: BPF: Replace __mips_isa_rev with MIPS_ISA_REV 2018-03-09 11:22:47 +00:00
netlogic mips/netlogic: remove swiotlb support 2018-01-15 09:35:59 +01:00
oprofile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
paravirt License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pci MIPS: pci-mt7620: Enable PCIe on MT7688 2018-03-14 15:14:05 +00:00
pic32 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pistachio License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pmcs-msp71xx License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pnx833x mtd: nand: Rename nand.h into rawnand.h 2017-08-13 10:11:49 +02:00
power License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ralink MIPS: ralink: Fix booting on MT7621 2018-03-22 00:06:30 +00:00
rb532 MIPS: RB532: Avoid undefined mac_pton without GENERIC_NET_UTILS 2018-01-10 16:39:03 +01:00
sgi-ip22 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 17:56:58 -08:00
sgi-ip27 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sgi-ip32 mips: ip22/32: Convert timers to use timer_setup() 2017-11-02 15:50:36 -07:00
sibyte License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sni License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tools Update MIPS email addresses 2017-11-03 09:02:30 -07:00
txx9 MIPS: TXX9: Constify gpio_led 2018-02-19 10:55:36 +00:00
vdso MIPS: VDSO: Replace __mips_isa_rev with MIPS_ISA_REV 2018-03-09 11:22:48 +00:00
vr41xx License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kbuild MIPS: Disable Werror when W= is set 2017-04-10 11:56:07 +02:00
Kbuild.platforms MIPS: Xilfpga: Switch to using generic defconfigs 2017-11-08 22:54:14 +00:00
Kconfig MIPS changes for 4.17 2018-04-10 11:39:22 -07:00
Kconfig.debug MIPS: Fix CPS SMP NS16550 UART defaults 2018-01-10 16:44:49 +01:00
Makefile MIPS: Use the entry point from the ELF file header 2018-03-22 22:30:58 +00:00
Makefile.postlink License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00