545630 Commits

Author SHA1 Message Date
Leo Yan
27675ef03c rtc: pl031: fix typo for author email
The email address missed character ">", so add it.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:08 +02:00
Javier Martinez Canillas
1c4fc2955a rtc: Export OF module alias information in missing drivers
The I2C core always reports the MODALIAS uevent as "i2c:<client name"
regardless if the driver was matched using the I2C id_table or the
of_match_table. So technically there's no need for a driver to export
the OF table since currently it's not used.

In fact, the I2C device ID table is mandatory for I2C drivers since
a i2c_device_id is passed to the driver's probe function even if the
I2C core used the OF table to match the driver.

And since the I2C core uses different tables, OF-only drivers needs to
have duplicated data that has to be kept in sync and also the dev node
compatible manufacturer prefix is stripped when reporting the MODALIAS.

To avoid the above, the I2C core behavior may be changed in the future
to not require an I2C device table for OF-only drivers and report the
OF module alias. So, it's better to also export the OF table to prevent
breaking module autoloading if that happens.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:08 +02:00
Adrian Huang
8109d44f76 rtc: cmos: Revert "rtc-cmos: Add an alarm disable quirk"
Commit d5a1c7e3fc38 ("rtc-cmos: Add an alarm disable quirk") that
added a special quirk is not needed because [PATCH 1/2] of this
patchset makes the kernel more robust:
rtc-cmos: Cancel alarm timer if alarm time is equal to now+1 seconds

Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Tested-by: Egbert Eich <eich@suse.de>
Tested-by: Diego Ercolani <diego.ercolani@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:08 +02:00
Adrian Huang
88b8d33b1c rtc: cmos: Cancel alarm timer if alarm time is equal to now+1 seconds
Steps to reproduce the problem:
	1) Enable RTC wake-up option in BIOS Setup
	2) Issue one of these commands in the OS: "poweroff"
	   or "shutdown -h now"
	3) System will shut down and then reboot automatically

Root-cause of the issue:
	1) During the shutdown process, the hwclock utility is used
	   to save the system clock to hardware clock (RTC).
	2) The hwclock utility invokes ioctl() with RTC_UIE_ON. The
	   kernel configures the RTC alarm for the periodic interrupt
	   (every 1 second).
	3) The hwclock uitlity closes the /dev/rtc0 device, and the
	   kernel disables the RTC alarm irq (AIE bit of Register B)
	   via ioctl() with RTC_UIE_OFF. But, the configured alarm
	   time is the current_time + 1.
	4) After the next 1 second is elapsed, the AF (alarm
	   interrupt flag) of Register C is set.
	5) The S5 handler in BIOS is invoked to configure alarm
	   registers (enable AIE bit and configure alarm date/time).
	   But, BIOS does not clear the previous interrupt status
	   during alarm configuration. Therefore, "AF=AIE=1" causes
	   the rtc device to trigger an interrupt.
	6) So, the machine reboots automatically right after shutdown.

This patch cancels the alarm timer if the following condictions are
met (suggested by Alexandre):
	1) The configured alarm time is equal to current_time + 1
	   seconds.
	2) The AIE timer is not in use.

The member 'alarm_expires' is introduced in struct cmos_rtc because
of the following reasons:
	1) The configured alarm time can be retrieved from
	   cmos_read_alarm(), but we need to take the 'wrapped
	   timestamp' and 'time rollover' into consideration. The
	   function __rtc_read_alarm() eliminates the concerns. To
	   avoid the duplicated code in the lower level RTC driver,
	   invoking __rtc_read_alarm from the lower level RTC driver
	   is not encouraged. Moreover, the compilation error 'the
	   undefined __rtc_read_alarm" is observed if the lower level
	   RTC driver is compiled as a kernel module.
	2) The uie_rtctimer.node.expires and aie_timer.node.expires can
	   be retrieved for the configured alarm time. But, the problem
	   is that either of them might configure the CMOS alarm time.
	   We cannot make sure UIE timer or AIE tiemr configured the
	   CMOS alarm time before. (uie_rtctimer or aie_timer is enabled
	   and then is disabled).
	3) The patch introduces the member 'alarm_expires' to keep the
	   newly configured alarm time, so the above-mentioned concerns
	   can be eliminated.

The issue goes away after 20-time shutdown tests.

Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Tested-by: Egbert Eich <eich@suse.de>
Tested-by: Diego Ercolani <diego.ercolani@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:08 +02:00
S Twiss
80ca3277bc rtc: da9063: Add DA9062 RTC capability to DA9063 RTC driver
Add DA9062 RTC support into the existing DA9063 RTC driver component by
using generic access tables for common register and bit mask definitions.

The following change will add generic register and bit mask support to the
DA9063 RTC. The changes are slightly complicated by requiring support for
three register sets: DA9063-AD, DA9063-BB and DA9062-AA.

The following alterations have been made to the DA9063 RTC:

- Addition of a da9063_compatible_rtc_regmap structure to hold all generic
  registers and bitmasks for this type of RTC component.
- A re-write of struct da9063 to use pointers for regmap and compatible
  registers/masks definitions
- Addition of a of_device_id table for DA9063 and DA9062 defaults
- Refactoring functions to use struct da9063_compatible_rtc accesses to
  generic registers/masks instead of using defines from registers.h
- Re-work of da9063_rtc_probe() to use of_match_node() and dev_get_regmap()
  to provide initialisation of generic registers and masks and access to
  regmap

Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:08 +02:00
Henry Chen
d7f9777de8 rtc: mt6397: implement suspend/resume function in rtc-mt6397 driver
Implement the suspend/resume function in order to control rtc's irq_wake flag and handle as wakeup source.

Signed-off-by: Henry Chen <henryc.chen@mediatek.com>
Acked-by: Eddie Huang <eddie.huang@mediatek.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:08 +02:00
Dmitry Torokhov
3ee2c40b7a rtc: switch to using is_visible() to control sysfs attributes
Instead of creating wakealarm attribute manually, after the device has been
registered, let's rely on facilities provided by the attribute groups to
control which attributes are visible and which are not. This allows to
create all needed attributes at once, at the same time that we register RTC
class device.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:07 +02:00
Dmitry Torokhov
a17ccd1c6a rtc: switch wakealarm attribute to DEVICE_ATTR_RW
Instead of using older style DEVICE_ATTR for wakealarm attribute let's
switch to using DEVICE_ATTR_RW that ensures consistent across the kernel
permissions on the attribute.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:07 +02:00
Dmitry Torokhov
df100c017e rtc: make rtc_does_wakealarm() return boolean
Users of rtc_does_wakealarm() return value treat it as boolean so let's
change the signature accordingly.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:07 +02:00
Henri Roosen
f2284f9c90 rtc: rx8025: remove obsolete local_irq_disable() and local_irq_enable() for rtc_update_irq()
Since commit e6229bec25be ("rtc: make rtc_update_irq callable with irqs
enabled") rtc_update_irq() is callable with irqs enabled.

Signed-off-by: Henri Roosen <henriroosen@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:07 +02:00
Octavian Purdila
0d9030a2c3 rtc: fix drivers that consider 0 as a valid IRQ in client->irq
Since dab472eb931b ("i2c / ACPI: Use 0 to indicate that device does not
have interrupt assigned"), 0 is not a valid i2c client irq anymore, so
change all driver's checks accordingly.

The same issue occurs when the device is instantiated via device tree
with no IRQ, or from the i2c sysfs interface, even before the patch
above.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:07 +02:00
Dmitry Torokhov
1e4cd62558 rtc: dev: properly manage lifetime of dev and cdev in rtc device
struct rtc embeds both struct dev and struct cdev.  Unfortunately character
device structure may outlive the parent rtc structure unless we set it up
as parent of character device so that it will stay pinned until character
device is freed.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:07 +02:00
Dmitry Torokhov
c3b399a4b6 rtc: class: remove unnecessary device_get() in rtc_device_unregister
Technically the address of rtc->dev can never be NULL, so get_device()
can never fail. Also caller of rtc_device_unregister() supposed to be
the owner of the device and thus have a valid reference. Therefore
call to get_device() is not needed here.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:07 +02:00
Dmitry Torokhov
6706664d92 rtc: class: fix double free in rtc_register_device() error path
Commit 59cca865f21e ("drivers/rtc/class.c: fix device_register() error
handling") correctly noted that naked kfree() should not be used after
failed device_register() call, however, while it added the needed
put_device() it forgot to remove the original kfree() causing double-free.

Cc: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:07 +02:00
Guo Zeng
dfe6c04aa2 rtc: sirfsoc: move to regmap APIs from platform-specific APIs
The current codes use CSR platform specific API exported by machine
codes to read/write RTC registers. they are:
sirfsoc_rtc_iobrg_readl()
sirfsoc_rtc_iobrg_writel()

commit b1999477ed91 ("ARM: prima2: move to use REGMAP APIs for rtciobrg")
moves to regmap support, now we can move to use regmap APIs in RTC
driver.

Signed-off-by: Guo Zeng <guo.zeng@csr.com>
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:06 +02:00
Vaibhav Jain
f4a2eecb3f rtc: opal: Enable alarms only when opal supports tpo
rtc-opal driver provides support for rtc alarms via
timed-power-on(tpo). However some Power platforms like BML use a fake
rtc clock and don't support tpo. Such platforms are indicated by the
missing 'has-tpo' property in the device tree.

Current implementation however enables callback for
rtc_class_ops.read/set alarm irrespective of the tpo support from the
platform. This results in a failed opal call when kernel tries to read
an existing alarms via opal_get_tpo_time during rtc device registration.

This patch fixes this issue by setting opal_rtc_ops.read/set_alarm
callback pointers only when tpo is supported.

Acked-by: Michael Neuling <mikey@neuling.org>
Acked-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:06 +02:00
Joachim Eastwood
c28b42e3ae rtc: add rtc-lpc24xx driver
Add driver for the RTC found on NXP LPC178x/18xx/408x/43xx devices.
The RTC provides calendar and clock functionality together with
alarm interrupt support.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:06 +02:00
Joachim Eastwood
dcb9372b34 doc: dt: add documentation for nxp,lpc1788-rtc
Document NXP LPC178x/18xx/408x/43xx bindings

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:06 +02:00
Krzysztof Kozlowski
045c6fdd37 rtc: Drop owner assignment from platform_driver
platform_driver does not need to set an owner because
platform_driver_register() will set it.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:06 +02:00
Krzysztof Kozlowski
b28845433e rtc: Drop owner assignment from i2c_driver
i2c_driver does not need to set an owner because i2c_register_driver()
will set it.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:06 +02:00
Andrea Scian
653ebd75e9 rtc: pcf2127: use OFS flag to detect unreliable date and warn the user
The PCF2127 datasheet states that it's wrong to say that the date in
unreliable if BLF (battery low flag) is set but instead, OSF (seconds
register) should be used to check if oscillator, for any reason, stopped.
Battery may be low (usually below 2V5 threshold) but the date may be anyway
correct (typically date is unreliable when input voltage is below 1V2).

Signed-off-by: Andrea Scian <andrea.scian@dave.eu>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:06 +02:00
Andrea Scian
821f51c4da rtc: use rtc_valid_tm() error code when reading date/time
There's a wrong comment in some RTC drivers that say it's better to ignore
rtc_valid_tm() when reading RTC timestamp. However this is wrong and is
better to return to the userspace the error if timestamp is not valid.

Signed-off-by: Andrea Scian <andrea.scian@dave.eu>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:06 +02:00
Vaibhav Hiremath
4ab8210313 rtc: 88pm80x: add device tree support
Along with DT support, this patch also cleans up the unnecessary
code around 'rtc_wakeup' initialization.

Signed-off-by: Chao Xie <chao.xie@marvell.com>
Signed-off-by: Vaibhav Hiremath <vaibhav.hiremath@linaro.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:05 +02:00
Maninder Singh
617f6f7ef5 rtc: bq32k: remove redundant check
removing below static analysis error:
(error) Possible null pointer dereference: client

if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C))
							^^^^^^^
Error comes because client is dereferenced before NULL check.
So probably NULL this check is not required.

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:05 +02:00
Vaishali Thakkar
508db592e2 rtc: ds1685: Use module_platform_driver
Use module_platform_driver for drivers whose init and exit functions
only register and unregister, respectively.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@a@
identifier f, x;
@@
-static f(...) { return platform_driver_register(&x); }

@b depends on a@
identifier e, a.x;
@@
-static e(...) { platform_driver_unregister(&x); }

@c depends on a && b@
identifier a.f;
declarer name module_init;
@@
-module_init(f);

@d depends on a && b && c@
identifier b.e, a.x;
declarer name module_exit;
declarer name module_platform_driver;
@@
-module_exit(e);
+module_platform_driver(x);

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:05 +02:00
Nishanth Menon
7abea617a4 rtc: ds1307: Support optional wakeup interrupt source
With the recent pinctrl-single changes, SoCs such as Texas
Instrument's OMAP processors can treat wake-up events from deeper idle
states as interrupts.

Let's add support for the optional second interrupt for wake-up using
the generic wakeirq support added in commit 4990d4fe327b ("PM /
Wakeirq: Add automated device wake IRQ handling")

Finally, to pass the wake-up interrupt in the dts file,
interrupts-extended property needs to be passed.

This is similar in approach to commit 2a0b965cfb6e ("serial: omap: Add
support for optional wake-up") + ee83bd3b6483 ("serial: omap: Switch
wake-up interrupt to generic wakeirq")

Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:05 +02:00
Nishanth Menon
eac7237fd8 rtc: ds1307: Sort the headers
It is always a good practice to keep the #includes sorted

Signed-off-by: Nishanth Menon <nm@ti.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:05 +02:00
Nishanth Menon
c598319136 rtc: ds1307: Switch to managed irq allocation
Since we are not doing anything fancy in remove function that requires
us to sequence IRQ free operation, we might as well switch over to devm_
equivalent of managed IRQ allocation and remove the explicit free_irq
since it'd be done automatically at remove.

Signed-off-by: Nishanth Menon <nm@ti.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:05 +02:00
Felipe Balbi
2fb07a10e0 rtc: ds1307: Convert to threaded IRQ
The driver currently emulates the concept of threaded IRQ using a
workqueue, which it really does not need to. Instead, switch over to
threaded_irq handlers which is meant precisely for the same purpose.

Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05 13:19:05 +02:00
NeilBrown
e89c6fdf9e Merge linux-block/for-4.3/core into md/for-linux
There were a few conflicts that are fairly easy to resolve.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-09-05 11:08:32 +02:00
Nicholas Krause
559ec2f8fd mm/hugetlb.c: make vma_has_reserves() return bool
This makes vma_has_reserves() return bool due to this particular function
only returning either one or zero as its return value.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Nicholas Krause
1ecef9ed0f mm/madvise.c: make madvise_behaviour_valid() return bool
This makes the madvise_bahaviour_valid() function return bool due to
this particular function always returning the value of either one or
zero as its return value.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Nicholas Krause
ca1d6c7d9d mm/memory.c: make tlb_next_batch() return bool
This makes the tlb_next_batch() bool due to this particular function only
ever returning either one or zero as its return value.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Nicholas Krause
d9e7e37b4d mm/dmapool.c: change is_page_busy() return from int to bool
This makes the function is_page_busy() return bool rather then an int now
due to this particular function's single return statement only ever
evaulating to either one or zero.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
minkyung88.kim
4e6dab4233 mm: remove struct node_active_region
struct node_active_region is not used anymore.  Remove it.

Signed-off-by: minkyung88.kim <minkyung88.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Oleg Nesterov
9943242ca4 mremap: simplify the "overlap" check in mremap_to()
Minor, but this check is overcomplicated.  Two half-intervals do NOT
overlap if END1 <= START2 || END2 <= START1, mremap_to() just needs to
negate this check.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Oleg Nesterov
1d39168697 mremap: don't do uneccesary checks if new_len == old_len
The "new_len > old_len" branch in vma_to_resize() looks very confusing.
It only covers the VM_DONTEXPAND/pgoff checks but everything below is
equally unneeded if new_len == old_len.

Change this code to return if "new_len == old_len", new_len < old_len is
not possible, otherwise the code below is wrong anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Oleg Nesterov
d456fb9e52 mremap: don't do mm_populate(new_addr) on failure
move_vma() sets *locked even if move_page_tables() or ->mremap() fails,
change sys_mremap() to check "ret & ~PAGE_MASK".

I think we should simply remove the VM_LOCKED code in move_vma(), that is
why this patch doesn't change move_vma().  But this needs more cleanups.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Oleg Nesterov
5477e70a64 mm: move ->mremap() from file_operations to vm_operations_struct
vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this
way ->mremap() can have more users.  Say, vdso.

While at it, s/aio_ring_remap/aio_ring_mremap/.

Note: this is the minimal change before ->mremap() finds another user in
file_operations; this method should have more arguments, and it can be
used to kill arch_remap().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Oleg Nesterov
df1eab303c mremap: don't leak new_vma if f_op->mremap() fails
move_vma() can't just return if f_op->mremap() fails, we should unmap the
new vma like we do if move_page_tables() fails.  To avoid the code
duplication this patch moves the "move entries back" under the new "if
(err)" branch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Nicholas Krause
31aafb45f4 mm/hugetlb.c: make vma_shareable() return bool
This makes vma_shareable() return bool now due to this particular function
only ever returning either one or zero as its return value.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Kirill A. Shutemov
1027e4436b mm: make GUP handle pfn mapping unless FOLL_GET is requested
With DAX, pfn mapping becoming more common.  The patch adjusts GUP code to
cover pfn mapping for cases when we don't need struct page to proceed.

To make it possible, let's change follow_page() code to return -EEXIST
error code if proper page table entry exists, but no corresponding struct
page.  __get_user_page() would ignore the error code and move to the next
page frame.

The immediate effect of the change is working MAP_POPULATE and mlock() on
DAX mappings.

[akpm@linux-foundation.org: fix arm64 build]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Kirill A. Shutemov
d899844e9c mm: fix status code which move_pages() returns for zero page
The manpage for move_pages(2) specifies that status code for zero page is
supposed to be -EFAULT.  Currently kernel return -ENOENT in this case.

follow_page() can do it for us, if we would ask for FOLL_DUMP.  The use of
FOLL_DUMP also means that the upper layer page tables pages are no longer
allocated.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Sebastian Andrzej Siewior
ce9ce6659a mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout()
Clark stumbled over a VM_BUG_ON() in -RT which was then was removed by
Johannes in commit f371763a79d ("mm: memcontrol: fix false-positive
VM_BUG_ON() on -rt").  The comment before that patch was a tiny bit better
than it is now.  While the patch claimed to fix a false-postive on -RT
this was not the case.  None of the -RT folks ACKed it and it was not a
false positive report.  That was a *real* problem.

This patch updates the comment that is improper because it refers to
"disabled preemption" as a consequence of that lock being taken.  A
spin_lock() disables preemption, true, but in this case the code relies on
the fact that the lock _also_ disables interrupts once it is acquired.
And this is the important detail (which was checked the VM_BUG_ON()) which
needs to be pointed out.  This is the hint one needs while looking at the
code.  It was explained by Johannes on the list that the per-CPU variables
are protected by local_irq_save().  The BUG_ON() was helpful.  This code
has been workarounded in -RT in the meantime.  I wouldn't mind running
into more of those if the code in question uses *special* kind of locking
since now there is no verification (in terms of lockdep or BUG_ON()) and
therefore I bring the VM_BUG_ON() check back in.

The two functions after the comment could also have a "local_irq_save()"
dance around them in order to serialize access to the per-CPU variables.
This has been avoided because the interrupts should be off.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <williams@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Vladimir Zapolskiy
c98c36355d genalloc: add support of multiple gen_pools per device
This change fills devm_gen_pool_create()/gen_pool_get() "name" argument
stub with contents and extends of_gen_pool_get() functionality on this
basis.

If there is no associated platform device with a device node passed to
of_gen_pool_get(), the function attempts to get a label property or device
node name (= repeats MTD OF partition standard) and seeks for a named
gen_pool registered by device of the parent device node.

The main idea of the change is to allow registration of independent
gen_pools under the same umbrella device, say "partitions" on "storage
device", the original functionality of one "partition" per "storage
device" is untouched.

[akpm@linux-foundation.org: fix constness in devres_find()]
[dan.carpenter@oracle.com: freeing const data pointers]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Vladimir Zapolskiy
7385817359 genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()
This change modifies gen_pool_get() and devm_gen_pool_create() client
interfaces adding one more argument "name" of a gen_pool object.

Due to implementation gen_pool_get() is capable to retrieve only one
gen_pool associated with a device even if multiple gen_pools are created,
fortunately right at the moment it is sufficient for the clients, hence
provide NULL as a valid argument on both producer devm_gen_pool_create()
and consumer gen_pool_get() sides.

Because only one created gen_pool per device is addressable, explicitly
add a restriction to devm_gen_pool_create() to create only one gen_pool
per device, this implies two possible error codes returned by the
function, account it on client side (only misc/sram).  This completes
client side changes related to genalloc updates.

[akpm@linux-foundation.org: gen_pool_get() cleanup]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Wei Yang
c0a2949883 mm/memblock: WARN_ON when nid differs from overlap region
Each memblock_region has nid to indicates the Node ID of this range.  For
the overlap case, memblock_add_range() inserts the lower part and leave
the upper part as indicated in the overlapped region.

If the nid of the new range differs from the overlapped region, the
information recorded is not correct.

This patch adds a WARN_ON when the nid of the new range differs from the
overlapped region.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Mel Gorman
c7e1e3ccfb Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Mel Gorman
d950c9477d mm: defer flush of writable TLB entries
If a PTE is unmapped and it's dirty then it was writable recently.  Due to
deferred TLB flushing, it's best to assume a writable TLB cache entry
exists.  With that assumption, the TLB must be flushed before any IO can
start or the page is freed to avoid lost writes or data corruption.  This
patch defers flushing of potentially writable TLBs as long as possible.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Mel Gorman
72b252aed5 mm: send one IPI per CPU to TLB flush all entries after unmapping pages
An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accesssed by other CPUs.  There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process as kswapd and the task are likely running on separate
CPUs.

On small machines, this is not a significant problem but as machine gets
larger with more cores and more memory, the cost of these IPIs can be
high.  This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped.  When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.

Architectures wishing to do this must give the following guarantee.

        If a clean page is unmapped and not immediately flushed, the
        architecture must guarantee that a write to that linear address
        from a CPU with a cached TLB entry will trap a page fault.

This is essentially what the kernel already depends on but the window is
much larger with this patch applied and is worth highlighting.  The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry.  An additional
architecture helper called flush_tlb_local is required.  It's a trivial
wrapper with some accounting in the x86 case.

The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure.  The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite.  The test case uses NR_CPU
readers of mapped files that consume 10*RAM.

Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed      159.62 (  0.00%)   120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range    30.59 (  0.00%)     2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv     6.70 (  0.00%)     0.64 ( 90.38%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User          581.00       611.43
System       5804.93      4111.76
Elapsed       161.03       122.12

This is showing that the readers completed 24.40% faster with 29% less
system CPU time.  From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupts 180K times per second.

The impact is lower on a single socket machine.

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed       25.33 (  0.00%)    20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range     0.91 (  0.00%)     1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv     0.28 (  0.00%)     0.47 (-65.34%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User           58.09        57.64
System        111.82        76.56
Elapsed        27.29        22.55

It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.

The patch will have no impact on workloads with no memory pressure or have
relatively few mapped pages.  It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes.  Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.

[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00