linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-01-13 00:20:06 +00:00

Author	SHA1	Message	Date
Leo Yan	27675ef03c	rtc: pl031: fix typo for author email The email address missed character ">", so add it. Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:08 +02:00
Javier Martinez Canillas	1c4fc2955a	rtc: Export OF module alias information in missing drivers The I2C core always reports the MODALIAS uevent as "i2c:<client name" regardless if the driver was matched using the I2C id_table or the of_match_table. So technically there's no need for a driver to export the OF table since currently it's not used. In fact, the I2C device ID table is mandatory for I2C drivers since a i2c_device_id is passed to the driver's probe function even if the I2C core used the OF table to match the driver. And since the I2C core uses different tables, OF-only drivers needs to have duplicated data that has to be kept in sync and also the dev node compatible manufacturer prefix is stripped when reporting the MODALIAS. To avoid the above, the I2C core behavior may be changed in the future to not require an I2C device table for OF-only drivers and report the OF module alias. So, it's better to also export the OF table to prevent breaking module autoloading if that happens. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:08 +02:00
Adrian Huang	8109d44f76	rtc: cmos: Revert "rtc-cmos: Add an alarm disable quirk" Commit d5a1c7e3fc38 ("rtc-cmos: Add an alarm disable quirk") that added a special quirk is not needed because [PATCH 1/2] of this patchset makes the kernel more robust: rtc-cmos: Cancel alarm timer if alarm time is equal to now+1 seconds Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Tested-by: Egbert Eich <eich@suse.de> Tested-by: Diego Ercolani <diego.ercolani@gmail.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:08 +02:00
Adrian Huang	88b8d33b1c	rtc: cmos: Cancel alarm timer if alarm time is equal to now+1 seconds Steps to reproduce the problem: 1) Enable RTC wake-up option in BIOS Setup 2) Issue one of these commands in the OS: "poweroff" or "shutdown -h now" 3) System will shut down and then reboot automatically Root-cause of the issue: 1) During the shutdown process, the hwclock utility is used to save the system clock to hardware clock (RTC). 2) The hwclock utility invokes ioctl() with RTC_UIE_ON. The kernel configures the RTC alarm for the periodic interrupt (every 1 second). 3) The hwclock uitlity closes the /dev/rtc0 device, and the kernel disables the RTC alarm irq (AIE bit of Register B) via ioctl() with RTC_UIE_OFF. But, the configured alarm time is the current_time + 1. 4) After the next 1 second is elapsed, the AF (alarm interrupt flag) of Register C is set. 5) The S5 handler in BIOS is invoked to configure alarm registers (enable AIE bit and configure alarm date/time). But, BIOS does not clear the previous interrupt status during alarm configuration. Therefore, "AF=AIE=1" causes the rtc device to trigger an interrupt. 6) So, the machine reboots automatically right after shutdown. This patch cancels the alarm timer if the following condictions are met (suggested by Alexandre): 1) The configured alarm time is equal to current_time + 1 seconds. 2) The AIE timer is not in use. The member 'alarm_expires' is introduced in struct cmos_rtc because of the following reasons: 1) The configured alarm time can be retrieved from cmos_read_alarm(), but we need to take the 'wrapped timestamp' and 'time rollover' into consideration. The function __rtc_read_alarm() eliminates the concerns. To avoid the duplicated code in the lower level RTC driver, invoking __rtc_read_alarm from the lower level RTC driver is not encouraged. Moreover, the compilation error 'the undefined __rtc_read_alarm" is observed if the lower level RTC driver is compiled as a kernel module. 2) The uie_rtctimer.node.expires and aie_timer.node.expires can be retrieved for the configured alarm time. But, the problem is that either of them might configure the CMOS alarm time. We cannot make sure UIE timer or AIE tiemr configured the CMOS alarm time before. (uie_rtctimer or aie_timer is enabled and then is disabled). 3) The patch introduces the member 'alarm_expires' to keep the newly configured alarm time, so the above-mentioned concerns can be eliminated. The issue goes away after 20-time shutdown tests. Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Tested-by: Egbert Eich <eich@suse.de> Tested-by: Diego Ercolani <diego.ercolani@gmail.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:08 +02:00
S Twiss	80ca3277bc	rtc: da9063: Add DA9062 RTC capability to DA9063 RTC driver Add DA9062 RTC support into the existing DA9063 RTC driver component by using generic access tables for common register and bit mask definitions. The following change will add generic register and bit mask support to the DA9063 RTC. The changes are slightly complicated by requiring support for three register sets: DA9063-AD, DA9063-BB and DA9062-AA. The following alterations have been made to the DA9063 RTC: - Addition of a da9063_compatible_rtc_regmap structure to hold all generic registers and bitmasks for this type of RTC component. - A re-write of struct da9063 to use pointers for regmap and compatible registers/masks definitions - Addition of a of_device_id table for DA9063 and DA9062 defaults - Refactoring functions to use struct da9063_compatible_rtc accesses to generic registers/masks instead of using defines from registers.h - Re-work of da9063_rtc_probe() to use of_match_node() and dev_get_regmap() to provide initialisation of generic registers and masks and access to regmap Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:08 +02:00
Henry Chen	d7f9777de8	rtc: mt6397: implement suspend/resume function in rtc-mt6397 driver Implement the suspend/resume function in order to control rtc's irq_wake flag and handle as wakeup source. Signed-off-by: Henry Chen <henryc.chen@mediatek.com> Acked-by: Eddie Huang <eddie.huang@mediatek.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:08 +02:00
Dmitry Torokhov	3ee2c40b7a	rtc: switch to using is_visible() to control sysfs attributes Instead of creating wakealarm attribute manually, after the device has been registered, let's rely on facilities provided by the attribute groups to control which attributes are visible and which are not. This allows to create all needed attributes at once, at the same time that we register RTC class device. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	a17ccd1c6a	rtc: switch wakealarm attribute to DEVICE_ATTR_RW Instead of using older style DEVICE_ATTR for wakealarm attribute let's switch to using DEVICE_ATTR_RW that ensures consistent across the kernel permissions on the attribute. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	df100c017e	rtc: make rtc_does_wakealarm() return boolean Users of rtc_does_wakealarm() return value treat it as boolean so let's change the signature accordingly. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Henri Roosen	f2284f9c90	rtc: rx8025: remove obsolete local_irq_disable() and local_irq_enable() for rtc_update_irq() Since commit e6229bec25be ("rtc: make rtc_update_irq callable with irqs enabled") rtc_update_irq() is callable with irqs enabled. Signed-off-by: Henri Roosen <henriroosen@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Octavian Purdila	0d9030a2c3	rtc: fix drivers that consider 0 as a valid IRQ in client->irq Since dab472eb931b ("i2c / ACPI: Use 0 to indicate that device does not have interrupt assigned"), 0 is not a valid i2c client irq anymore, so change all driver's checks accordingly. The same issue occurs when the device is instantiated via device tree with no IRQ, or from the i2c sysfs interface, even before the patch above. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	1e4cd62558	rtc: dev: properly manage lifetime of dev and cdev in rtc device struct rtc embeds both struct dev and struct cdev. Unfortunately character device structure may outlive the parent rtc structure unless we set it up as parent of character device so that it will stay pinned until character device is freed. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	c3b399a4b6	rtc: class: remove unnecessary device_get() in rtc_device_unregister Technically the address of rtc->dev can never be NULL, so get_device() can never fail. Also caller of rtc_device_unregister() supposed to be the owner of the device and thus have a valid reference. Therefore call to get_device() is not needed here. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Dmitry Torokhov	6706664d92	rtc: class: fix double free in rtc_register_device() error path Commit 59cca865f21e ("drivers/rtc/class.c: fix device_register() error handling") correctly noted that naked kfree() should not be used after failed device_register() call, however, while it added the needed put_device() it forgot to remove the original kfree() causing double-free. Cc: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:07 +02:00
Guo Zeng	dfe6c04aa2	rtc: sirfsoc: move to regmap APIs from platform-specific APIs The current codes use CSR platform specific API exported by machine codes to read/write RTC registers. they are: sirfsoc_rtc_iobrg_readl() sirfsoc_rtc_iobrg_writel() commit b1999477ed91 ("ARM: prima2: move to use REGMAP APIs for rtciobrg") moves to regmap support, now we can move to use regmap APIs in RTC driver. Signed-off-by: Guo Zeng <guo.zeng@csr.com> Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Vaibhav Jain	f4a2eecb3f	rtc: opal: Enable alarms only when opal supports tpo rtc-opal driver provides support for rtc alarms via timed-power-on(tpo). However some Power platforms like BML use a fake rtc clock and don't support tpo. Such platforms are indicated by the missing 'has-tpo' property in the device tree. Current implementation however enables callback for rtc_class_ops.read/set alarm irrespective of the tpo support from the platform. This results in a failed opal call when kernel tries to read an existing alarms via opal_get_tpo_time during rtc device registration. This patch fixes this issue by setting opal_rtc_ops.read/set_alarm callback pointers only when tpo is supported. Acked-by: Michael Neuling <mikey@neuling.org> Acked-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Joachim Eastwood	c28b42e3ae	rtc: add rtc-lpc24xx driver Add driver for the RTC found on NXP LPC178x/18xx/408x/43xx devices. The RTC provides calendar and clock functionality together with alarm interrupt support. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Joachim Eastwood	dcb9372b34	doc: dt: add documentation for nxp,lpc1788-rtc Document NXP LPC178x/18xx/408x/43xx bindings Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Krzysztof Kozlowski	045c6fdd37	rtc: Drop owner assignment from platform_driver platform_driver does not need to set an owner because platform_driver_register() will set it. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Krzysztof Kozlowski	b28845433e	rtc: Drop owner assignment from i2c_driver i2c_driver does not need to set an owner because i2c_register_driver() will set it. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Andrea Scian	653ebd75e9	rtc: pcf2127: use OFS flag to detect unreliable date and warn the user The PCF2127 datasheet states that it's wrong to say that the date in unreliable if BLF (battery low flag) is set but instead, OSF (seconds register) should be used to check if oscillator, for any reason, stopped. Battery may be low (usually below 2V5 threshold) but the date may be anyway correct (typically date is unreliable when input voltage is below 1V2). Signed-off-by: Andrea Scian <andrea.scian@dave.eu> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Andrea Scian	821f51c4da	rtc: use rtc_valid_tm() error code when reading date/time There's a wrong comment in some RTC drivers that say it's better to ignore rtc_valid_tm() when reading RTC timestamp. However this is wrong and is better to return to the userspace the error if timestamp is not valid. Signed-off-by: Andrea Scian <andrea.scian@dave.eu> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:06 +02:00
Vaibhav Hiremath	4ab8210313	rtc: 88pm80x: add device tree support Along with DT support, this patch also cleans up the unnecessary code around 'rtc_wakeup' initialization. Signed-off-by: Chao Xie <chao.xie@marvell.com> Signed-off-by: Vaibhav Hiremath <vaibhav.hiremath@linaro.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Maninder Singh	617f6f7ef5	rtc: bq32k: remove redundant check removing below static analysis error: (error) Possible null pointer dereference: client if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) ^^^^^^^ Error comes because client is dereferenced before NULL check. So probably NULL this check is not required. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Vaishali Thakkar	508db592e2	rtc: ds1685: Use module_platform_driver Use module_platform_driver for drivers whose init and exit functions only register and unregister, respectively. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @a@ identifier f, x; @@ -static f(...) { return platform_driver_register(&x); } @b depends on a@ identifier e, a.x; @@ -static e(...) { platform_driver_unregister(&x); } @c depends on a && b@ identifier a.f; declarer name module_init; @@ -module_init(f); @d depends on a && b && c@ identifier b.e, a.x; declarer name module_exit; declarer name module_platform_driver; @@ -module_exit(e); +module_platform_driver(x); Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Nishanth Menon	7abea617a4	rtc: ds1307: Support optional wakeup interrupt source With the recent pinctrl-single changes, SoCs such as Texas Instrument's OMAP processors can treat wake-up events from deeper idle states as interrupts. Let's add support for the optional second interrupt for wake-up using the generic wakeirq support added in commit 4990d4fe327b ("PM / Wakeirq: Add automated device wake IRQ handling") Finally, to pass the wake-up interrupt in the dts file, interrupts-extended property needs to be passed. This is similar in approach to commit 2a0b965cfb6e ("serial: omap: Add support for optional wake-up") + ee83bd3b6483 ("serial: omap: Switch wake-up interrupt to generic wakeirq") Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Nishanth Menon	eac7237fd8	rtc: ds1307: Sort the headers It is always a good practice to keep the #includes sorted Signed-off-by: Nishanth Menon <nm@ti.com> Acked-by: Felipe Balbi <balbi@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Nishanth Menon	c598319136	rtc: ds1307: Switch to managed irq allocation Since we are not doing anything fancy in remove function that requires us to sequence IRQ free operation, we might as well switch over to devm_ equivalent of managed IRQ allocation and remove the explicit free_irq since it'd be done automatically at remove. Signed-off-by: Nishanth Menon <nm@ti.com> Acked-by: Felipe Balbi <balbi@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
Felipe Balbi	2fb07a10e0	rtc: ds1307: Convert to threaded IRQ The driver currently emulates the concept of threaded IRQ using a workqueue, which it really does not need to. Instead, switch over to threaded_irq handlers which is meant precisely for the same purpose. Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2015-09-05 13:19:05 +02:00
NeilBrown	e89c6fdf9e	Merge linux-block/for-4.3/core into md/for-linux There were a few conflicts that are fairly easy to resolve. Signed-off-by: NeilBrown <neilb@suse.com>	2015-09-05 11:08:32 +02:00
Nicholas Krause	559ec2f8fd	mm/hugetlb.c: make vma_has_reserves() return bool This makes vma_has_reserves() return bool due to this particular function only returning either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Nicholas Krause	1ecef9ed0f	mm/madvise.c: make madvise_behaviour_valid() return bool This makes the madvise_bahaviour_valid() function return bool due to this particular function always returning the value of either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Nicholas Krause	ca1d6c7d9d	mm/memory.c: make tlb_next_batch() return bool This makes the tlb_next_batch() bool due to this particular function only ever returning either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Nicholas Krause	d9e7e37b4d	mm/dmapool.c: change is_page_busy() return from int to bool This makes the function is_page_busy() return bool rather then an int now due to this particular function's single return statement only ever evaulating to either one or zero. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
minkyung88.kim	4e6dab4233	mm: remove struct node_active_region struct node_active_region is not used anymore. Remove it. Signed-off-by: minkyung88.kim <minkyung88.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	9943242ca4	mremap: simplify the "overlap" check in mremap_to() Minor, but this check is overcomplicated. Two half-intervals do NOT overlap if END1 <= START2 \|\| END2 <= START1, mremap_to() just needs to negate this check. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	1d39168697	mremap: don't do uneccesary checks if new_len == old_len The "new_len > old_len" branch in vma_to_resize() looks very confusing. It only covers the VM_DONTEXPAND/pgoff checks but everything below is equally unneeded if new_len == old_len. Change this code to return if "new_len == old_len", new_len < old_len is not possible, otherwise the code below is wrong anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	d456fb9e52	mremap: don't do mm_populate(new_addr) on failure move_vma() sets *locked even if move_page_tables() or ->mremap() fails, change sys_mremap() to check "ret & ~PAGE_MASK". I think we should simply remove the VM_LOCKED code in move_vma(), that is why this patch doesn't change move_vma(). But this needs more cleanups. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	5477e70a64	mm: move ->mremap() from file_operations to vm_operations_struct vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this way ->mremap() can have more users. Say, vdso. While at it, s/aio_ring_remap/aio_ring_mremap/. Note: this is the minimal change before ->mremap() finds another user in file_operations; this method should have more arguments, and it can be used to kill arch_remap(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Oleg Nesterov	df1eab303c	mremap: don't leak new_vma if f_op->mremap() fails move_vma() can't just return if f_op->mremap() fails, we should unmap the new vma like we do if move_page_tables() fails. To avoid the code duplication this patch moves the "move entries back" under the new "if (err)" branch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Nicholas Krause	31aafb45f4	mm/hugetlb.c: make vma_shareable() return bool This makes vma_shareable() return bool now due to this particular function only ever returning either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Kirill A. Shutemov	1027e4436b	mm: make GUP handle pfn mapping unless FOLL_GET is requested With DAX, pfn mapping becoming more common. The patch adjusts GUP code to cover pfn mapping for cases when we don't need struct page to proceed. To make it possible, let's change follow_page() code to return -EEXIST error code if proper page table entry exists, but no corresponding struct page. __get_user_page() would ignore the error code and move to the next page frame. The immediate effect of the change is working MAP_POPULATE and mlock() on DAX mappings. [akpm@linux-foundation.org: fix arm64 build] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Kirill A. Shutemov	d899844e9c	mm: fix status code which move_pages() returns for zero page The manpage for move_pages(2) specifies that status code for zero page is supposed to be -EFAULT. Currently kernel return -ENOENT in this case. follow_page() can do it for us, if we would ask for FOLL_DUMP. The use of FOLL_DUMP also means that the upper layer page tables pages are no longer allocated. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Sebastian Andrzej Siewior	ce9ce6659a	mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout() Clark stumbled over a VM_BUG_ON() in -RT which was then was removed by Johannes in commit f371763a79d ("mm: memcontrol: fix false-positive VM_BUG_ON() on -rt"). The comment before that patch was a tiny bit better than it is now. While the patch claimed to fix a false-postive on -RT this was not the case. None of the -RT folks ACKed it and it was not a false positive report. That was a real problem. This patch updates the comment that is improper because it refers to "disabled preemption" as a consequence of that lock being taken. A spin_lock() disables preemption, true, but in this case the code relies on the fact that the lock _also_ disables interrupts once it is acquired. And this is the important detail (which was checked the VM_BUG_ON()) which needs to be pointed out. This is the hint one needs while looking at the code. It was explained by Johannes on the list that the per-CPU variables are protected by local_irq_save(). The BUG_ON() was helpful. This code has been workarounded in -RT in the meantime. I wouldn't mind running into more of those if the code in question uses special kind of locking since now there is no verification (in terms of lockdep or BUG_ON()) and therefore I bring the VM_BUG_ON() check back in. The two functions after the comment could also have a "local_irq_save()" dance around them in order to serialize access to the per-CPU variables. This has been avoided because the interrupts should be off. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Vladimir Zapolskiy	c98c36355d	genalloc: add support of multiple gen_pools per device This change fills devm_gen_pool_create()/gen_pool_get() "name" argument stub with contents and extends of_gen_pool_get() functionality on this basis. If there is no associated platform device with a device node passed to of_gen_pool_get(), the function attempts to get a label property or device node name (= repeats MTD OF partition standard) and seeks for a named gen_pool registered by device of the parent device node. The main idea of the change is to allow registration of independent gen_pools under the same umbrella device, say "partitions" on "storage device", the original functionality of one "partition" per "storage device" is untouched. [akpm@linux-foundation.org: fix constness in devres_find()] [dan.carpenter@oracle.com: freeing const data pointers] Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Vladimir Zapolskiy	7385817359	genalloc: add name arg to gen_pool_get() and devm_gen_pool_create() This change modifies gen_pool_get() and devm_gen_pool_create() client interfaces adding one more argument "name" of a gen_pool object. Due to implementation gen_pool_get() is capable to retrieve only one gen_pool associated with a device even if multiple gen_pools are created, fortunately right at the moment it is sufficient for the clients, hence provide NULL as a valid argument on both producer devm_gen_pool_create() and consumer gen_pool_get() sides. Because only one created gen_pool per device is addressable, explicitly add a restriction to devm_gen_pool_create() to create only one gen_pool per device, this implies two possible error codes returned by the function, account it on client side (only misc/sram). This completes client side changes related to genalloc updates. [akpm@linux-foundation.org: gen_pool_get() cleanup] Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Wei Yang	c0a2949883	mm/memblock: WARN_ON when nid differs from overlap region Each memblock_region has nid to indicates the Node ID of this range. For the overlap case, memblock_add_range() inserts the lower part and leave the upper part as indicated in the overlapped region. If the nid of the new range differs from the overlapped region, the information recorded is not correct. This patch adds a WARN_ON when the nid of the new range differs from the overlapped region. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Mel Gorman	c7e1e3ccfb	Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Mel Gorman	d950c9477d	mm: defer flush of writable TLB entries If a PTE is unmapped and it's dirty then it was writable recently. Due to deferred TLB flushing, it's best to assume a writable TLB cache entry exists. With that assumption, the TLB must be flushed before any IO can start or the page is freed to avoid lost writes or data corruption. This patch defers flushing of potentially writable TLBs as long as possible. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Mel Gorman	72b252aed5	mm: send one IPI per CPU to TLB flush all entries after unmapping pages An IPI is sent to flush remote TLBs when a page is unmapped that was potentially accesssed by other CPUs. There are many circumstances where this happens but the obvious one is kswapd reclaiming pages belonging to a running process as kswapd and the task are likely running on separate CPUs. On small machines, this is not a significant problem but as machine gets larger with more cores and more memory, the cost of these IPIs can be high. This patch uses a simple structure that tracks CPUs that potentially have TLB entries for pages being unmapped. When the unmapping is complete, the full TLB is flushed on the assumption that a refill cost is lower than flushing individual entries. Architectures wishing to do this must give the following guarantee. If a clean page is unmapped and not immediately flushed, the architecture must guarantee that a write to that linear address from a CPU with a cached TLB entry will trap a page fault. This is essentially what the kernel already depends on but the window is much larger with this patch applied and is worth highlighting. The architecture should consider whether the cost of the full TLB flush is higher than sending an IPI to flush each individual entry. An additional architecture helper called flush_tlb_local is required. It's a trivial wrapper with some accounting in the x86 case. The impact of this patch depends on the workload as measuring any benefit requires both mapped pages co-located on the LRU and memory pressure. The case with the biggest impact is multiple processes reading mapped pages taken from the vm-scalability test suite. The test case uses NR_CPU readers of mapped files that consume 10*RAM. Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%) Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%) Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 581.00 611.43 System 5804.93 4111.76 Elapsed 161.03 122.12 This is showing that the readers completed 24.40% faster with 29% less system CPU time. From vmstats, it is known that the vanilla kernel was interrupted roughly 900K times per second during the steady phase of the test and the patched kernel was interrupts 180K times per second. The impact is lower on a single socket machine. 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%) Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%) Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 58.09 57.64 System 111.82 76.56 Elapsed 27.29 22.55 It's still a noticeable improvement with vmstat showing interrupts went from roughly 500K per second to 45K per second. The patch will have no impact on workloads with no memory pressure or have relatively few mapped pages. It will have an unpredictable impact on the workload running on the CPU being flushed as it'll depend on how many TLB entries need to be refilled and how long that takes. Worst case, the TLB will be completely cleared of active entries when the target PFNs were not resident at all. [sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00

... 4 5 6 7 8 ...

545630 Commits