The email address missed character ">", so add it.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
The I2C core always reports the MODALIAS uevent as "i2c:<client name"
regardless if the driver was matched using the I2C id_table or the
of_match_table. So technically there's no need for a driver to export
the OF table since currently it's not used.
In fact, the I2C device ID table is mandatory for I2C drivers since
a i2c_device_id is passed to the driver's probe function even if the
I2C core used the OF table to match the driver.
And since the I2C core uses different tables, OF-only drivers needs to
have duplicated data that has to be kept in sync and also the dev node
compatible manufacturer prefix is stripped when reporting the MODALIAS.
To avoid the above, the I2C core behavior may be changed in the future
to not require an I2C device table for OF-only drivers and report the
OF module alias. So, it's better to also export the OF table to prevent
breaking module autoloading if that happens.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Commit d5a1c7e3fc38 ("rtc-cmos: Add an alarm disable quirk") that
added a special quirk is not needed because [PATCH 1/2] of this
patchset makes the kernel more robust:
rtc-cmos: Cancel alarm timer if alarm time is equal to now+1 seconds
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Tested-by: Egbert Eich <eich@suse.de>
Tested-by: Diego Ercolani <diego.ercolani@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Steps to reproduce the problem:
1) Enable RTC wake-up option in BIOS Setup
2) Issue one of these commands in the OS: "poweroff"
or "shutdown -h now"
3) System will shut down and then reboot automatically
Root-cause of the issue:
1) During the shutdown process, the hwclock utility is used
to save the system clock to hardware clock (RTC).
2) The hwclock utility invokes ioctl() with RTC_UIE_ON. The
kernel configures the RTC alarm for the periodic interrupt
(every 1 second).
3) The hwclock uitlity closes the /dev/rtc0 device, and the
kernel disables the RTC alarm irq (AIE bit of Register B)
via ioctl() with RTC_UIE_OFF. But, the configured alarm
time is the current_time + 1.
4) After the next 1 second is elapsed, the AF (alarm
interrupt flag) of Register C is set.
5) The S5 handler in BIOS is invoked to configure alarm
registers (enable AIE bit and configure alarm date/time).
But, BIOS does not clear the previous interrupt status
during alarm configuration. Therefore, "AF=AIE=1" causes
the rtc device to trigger an interrupt.
6) So, the machine reboots automatically right after shutdown.
This patch cancels the alarm timer if the following condictions are
met (suggested by Alexandre):
1) The configured alarm time is equal to current_time + 1
seconds.
2) The AIE timer is not in use.
The member 'alarm_expires' is introduced in struct cmos_rtc because
of the following reasons:
1) The configured alarm time can be retrieved from
cmos_read_alarm(), but we need to take the 'wrapped
timestamp' and 'time rollover' into consideration. The
function __rtc_read_alarm() eliminates the concerns. To
avoid the duplicated code in the lower level RTC driver,
invoking __rtc_read_alarm from the lower level RTC driver
is not encouraged. Moreover, the compilation error 'the
undefined __rtc_read_alarm" is observed if the lower level
RTC driver is compiled as a kernel module.
2) The uie_rtctimer.node.expires and aie_timer.node.expires can
be retrieved for the configured alarm time. But, the problem
is that either of them might configure the CMOS alarm time.
We cannot make sure UIE timer or AIE tiemr configured the
CMOS alarm time before. (uie_rtctimer or aie_timer is enabled
and then is disabled).
3) The patch introduces the member 'alarm_expires' to keep the
newly configured alarm time, so the above-mentioned concerns
can be eliminated.
The issue goes away after 20-time shutdown tests.
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Tested-by: Egbert Eich <eich@suse.de>
Tested-by: Diego Ercolani <diego.ercolani@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Add DA9062 RTC support into the existing DA9063 RTC driver component by
using generic access tables for common register and bit mask definitions.
The following change will add generic register and bit mask support to the
DA9063 RTC. The changes are slightly complicated by requiring support for
three register sets: DA9063-AD, DA9063-BB and DA9062-AA.
The following alterations have been made to the DA9063 RTC:
- Addition of a da9063_compatible_rtc_regmap structure to hold all generic
registers and bitmasks for this type of RTC component.
- A re-write of struct da9063 to use pointers for regmap and compatible
registers/masks definitions
- Addition of a of_device_id table for DA9063 and DA9062 defaults
- Refactoring functions to use struct da9063_compatible_rtc accesses to
generic registers/masks instead of using defines from registers.h
- Re-work of da9063_rtc_probe() to use of_match_node() and dev_get_regmap()
to provide initialisation of generic registers and masks and access to
regmap
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Implement the suspend/resume function in order to control rtc's irq_wake flag and handle as wakeup source.
Signed-off-by: Henry Chen <henryc.chen@mediatek.com>
Acked-by: Eddie Huang <eddie.huang@mediatek.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Instead of creating wakealarm attribute manually, after the device has been
registered, let's rely on facilities provided by the attribute groups to
control which attributes are visible and which are not. This allows to
create all needed attributes at once, at the same time that we register RTC
class device.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Instead of using older style DEVICE_ATTR for wakealarm attribute let's
switch to using DEVICE_ATTR_RW that ensures consistent across the kernel
permissions on the attribute.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Users of rtc_does_wakealarm() return value treat it as boolean so let's
change the signature accordingly.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Since commit e6229bec25be ("rtc: make rtc_update_irq callable with irqs
enabled") rtc_update_irq() is callable with irqs enabled.
Signed-off-by: Henri Roosen <henriroosen@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Since dab472eb931b ("i2c / ACPI: Use 0 to indicate that device does not
have interrupt assigned"), 0 is not a valid i2c client irq anymore, so
change all driver's checks accordingly.
The same issue occurs when the device is instantiated via device tree
with no IRQ, or from the i2c sysfs interface, even before the patch
above.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
struct rtc embeds both struct dev and struct cdev. Unfortunately character
device structure may outlive the parent rtc structure unless we set it up
as parent of character device so that it will stay pinned until character
device is freed.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Technically the address of rtc->dev can never be NULL, so get_device()
can never fail. Also caller of rtc_device_unregister() supposed to be
the owner of the device and thus have a valid reference. Therefore
call to get_device() is not needed here.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Commit 59cca865f21e ("drivers/rtc/class.c: fix device_register() error
handling") correctly noted that naked kfree() should not be used after
failed device_register() call, however, while it added the needed
put_device() it forgot to remove the original kfree() causing double-free.
Cc: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
The current codes use CSR platform specific API exported by machine
codes to read/write RTC registers. they are:
sirfsoc_rtc_iobrg_readl()
sirfsoc_rtc_iobrg_writel()
commit b1999477ed91 ("ARM: prima2: move to use REGMAP APIs for rtciobrg")
moves to regmap support, now we can move to use regmap APIs in RTC
driver.
Signed-off-by: Guo Zeng <guo.zeng@csr.com>
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
rtc-opal driver provides support for rtc alarms via
timed-power-on(tpo). However some Power platforms like BML use a fake
rtc clock and don't support tpo. Such platforms are indicated by the
missing 'has-tpo' property in the device tree.
Current implementation however enables callback for
rtc_class_ops.read/set alarm irrespective of the tpo support from the
platform. This results in a failed opal call when kernel tries to read
an existing alarms via opal_get_tpo_time during rtc device registration.
This patch fixes this issue by setting opal_rtc_ops.read/set_alarm
callback pointers only when tpo is supported.
Acked-by: Michael Neuling <mikey@neuling.org>
Acked-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Add driver for the RTC found on NXP LPC178x/18xx/408x/43xx devices.
The RTC provides calendar and clock functionality together with
alarm interrupt support.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
platform_driver does not need to set an owner because
platform_driver_register() will set it.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
i2c_driver does not need to set an owner because i2c_register_driver()
will set it.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
The PCF2127 datasheet states that it's wrong to say that the date in
unreliable if BLF (battery low flag) is set but instead, OSF (seconds
register) should be used to check if oscillator, for any reason, stopped.
Battery may be low (usually below 2V5 threshold) but the date may be anyway
correct (typically date is unreliable when input voltage is below 1V2).
Signed-off-by: Andrea Scian <andrea.scian@dave.eu>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
There's a wrong comment in some RTC drivers that say it's better to ignore
rtc_valid_tm() when reading RTC timestamp. However this is wrong and is
better to return to the userspace the error if timestamp is not valid.
Signed-off-by: Andrea Scian <andrea.scian@dave.eu>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Along with DT support, this patch also cleans up the unnecessary
code around 'rtc_wakeup' initialization.
Signed-off-by: Chao Xie <chao.xie@marvell.com>
Signed-off-by: Vaibhav Hiremath <vaibhav.hiremath@linaro.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
removing below static analysis error:
(error) Possible null pointer dereference: client
if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C))
^^^^^^^
Error comes because client is dereferenced before NULL check.
So probably NULL this check is not required.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Use module_platform_driver for drivers whose init and exit functions
only register and unregister, respectively.
A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:
@a@
identifier f, x;
@@
-static f(...) { return platform_driver_register(&x); }
@b depends on a@
identifier e, a.x;
@@
-static e(...) { platform_driver_unregister(&x); }
@c depends on a && b@
identifier a.f;
declarer name module_init;
@@
-module_init(f);
@d depends on a && b && c@
identifier b.e, a.x;
declarer name module_exit;
declarer name module_platform_driver;
@@
-module_exit(e);
+module_platform_driver(x);
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
With the recent pinctrl-single changes, SoCs such as Texas
Instrument's OMAP processors can treat wake-up events from deeper idle
states as interrupts.
Let's add support for the optional second interrupt for wake-up using
the generic wakeirq support added in commit 4990d4fe327b ("PM /
Wakeirq: Add automated device wake IRQ handling")
Finally, to pass the wake-up interrupt in the dts file,
interrupts-extended property needs to be passed.
This is similar in approach to commit 2a0b965cfb6e ("serial: omap: Add
support for optional wake-up") + ee83bd3b6483 ("serial: omap: Switch
wake-up interrupt to generic wakeirq")
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
It is always a good practice to keep the #includes sorted
Signed-off-by: Nishanth Menon <nm@ti.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Since we are not doing anything fancy in remove function that requires
us to sequence IRQ free operation, we might as well switch over to devm_
equivalent of managed IRQ allocation and remove the explicit free_irq
since it'd be done automatically at remove.
Signed-off-by: Nishanth Menon <nm@ti.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
The driver currently emulates the concept of threaded IRQ using a
workqueue, which it really does not need to. Instead, switch over to
threaded_irq handlers which is meant precisely for the same purpose.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
This makes vma_has_reserves() return bool due to this particular function
only returning either one or zero as its return value.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the madvise_bahaviour_valid() function return bool due to
this particular function always returning the value of either one or
zero as its return value.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the tlb_next_batch() bool due to this particular function only
ever returning either one or zero as its return value.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the function is_page_busy() return bool rather then an int now
due to this particular function's single return statement only ever
evaulating to either one or zero.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct node_active_region is not used anymore. Remove it.
Signed-off-by: minkyung88.kim <minkyung88.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minor, but this check is overcomplicated. Two half-intervals do NOT
overlap if END1 <= START2 || END2 <= START1, mremap_to() just needs to
negate this check.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "new_len > old_len" branch in vma_to_resize() looks very confusing.
It only covers the VM_DONTEXPAND/pgoff checks but everything below is
equally unneeded if new_len == old_len.
Change this code to return if "new_len == old_len", new_len < old_len is
not possible, otherwise the code below is wrong anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
move_vma() sets *locked even if move_page_tables() or ->mremap() fails,
change sys_mremap() to check "ret & ~PAGE_MASK".
I think we should simply remove the VM_LOCKED code in move_vma(), that is
why this patch doesn't change move_vma(). But this needs more cleanups.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this
way ->mremap() can have more users. Say, vdso.
While at it, s/aio_ring_remap/aio_ring_mremap/.
Note: this is the minimal change before ->mremap() finds another user in
file_operations; this method should have more arguments, and it can be
used to kill arch_remap().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
move_vma() can't just return if f_op->mremap() fails, we should unmap the
new vma like we do if move_page_tables() fails. To avoid the code
duplication this patch moves the "move entries back" under the new "if
(err)" branch.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes vma_shareable() return bool now due to this particular function
only ever returning either one or zero as its return value.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With DAX, pfn mapping becoming more common. The patch adjusts GUP code to
cover pfn mapping for cases when we don't need struct page to proceed.
To make it possible, let's change follow_page() code to return -EEXIST
error code if proper page table entry exists, but no corresponding struct
page. __get_user_page() would ignore the error code and move to the next
page frame.
The immediate effect of the change is working MAP_POPULATE and mlock() on
DAX mappings.
[akpm@linux-foundation.org: fix arm64 build]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The manpage for move_pages(2) specifies that status code for zero page is
supposed to be -EFAULT. Currently kernel return -ENOENT in this case.
follow_page() can do it for us, if we would ask for FOLL_DUMP. The use of
FOLL_DUMP also means that the upper layer page tables pages are no longer
allocated.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clark stumbled over a VM_BUG_ON() in -RT which was then was removed by
Johannes in commit f371763a79d ("mm: memcontrol: fix false-positive
VM_BUG_ON() on -rt"). The comment before that patch was a tiny bit better
than it is now. While the patch claimed to fix a false-postive on -RT
this was not the case. None of the -RT folks ACKed it and it was not a
false positive report. That was a *real* problem.
This patch updates the comment that is improper because it refers to
"disabled preemption" as a consequence of that lock being taken. A
spin_lock() disables preemption, true, but in this case the code relies on
the fact that the lock _also_ disables interrupts once it is acquired.
And this is the important detail (which was checked the VM_BUG_ON()) which
needs to be pointed out. This is the hint one needs while looking at the
code. It was explained by Johannes on the list that the per-CPU variables
are protected by local_irq_save(). The BUG_ON() was helpful. This code
has been workarounded in -RT in the meantime. I wouldn't mind running
into more of those if the code in question uses *special* kind of locking
since now there is no verification (in terms of lockdep or BUG_ON()) and
therefore I bring the VM_BUG_ON() check back in.
The two functions after the comment could also have a "local_irq_save()"
dance around them in order to serialize access to the per-CPU variables.
This has been avoided because the interrupts should be off.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <williams@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change fills devm_gen_pool_create()/gen_pool_get() "name" argument
stub with contents and extends of_gen_pool_get() functionality on this
basis.
If there is no associated platform device with a device node passed to
of_gen_pool_get(), the function attempts to get a label property or device
node name (= repeats MTD OF partition standard) and seeks for a named
gen_pool registered by device of the parent device node.
The main idea of the change is to allow registration of independent
gen_pools under the same umbrella device, say "partitions" on "storage
device", the original functionality of one "partition" per "storage
device" is untouched.
[akpm@linux-foundation.org: fix constness in devres_find()]
[dan.carpenter@oracle.com: freeing const data pointers]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change modifies gen_pool_get() and devm_gen_pool_create() client
interfaces adding one more argument "name" of a gen_pool object.
Due to implementation gen_pool_get() is capable to retrieve only one
gen_pool associated with a device even if multiple gen_pools are created,
fortunately right at the moment it is sufficient for the clients, hence
provide NULL as a valid argument on both producer devm_gen_pool_create()
and consumer gen_pool_get() sides.
Because only one created gen_pool per device is addressable, explicitly
add a restriction to devm_gen_pool_create() to create only one gen_pool
per device, this implies two possible error codes returned by the
function, account it on client side (only misc/sram). This completes
client side changes related to genalloc updates.
[akpm@linux-foundation.org: gen_pool_get() cleanup]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Each memblock_region has nid to indicates the Node ID of this range. For
the overlap case, memblock_add_range() inserts the lower part and leave
the upper part as indicated in the overlapped region.
If the nid of the new range differs from the overlapped region, the
information recorded is not correct.
This patch adds a WARN_ON when the nid of the new range differs from the
overlapped region.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a PTE is unmapped and it's dirty then it was writable recently. Due to
deferred TLB flushing, it's best to assume a writable TLB cache entry
exists. With that assumption, the TLB must be flushed before any IO can
start or the page is freed to avoid lost writes or data corruption. This
patch defers flushing of potentially writable TLBs as long as possible.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accesssed by other CPUs. There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process as kswapd and the task are likely running on separate
CPUs.
On small machines, this is not a significant problem but as machine gets
larger with more cores and more memory, the cost of these IPIs can be
high. This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped. When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.
Architectures wishing to do this must give the following guarantee.
If a clean page is unmapped and not immediately flushed, the
architecture must guarantee that a write to that linear address
from a CPU with a cached TLB entry will trap a page fault.
This is essentially what the kernel already depends on but the window is
much larger with this patch applied and is worth highlighting. The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry. An additional
architecture helper called flush_tlb_local is required. It's a trivial
wrapper with some accounting in the x86 case.
The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure. The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite. The test case uses NR_CPU
readers of mapped files that consume 10*RAM.
Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%)
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
User 581.00 611.43
System 5804.93 4111.76
Elapsed 161.03 122.12
This is showing that the readers completed 24.40% faster with 29% less
system CPU time. From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupts 180K times per second.
The impact is lower on a single socket machine.
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%)
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
User 58.09 57.64
System 111.82 76.56
Elapsed 27.29 22.55
It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.
The patch will have no impact on workloads with no memory pressure or have
relatively few mapped pages. It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes. Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.
[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>