The only user of the cycle_last validation is the x86 TSC. In order to
provide NMI safe accessor functions for clock monotonic and
monotonic_raw we need to do that in the core.
We can't do the TSC specific
	if (now < cycle_last)
		now = cycle_last;
for the other wrapping around clocksources, but TSC has
CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
now is less than cycle_last the subtraction will give a negative
result. So we can check for that in clocksource_delta() and return 0
for that case.
Implement and enable it for x86.
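A minimal sketch of such a delta helper, for illustration only. The
name clocksource_delta_sketch() and the cycle_t typedef are stand-ins
assumed here, and the exact form used by the patch may differ:

```c
#include <stdint.h>

typedef uint64_t cycle_t;	/* stand-in for the kernel's cycle_t */

/*
 * Hypothetical helper: compute the masked delta and clamp it to 0 when
 * it would be negative as a signed 64-bit value. With a full 64-bit
 * mask (as for the TSC) that is exactly the now < last case.
 */
static inline cycle_t clocksource_delta_sketch(cycle_t now, cycle_t last,
					       cycle_t mask)
{
	cycle_t ret = (now - last) & mask;

	return (int64_t)ret > 0 ? ret : 0;
}
```

Clocksources with a narrower mask are unaffected by the clamp: their
masked delta never has bit 63 set, so a wrapped counter still yields
the expected positive delta.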
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
We want to move the TSC sanity check into core code to make NMI safe
accessors to clock monotonic[_raw] possible. For this we need to
sanity check the delta calculation. Create a helper function and
convert all sites to use it.
[ Build fix from jstultz ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Provide a ktime_t based interface for raw monotonic time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
timekeeping_clocktai() is not used in fast paths, so the extra
timespec conversion is not problematic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Subtracting plain nsec values and converting to timespec is simpler
than the whole timespec math. Not really fastpath code, so the
division is not an issue.
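A sketch of the "subtract nanoseconds, then convert once" approach.
struct ts_sketch and both function names are hypothetical and only
illustrate the arithmetic, not the code touched by this patch:

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for a timespec-like structure. */
struct ts_sketch {
	int64_t tv_sec;
	long tv_nsec;
};

/* Convert a signed nanosecond count; tv_nsec stays in [0, NSEC_PER_SEC). */
static struct ts_sketch ns_to_ts_sketch(int64_t nsec)
{
	struct ts_sketch ts;

	ts.tv_sec = nsec / NSEC_PER_SEC;
	ts.tv_nsec = (long)(nsec % NSEC_PER_SEC);
	if (ts.tv_nsec < 0) {
		/* C division truncates toward zero, so normalize here. */
		ts.tv_sec--;
		ts.tv_nsec += NSEC_PER_SEC;
	}
	return ts;
}

/* One subtraction on plain nanoseconds, one conversion at the end. */
static struct ts_sketch delta_as_ts_sketch(int64_t a_ns, int64_t b_ns)
{
	return ns_to_ts_sketch(a_ns - b_ns);
}
```

The single division is the cost mentioned above; it replaces the
borrow/carry handling that subtracting two timespecs would need.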
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
get_monotonic_boottime() is not used in fast paths, so the extra
timespec conversion is not problematic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Having two fields within the same struct whose names differ by only
one character can be confusing and error prone. Rename the counter
"trampolines" to "nr_trampolines" to explicitly show it is a counter
and to avoid confusion with the "trampoline" field.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Converting cputime to timespec and timespec to nanoseconds makes no
sense. Use cputime_to_ns() and be done with it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Kill the timespec juggling and calculate with plain nanoseconds.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Simplify the timespec to nsec/usec conversions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Simplify the only user of this data by removing the timespec
conversion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Required for moving drivers to the nanosecond based interfaces.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
ktime based conversion function to map a monotonic time stamp to a
different CLOCK.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Provide a helper function which lets us implement ktime_t based
interfaces for real, boot and tai clocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Speed up ktime_get() by using ktime_t based data. Text size shrinks by
64 bytes on x86_64.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The ktime_t based interfaces are used a lot in performance critical
code paths. Add ktime_t based data so the interfaces don't have to
convert from the xtime/timespec based data.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
We already have a function which does the right thing, that also makes
sure that the coming ktime_t based cached values are getting updated.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
struct timekeeper is quite badly sorted for the hot readout path. Most
time access functions need to load two cache lines.
Rearrange it so ktime_get() and getnstimeofday() are happy with a
single cache line.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
To convert callers of the core code to timespec64 we need to provide
the proper interfaces.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Right now we have time related prototypes in 3 different header
files. Move them to a single timekeeping header file and move the core
internal stuff into a core private header.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Convert the core timekeeping logic to use timespec64s. This moves the
2038 issues out of the core logic and into all of the accessor
functions.
Future changes will need to push the timespec64s out to all
timekeeping users, but that can be done interface by interface.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Helper and conversion functions for timespec64.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
With the plain nanoseconds based ktime_t we can simply use
ktime_divns() instead of going through loops and hoops of
timespec/timeval conversion.
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The non-scalar ktime_t implementation is basically a timespec
which has to be changed to support dates past 2038 on 32bit
systems.
This patch removes the non-scalar ktime_t implementation, forcing
the scalar s64 nanosecond version on all architectures.
This may have additional performance overhead on some 32bit
systems when converting between ktime_t and timespec structures,
however the majority of 32bit systems (arm and i386) were already
using scalar ktime_t, so no performance regressions will be seen
on those platforms.
On affected platforms, I'm open to finding optimizations, including
avoiding converting to timespecs where possible.
[ tglx: We can now clean up the ktime_t.tv64 mess, but that's a
different issue and we can throw a coccinelle script at it ]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Rather than having two similar but totally different implementations
that provide timekeeping state to the hrtimer code, try to unify the
two implementations to be more similar.
Thus this renames ktime_get_update_offsets to
ktime_get_update_offsets_now and changes get_xtime... to
ktime_get_update_offsets_tick.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Provide a default stub function instead of having the extra
conditional. Cuts binary size on a m68k build by ~100 bytes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Create a module that allows udelay() to be executed to ensure that
it is delaying at least as long as requested (with a little bit of
error allowed).
There are some configurations which don't have a reliable udelay()
because they use a loop based delay while cpufreq changes the CPU
frequency; these should use a counter (timer) based delay instead.
This test aims to identify those configurations where timing is
unreliable.
Signed-off-by: David Riley <davidriley@chromium.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The PM workqueue is going to be used by ACPI PM notify handlers
regardless of whether or not runtime PM is configured, so move
it out of #ifdef CONFIG_PM_RUNTIME.
Do that in three places in the ACPI device PM code.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After the introduction of freeze_ops it makes more sense to move
all of the platform suspend operations to separate functions that
each will do all of the necessary checks and choose the right
callback to execute instead of doing all that in the core code
which makes it generally harder to follow.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since the OPP layer is a kernel library which has been converted to be
directly selectable by its callers, rather than being user selectable
and requiring architectures to enable it explicitly, the ARCH_HAS_OPP
symbol has become redundant and can be removed. Do so.
Signed-off-by: Mark Brown <broonie@linaro.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Nishanth Menon <nm@ti.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Shawn Guo <shawn.guo@freescale.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
They are the same and nr_node_ids is provided by the memory subsystem.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
After the locking was moved up to the caller of get_unbound_pool(),
the out_unlock label no longer needs to do any unlocking and its name
became misleading, so remove the label and replace its only user,
"goto out_unlock", with "return pool".
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
In 75ccf5950f82 ("workqueue: prepare flush_workqueue() for dynamic
creation and destruction of unbound pool_workqueues"), a comment
about the synchronization for the pwq in pwq_unbound_release_workfn()
was added. The comment claimed that the flush_mutex wasn't strictly
necessary, which was correct at the time because the pwq was protected
by workqueue_lock.
That is no longer true: wq->flush_mutex was renamed to wq->mutex and
workqueue_lock was removed, so wq->mutex is now strictly needed. The
comment was not updated when the synchronization was changed.
This patch removes the incorrect comment and doesn't add a new one to
explain why wq->mutex is needed here. The need is obvious, and the
"WQ" notation in the definition of wq->pwqs_node documents it better.
The old commit mentioned above also introduced a comment in link_pwq()
about the synchronization. This comment is also removed in this patch
since the whole of link_pwq() is protected by wq->mutex.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
In 51697d393922 ("workqueue: use generic attach/detach routine for
rescuers"), the rescuer detaches itself from the pool before put_pwq()
so that put_unbound_pool() will not destroy the rescuer-attached
pool.
That is unnecessary. worker_detach_from_pool() can be the last
statement that accesses the pool, just like for regular workers;
put_unbound_pool() will wait for it to detach and then free the pool.
So move worker_detach_from_pool() down to make it consistent with
the regular workers.
tj: Minor description update.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Simply unfold the code of start_worker() into create_worker() and
remove the original start_worker() and create_and_start_worker().
The only trade-off is the added overhead of releasing and regrabbing
pool->lock after the new worker is started. The overhead is
acceptable since the manager is a slow path.
Because of this new locking behavior, the newly created worker
may grab the lock earlier than the manager and start processing
work items. In this case, the recheck of need_to_create_worker() may
be true as expected and the manager restarts, which is the
correct behavior.
tj: Minor updates to description and comments.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
worker_set_flags() has only two callers, each specifying %true and
%false for @wakeup. Let's push the wake up to the caller and remove
@wakeup from worker_set_flags(). The caller can use the following
instead if wakeup is necessary:
	worker_set_flags();
	if (need_more_worker(pool))
		wake_up_worker(pool);
This makes the code simpler. This patch doesn't introduce behavior
changes.
tj: Updated description and comments.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
In process_one_work():
	if ((worker->flags & WORKER_UNBOUND) && need_more_worker(pool))
		wake_up_worker(pool);
the first test is unneeded. Even if the first test is removed, it
doesn't affect the wake-up logic for WORKER_UNBOUND, and it will not
introduce any useless wake-ups for normal per-cpu workers since
nr_running is always >= 1. It will introduce useless/redundant
wake-ups for CPU_INTENSIVE, but this case is rare and the next patch
will also remove this redundant wake-up.
tj: Minor updates to the description and comment.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Conflicts:
	drivers/infiniband/hw/cxgb4/device.c
The cxgb4 conflict was simply overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
The "uptime" trace clock added in:
    commit 8aacf017b065a805d27467843490c976835eb4a5
    tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
	(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds. An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).
Avoid these problems by using jiffies_64 as our basis, and
not converting to nanoseconds (we do convert to clock_t because
user facing API must not be dependent on internal kernel
HZ values).
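The truncation can be reproduced in user space with a small sketch.
HZ=1000 and USER_HZ=100 are assumed values, and the three helpers are
hypothetical stand-ins rather than the kernel functions:

```c
#include <stdint.h>

#define HZ	1000u	/* assumed tick rate */
#define USER_HZ	100u	/* assumed user-visible tick rate */

/* Mimics a jiffies-to-usecs conversion whose result is 32 bits wide. */
static uint32_t jiffies_to_usecs32_sketch(uint64_t j)
{
	return (uint32_t)(j * (1000000u / HZ));
}

/* Old scheme: the cast to 64 bits comes after the value was truncated. */
static uint64_t uptime_ns_truncated_sketch(uint64_t j)
{
	return (uint64_t)jiffies_to_usecs32_sketch(j) * 1000ULL;
}

/* 64-bit jiffies scaled to USER_HZ ticks; assumes HZ % USER_HZ == 0. */
static uint64_t uptime_clock_t_sketch(uint64_t j64)
{
	return j64 / (HZ / USER_HZ);
}
```

2^32 microseconds is about 4294.97 seconds, which is the 1 hour
11 minutes and 34 seconds quoted above. One tick past that point the
truncated value restarts near zero while the 64-bit based value keeps
counting.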
Link: http://lkml.kernel.org/p/99d63c5bfe9b320a3b428d773825a37095bf6a51.1405708254.git.tony.luck@intel.com
Cc: stable@vger.kernel.org # 3.10+
Fixes: 8aacf017b065 "tracing: Add "uptime" trace clock that uses jiffies"
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Simplify the sleep states sysfs interface /sys/power/state code by
redefining pm_states[] as an array of pointers to constant strings
such that only the entries corresponding to valid states are set.
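A rough sketch of that layout follows. The enum values, array name
and helpers are hypothetical and only illustrate the "NULL means not
valid" convention, not the actual sysfs code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state numbering; index 0 means "no sleep state". */
enum {
	SKETCH_STATE_ON,
	SKETCH_STATE_FREEZE,
	SKETCH_STATE_STANDBY,
	SKETCH_STATE_MEM,
	SKETCH_STATE_MAX
};

/* Only entries for valid states are non-NULL. */
static const char *sketch_pm_states[SKETCH_STATE_MAX];

/* Record that a state is supported, together with its label. */
static void sketch_set_state(int state, const char *label)
{
	sketch_pm_states[state] = label;
}

/* Map a user supplied string to a state; unknown or invalid gives ON. */
static int sketch_decode_state(const char *buf, size_t len)
{
	for (int s = SKETCH_STATE_ON + 1; s < SKETCH_STATE_MAX; s++) {
		const char *label = sketch_pm_states[s];

		if (label && len == strlen(label) && !strncmp(buf, label, len))
			return s;
	}
	return SKETCH_STATE_ON;
}
```

With this shape a lookup needs no separate validity check: a state
that was never set simply has no label to match.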
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull locking fixes from Thomas Gleixner:
"The locking department delivers:
  - A rather large and intrusive bundle of fixes to address serious
    performance regressions introduced by the new rwsem / mcs
    technology. Simpler solutions have been discussed, but they would
    have been ugly bandaids with more risk than doing the right thing.
  - Make the rwsem spin on owner technology opt-in for architectures
    and enable it only on the known to work ones.
  - A few fixes to the lockdep userspace library"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER
  locking/mutex: Disable optimistic spinning on some architectures
  locking/rwsem: Reduce the size of struct rw_semaphore
  locking/rwsem: Rename 'activity' to 'count'
  locking/spinlocks/mcs: Micro-optimize osq_unlock()
  locking/spinlocks/mcs: Introduce and use init macro and function for osq locks
  locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead
  locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node()
  locking/rwsem: Allow conservative optimistic spinning when readers have lock
  tools/liblockdep: Account for bitfield changes in lockdeps lock_acquire
  tools/liblockdep: Remove debug print left over from development
  tools/liblockdep: Fix comparison of a boolean value with a value of 2