
Merge tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timers and timekeeping updates from Thomas Gleixner:

 - Improve the VDSO build time checks to cover all dynamic relocations

   VDSO does not allow dynamic relocations, but the build time check is
   incomplete and fragile.

   It's based on architectures specifying the relocation types to search
   for and does not handle R_*_NONE relocation entries correctly.
   R_*_NONE relocations are injected by some GNU ld variants if they
   fail to determine the exact .rel[a]/dyn_size to cover trailing zeros.
   R_*_NONE relocations must be ignored by dynamic loaders, so they
   should be ignored in the build time check too.

   Remove the architecture specific relocation types to check for and
   validate strictly that no relocations other than R_*_NONE end up in
   the VDSO .so file.

 - Prefer signal delivery to the current thread for
   CLOCK_PROCESS_CPUTIME_ID based posix-timers

   Such timers prefer to deliver the signal to the main thread of a
   process even if the context in which the timer expires is the current
   task. This has the downside that it might wake up an idle thread.

   As there is no requirement or guarantee that the signal has to be
   delivered to the main thread, avoid this by preferring the current
   task if it is part of the thread group which shares sighand.

   This not only avoids waking idle threads, it also distributes signal
   delivery better when multiple timers fire close to each other in the
   context of different threads.

 - Align the tick period properly (again)

   For a long time the tick was starting at CLOCK_MONOTONIC zero, which
   allowed user space applications to either align with the tick or to
   place a periodic computation so that it does not interfere with the
   tick. The alignment of the tick period was more by chance than by
   intention as the tick is set up before a high resolution clocksource
   is installed, i.e. timekeeping is still tick based and the tick
   period advances from there.

   The early enablement of sched_clock() broke this alignment as the
   time accumulated by sched_clock() is taken into account when
   timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is
   no longer a multiple of tick periods, which breaks applications
   which relied on that behaviour.

   Cure this by aligning the tick starting point to the next multiple of
   tick periods, i.e. 1000ms/CONFIG_HZ.

 - A set of NOHZ fixes and enhancements:

     * Cure the concurrent writer race for idle and IO sleeptime
       statistics

       The statistic values which are exposed via /proc/stat are updated
       from the CPU local idle exit and remotely by cpufreq, but that
       happens without any form of serialization. As a consequence
       sleeptimes can be accounted twice or worse.

       Prevent this by restricting the accumulation writeback to the CPU
       local idle exit and let the remote access compute the accumulated
       value.

     * Protect idle/iowait sleep time with a sequence count

       Reading idle/iowait sleep time, e.g. from /proc/stat, can race
       with idle exit updates. As a consequence the readout may return
       random values which can even go backwards.

       Protect this by a sequence count, which fixes the idle time
       statistics issue, but cannot fix the iowait time problem because
       iowait time accounting races with remote wake ups decrementing
       the remote runqueues nr_iowait counter. The latter is impossible
       to fix, so the only way to deal with that is to document it
       properly and to remove the assertion in the selftest which
       triggers occasionally due to that.

     * Restructure struct tick_sched for better cache layout

     * Some small cleanups

 - Implement the missing timer_wait_running() callback for POSIX CPU
   timers

   For unknown reasons the introduction of the timer_wait_running()
   callback failed to fix up POSIX CPU timers, which went unnoticed for
   almost four years.

   While initially only targeted to prevent livelocks between a timer
   deletion and the timer expiry function on PREEMPT_RT enabled kernels,
   it turned out that fixing this for mainline is not as trivial as just
   implementing a stub similar to the hrtimer/timer callbacks.

   The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled
   systems there is a livelock issue independent of RT.

   CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU
   timers out from hard interrupt context to task work, which is handled
   before returning to user space or to a VM. The expiry mechanism moves
   the expired timers to a stack local list head with sighand lock held.
   Once sighand is dropped the task can be preempted and a task which
   wants to delete a timer will spin-wait until the expiry task is
   scheduled back in. In the worst case this will end up in a livelock
   when the preempting task and the expiry task are pinned on the same
   CPU.

   The timer wheel has a timer_wait_running() mechanism for RT, which
   uses a per CPU timer-base expiry lock. The expiry code holds that
   lock and the task waiting for the timer function to complete blocks
   on it.

   This does not work in the same way for posix CPU timers as there is
   no timer base and expiry for process wide timers can run on any task
   belonging to that process, but the concept of waiting on an expiry
   lock can be used too in a slightly different way.

   Add a per task mutex to struct posix_cputimers_work, let the expiry
   task hold it across the expiry function and let the deleting task
   which waits for the expiry to complete block on the mutex.

   In the non-contended case this results in an extra
   mutex_lock()/unlock() pair on both sides.

   This avoids spin-waiting on a task which is scheduled out, prevents
   the livelock and cures the problem for RT and !RT systems.

* tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Implement the missing timer_wait_running callback
  selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
  selftests/proc: Remove idle time monotonicity assertions
  MAINTAINERS: Remove stale email address
  timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
  timers/nohz: Add a comment about broken iowait counter update race
  timers/nohz: Protect idle/iowait sleep time under seqcount
  timers/nohz: Only ever update sleeptime from idle exit
  timers/nohz: Restructure and reshuffle struct tick_sched
  tick/common: Align tick period with the HZ tick.
  selftests/timers/posix_timers: Test delivery of signals across threads
  posix-timers: Prefer delivery of signals to the current thread
  vdso: Improve cmd_vdso_check to check all dynamic relocations
Linus Torvalds 2023-04-25 11:22:46 -07:00
commit e7989789c6
24 changed files with 361 additions and 188 deletions

@@ -14741,7 +14741,7 @@ F:	include/uapi/linux/nitro_enclaves.h
 F:	samples/nitro_enclaves/
 
 NOHZ, DYNTICKS SUPPORT
-M:	Frederic Weisbecker <fweisbec@gmail.com>
+M:	Frederic Weisbecker <frederic@kernel.org>
 M:	Thomas Gleixner <tglx@linutronix.de>
 M:	Ingo Molnar <mingo@kernel.org>
 L:	linux-kernel@vger.kernel.org

@@ -1,8 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 
-# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before
-# the inclusion of generic Makefile.
-ARCH_REL_TYPE_ABS := R_ARM_JUMP_SLOT|R_ARM_GLOB_DAT|R_ARM_ABS32
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 
 hostprogs := vdsomunge

@@ -6,9 +6,7 @@
 # Heavily based on the vDSO Makefiles for other archs.
 #
 
-# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before
-# the inclusion of generic Makefile.
-ARCH_REL_TYPE_ABS := R_AARCH64_JUMP_SLOT|R_AARCH64_GLOB_DAT|R_AARCH64_ABS64
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 
 obj-vdso := vgettimeofday.o note.o sigreturn.o

@@ -3,9 +3,6 @@
 # Makefile for vdso32
 #
 
-# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before
-# the inclusion of generic Makefile.
-ARCH_REL_TYPE_ABS := R_ARM_JUMP_SLOT|R_ARM_GLOB_DAT|R_ARM_ABS32
 include $(srctree)/lib/vdso/Makefile
 
 # Same as cc-*option, but using CC_COMPAT instead of CC

@@ -1,8 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before
-# the inclusion of generic Makefile.
-ARCH_REL_TYPE_ABS := R_CKCORE_ADDR32|R_CKCORE_JUMP_SLOT
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 
 # Symbols present in the vdso

@@ -1,9 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 # Objects to go into the VDSO.
 
-# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before
-# the inclusion of generic Makefile.
-ARCH_REL_TYPE_ABS := R_LARCH_32|R_LARCH_64|R_LARCH_MARK_LA|R_LARCH_JUMP_SLOT
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 
 obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o

@@ -4,9 +4,7 @@
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KCSAN_SANITIZE := n
 
-# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before
-# the inclusion of generic Makefile.
-ARCH_REL_TYPE_ABS := R_MIPS_JUMP_SLOT|R_MIPS_GLOB_DAT
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 
 obj-vdso-y := elf.o vgettimeofday.o sigreturn.o

@@ -2,7 +2,7 @@
 
 # List of files in the vdso, has to be asm only for now
 
-ARCH_REL_TYPE_ABS := R_PPC_JUMP_SLOT|R_PPC_GLOB_DAT|R_PPC_ADDR32|R_PPC_ADDR24|R_PPC_ADDR16|R_PPC_ADDR16_LO|R_PPC_ADDR16_HI|R_PPC_ADDR16_HA|R_PPC_ADDR14|R_PPC_ADDR14_BRTAKEN|R_PPC_ADDR14_BRNTAKEN|R_PPC_REL24
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 
 obj-vdso32 = sigtramp32-32.o gettimeofday-32.o datapage-32.o cacheflush-32.o note-32.o getcpu-32.o

@@ -1,9 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 # Copied from arch/tile/kernel/vdso/Makefile
 
-# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before
-# the inclusion of generic Makefile.
-ARCH_REL_TYPE_ABS := R_RISCV_32|R_RISCV_64|R_RISCV_JUMP_SLOT
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 # Symbols present in the vdso
 vdso-syms = rt_sigreturn

@@ -2,9 +2,8 @@
 # List of files in the vdso
 
 KCOV_INSTRUMENT := n
-ARCH_REL_TYPE_ABS := R_390_COPY|R_390_GLOB_DAT|R_390_JMP_SLOT|R_390_RELATIVE
-ARCH_REL_TYPE_ABS += R_390_GOT|R_390_PLT
 
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 obj-vdso32 = vdso_user_wrapper-32.o note-32.o
 

@@ -2,9 +2,8 @@
 # List of files in the vdso
 
 KCOV_INSTRUMENT := n
-ARCH_REL_TYPE_ABS := R_390_COPY|R_390_GLOB_DAT|R_390_JMP_SLOT|R_390_RELATIVE
-ARCH_REL_TYPE_ABS += R_390_GOT|R_390_PLT
 
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 obj-vdso64 = vdso_user_wrapper.o note.o
 obj-cvdso64 = vdso64_generic.o getcpu.o

@@ -3,10 +3,7 @@
 # Building vDSO images for x86.
 #
 
-# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before
-# the inclusion of generic Makefile.
-ARCH_REL_TYPE_ABS := R_X86_64_JUMP_SLOT|R_X86_64_GLOB_DAT|R_X86_64_RELATIVE|
-ARCH_REL_TYPE_ABS += R_386_GLOB_DAT|R_386_JMP_SLOT|R_386_RELATIVE
+# Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 
 # Sanitizer runtimes are unavailable and cannot be linked here.
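
The rule from the pull message (nothing but R_*_NONE entries may remain in the built vDSO) can be illustrated with a small user-space sketch. This is not the kernel's build check, which is done by the generic lib/vdso/Makefile and is not part of this excerpt. `count_forbidden_relocs()` is a hypothetical helper, and the sketch assumes a Linux x86-64 host whose `<elf.h>` provides `Elf64_Rela`, `ELF64_R_TYPE()` and `R_X86_64_NONE`.

```c
#include <elf.h>
#include <stddef.h>

/*
 * Count the entries of a .rela.dyn style table that are not R_X86_64_NONE.
 * Under the strict rule any non-zero result means the image is rejected;
 * NONE entries are skipped because dynamic loaders ignore them as well.
 */
static size_t count_forbidden_relocs(const Elf64_Rela *rela, size_t n)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++) {
		if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_NONE)
			bad++;
	}
	return bad;
}
```

A table that only holds zero-filled padding entries passes, while a single R_X86_64_RELATIVE entry makes the count non-zero. No per-architecture list of relocation types is needed for that decision, which is the point of the change.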

@@ -4,6 +4,7 @@
 
 #include <linux/spinlock.h>
 #include <linux/list.h>
+#include <linux/mutex.h>
 #include <linux/alarmtimer.h>
 #include <linux/timerqueue.h>
 
@@ -62,16 +63,18 @@ static inline int clockid_to_fd(const clockid_t clk)
  * cpu_timer - Posix CPU timer representation for k_itimer
  * @node: timerqueue node to queue in the task/sig
  * @head: timerqueue head on which this timer is queued
- * @task: Pointer to target task
+ * @pid: Pointer to target task PID
  * @elist: List head for the expiry list
  * @firing: Timer is currently firing
+ * @handling: Pointer to the task which handles expiry
  */
 struct cpu_timer {
 	struct timerqueue_node node;
 	struct timerqueue_head *head;
 	struct pid *pid;
 	struct list_head elist;
 	int firing;
+	struct task_struct __rcu *handling;
 };
 
 static inline bool cpu_timer_enqueue(struct timerqueue_head *head,
@@ -135,10 +138,12 @@ struct posix_cputimers {
 /**
  * posix_cputimers_work - Container for task work based posix CPU timer expiry
  * @work: The task work to be scheduled
+ * @mutex: Mutex held around expiry in context of this task work
  * @scheduled: @work has been scheduled already, no further processing
  */
 struct posix_cputimers_work {
 	struct callback_head work;
+	struct mutex mutex;
 	unsigned int scheduled;
 };
 

@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
 	/*
 	 * Now find a thread we can wake up to take the signal off the queue.
 	 *
-	 * If the main thread wants the signal, it gets first crack.
-	 * Probably the least surprising to the average bear.
+	 * Try the suggested task first (may or may not be the main thread).
 	 */
 	if (wants_signal(sig, p))
 		t = p;
@@ -1970,8 +1969,24 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 
 	ret = -1;
 	rcu_read_lock();
+
+	/*
+	 * This function is used by POSIX timers to deliver a timer signal.
+	 * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
+	 * set), the signal must be delivered to the specific thread (queues
+	 * into t->pending).
+	 *
+	 * Where type is not PIDTYPE_PID, signals must be delivered to the
+	 * process. In this case, prefer to deliver to current if it is in
+	 * the same thread group as the target process, which avoids
+	 * unnecessarily waking up a potentially idle task.
+	 */
 	t = pid_task(pid, type);
-	if (!t || !likely(lock_task_sighand(t, &flags)))
+	if (!t)
+		goto ret;
+	if (type != PIDTYPE_PID && same_thread_group(t, current))
+		t = current;
+	if (!likely(lock_task_sighand(t, &flags)))
 		goto ret;
 
 	ret = 1; /* the signal is ignored */
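
The target selection added to send_sigqueue() can be modelled in plain C to make the two cases explicit. This is only a sketch: `struct task_model`, `enum target_type` and `pick_signal_target()` are hypothetical stand-ins, and comparing `tgid` values stands in for the kernel's same_thread_group().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's pid_type and task_struct. */
enum target_type { TARGET_THREAD, TARGET_PROCESS };

struct task_model {
	int tgid;	/* thread group (process) id */
	int tid;	/* thread id */
};

/*
 * Pick the task that receives the timer signal:
 *  - thread directed timers always use the task that was looked up
 *  - process directed timers prefer the running task if it belongs
 *    to the same thread group, so no idle thread has to be woken
 */
static const struct task_model *
pick_signal_target(const struct task_model *looked_up,
		   const struct task_model *running,
		   enum target_type type)
{
	if (!looked_up)
		return NULL;
	if (type != TARGET_THREAD && looked_up->tgid == running->tgid)
		return running;
	return looked_up;
}
```

With a main thread and a worker of the same process, a process directed timer expiring in the worker selects the worker, while a thread directed timer aimed at the main thread still selects the main thread.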

@@ -846,6 +846,8 @@ static u64 collect_timerqueue(struct timerqueue_head *head,
 			return expires;
 
 		ctmr->firing = 1;
+		/* See posix_cpu_timer_wait_running() */
+		rcu_assign_pointer(ctmr->handling, current);
 		cpu_timer_dequeue(ctmr);
 		list_add_tail(&ctmr->elist, firing);
 	}
@@ -1161,7 +1163,49 @@ static void handle_posix_cpu_timers(struct task_struct *tsk);
 #ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
 static void posix_cpu_timers_work(struct callback_head *work)
 {
+	struct posix_cputimers_work *cw = container_of(work, typeof(*cw), work);
+
+	mutex_lock(&cw->mutex);
 	handle_posix_cpu_timers(current);
+	mutex_unlock(&cw->mutex);
+}
+
+/*
+ * Invoked from the posix-timer core when a cancel operation failed because
+ * the timer is marked firing. The caller holds rcu_read_lock(), which
+ * protects the timer and the task which is expiring it from being freed.
+ */
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+	struct task_struct *tsk = rcu_dereference(timr->it.cpu.handling);
+
+	/* Has the handling task completed expiry already? */
+	if (!tsk)
+		return;
+
+	/* Ensure that the task cannot go away */
+	get_task_struct(tsk);
+	/* Now drop the RCU protection so the mutex can be locked */
+	rcu_read_unlock();
+	/* Wait on the expiry mutex */
+	mutex_lock(&tsk->posix_cputimers_work.mutex);
+	/* Release it immediately again. */
+	mutex_unlock(&tsk->posix_cputimers_work.mutex);
+	/* Drop the task reference. */
+	put_task_struct(tsk);
+	/* Relock RCU so the callsite is balanced */
+	rcu_read_lock();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+	/* Ensure that timr->it.cpu.handling task cannot go away */
+	rcu_read_lock();
+	spin_unlock_irq(&timr->it_lock);
+	posix_cpu_timer_wait_running(timr);
+	rcu_read_unlock();
+	/* @timr is on stack and is valid */
+	spin_lock_irq(&timr->it_lock);
 }
 
 /*
@@ -1177,6 +1221,7 @@ void clear_posix_cputimers_work(struct task_struct *p)
 	       sizeof(p->posix_cputimers_work.work));
 	init_task_work(&p->posix_cputimers_work.work,
 		       posix_cpu_timers_work);
+	mutex_init(&p->posix_cputimers_work.mutex);
 	p->posix_cputimers_work.scheduled = false;
 }
 
@@ -1255,6 +1300,18 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk)
 	lockdep_posixtimer_exit();
 }
 
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+	cpu_relax();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+	spin_unlock_irq(&timr->it_lock);
+	cpu_relax();
+	spin_lock_irq(&timr->it_lock);
+}
+
 static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk)
 {
 	return false;
@@ -1363,6 +1420,8 @@ static void handle_posix_cpu_timers(struct task_struct *tsk)
 		 */
 		if (likely(cpu_firing >= 0))
 			cpu_timer_fire(timer);
+		/* See posix_cpu_timer_wait_running() */
+		rcu_assign_pointer(timer->it.cpu.handling, NULL);
 		spin_unlock(&timer->it_lock);
 	}
 }
@@ -1497,23 +1556,16 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
 		expires = cpu_timer_getexpires(&timer.it.cpu);
 		error = posix_cpu_timer_set(&timer, 0, &zero_it, &it);
 		if (!error) {
-			/*
-			 * Timer is now unarmed, deletion can not fail.
-			 */
+			/* Timer is now unarmed, deletion can not fail. */
 			posix_cpu_timer_del(&timer);
+		} else {
+			while (error == TIMER_RETRY) {
+				posix_cpu_timer_wait_running_nsleep(&timer);
+				error = posix_cpu_timer_del(&timer);
+			}
 		}
-		spin_unlock_irq(&timer.it_lock);
 
-		while (error == TIMER_RETRY) {
-			/*
-			 * We need to handle case when timer was or is in the
-			 * middle of firing. In other cases we already freed
-			 * resources.
-			 */
-			spin_lock_irq(&timer.it_lock);
-			error = posix_cpu_timer_del(&timer);
-			spin_unlock_irq(&timer.it_lock);
-		}
+		spin_unlock_irq(&timer.it_lock);
 
 		if ((it.it_value.tv_sec | it.it_value.tv_nsec) == 0) {
 			/*
@@ -1623,6 +1675,7 @@ const struct k_clock clock_posix_cpu = {
 	.timer_del = posix_cpu_timer_del,
 	.timer_get = posix_cpu_timer_get,
 	.timer_rearm = posix_cpu_timer_rearm,
+	.timer_wait_running = posix_cpu_timer_wait_running,
 };
 
 const struct k_clock clock_process = {
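
The idea of blocking on an expiry lock instead of spin-waiting can be shown with a user-space analogue. This is a minimal sketch that uses POSIX threads mutexes in place of the kernel mutex; `struct expiry_ctx`, `expiry_run()`, `expiry_wait()` and `count_expiry()` are hypothetical names, and the RCU and task reference handling of the real code is left out.

```c
#include <pthread.h>

/* Hypothetical per-task expiry context, loosely modelled on posix_cputimers_work. */
struct expiry_ctx {
	pthread_mutex_t lock;	/* held for the whole expiry run */
};

/* Expiry side: run @fn with the lock held so that waiters can block on it. */
static void expiry_run(struct expiry_ctx *ctx, void (*fn)(void *), void *arg)
{
	pthread_mutex_lock(&ctx->lock);
	fn(arg);
	pthread_mutex_unlock(&ctx->lock);
}

/*
 * Deletion side: sleep until a concurrent expiry run has finished.
 * The lock is taken and dropped right away, it only serves as a
 * wait point and protects no data here.
 */
static void expiry_wait(struct expiry_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	pthread_mutex_unlock(&ctx->lock);
}

/* Example expiry callback: count how often the expiry ran. */
static void count_expiry(void *arg)
{
	(*(int *)arg)++;
}
```

Because the waiter sleeps on the mutex, a preempted expiry thread gets to run again even when both threads are pinned to the same CPU, which is the case in which a spin-wait would livelock. In the uncontended case the cost is one extra lock/unlock pair on each side.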

@@ -846,6 +846,10 @@ static struct k_itimer *timer_wait_running(struct k_itimer *timer,
 	rcu_read_lock();
 	unlock_timer(timer, *flags);
 
+	/*
+	 * kc->timer_wait_running() might drop RCU lock. So @timer
+	 * cannot be touched anymore after the function returns!
+	 */
 	if (!WARN_ON_ONCE(!kc->timer_wait_running))
 		kc->timer_wait_running(timer);
 

@@ -218,9 +218,19 @@ static void tick_setup_device(struct tick_device *td,
 		 * this cpu:
 		 */
 		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
+			ktime_t next_p;
+			u32 rem;
+
 			tick_do_timer_cpu = cpu;
 
-			tick_next_period = ktime_get();
+			next_p = ktime_get();
+			div_u64_rem(next_p, TICK_NSEC, &rem);
+			if (rem) {
+				next_p -= rem;
+				next_p += TICK_NSEC;
+			}
+
+			tick_next_period = next_p;
 #ifdef CONFIG_NO_HZ_FULL
 			/*
 			 * The boot CPU may be nohz_full, in which case set
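
The rounding done in this hunk is plain modular arithmetic and can be tried outside the kernel. A minimal sketch, assuming a 4 ms tick (4,000,000 ns, as with HZ=250); `align_to_next_tick()` is a hypothetical helper and uses `%` where the kernel uses div_u64_rem().

```c
#include <stdint.h>

/*
 * Round @now_ns up to the next multiple of @tick_ns. A value that
 * already sits on a tick boundary is returned unchanged.
 */
static uint64_t align_to_next_tick(uint64_t now_ns, uint64_t tick_ns)
{
	uint64_t rem = now_ns % tick_ns;

	if (rem) {
		now_ns -= rem;
		now_ns += tick_ns;
	}
	return now_ns;
}
```

For example, a start time of 10,500,000 ns is moved to 12,000,000 ns, so every later tick again falls on a multiple of the tick period, while a start time of 8,000,000 ns stays where it is.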

@@ -647,43 +647,67 @@ static void tick_nohz_update_jiffies(ktime_t now)
 	touch_softlockup_watchdog_sched();
 }
 
-/*
- * Updates the per-CPU time idle statistics counters
- */
-static void
-update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 {
 	ktime_t delta;
 
-	if (ts->idle_active) {
-		delta = ktime_sub(now, ts->idle_entrytime);
-		if (nr_iowait_cpu(cpu) > 0)
-			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
-		else
-			ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-		ts->idle_entrytime = now;
-	}
-
-	if (last_update_time)
-		*last_update_time = ktime_to_us(now);
-}
-
-static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
-{
-	update_ts_time_stats(smp_processor_id(), ts, now, NULL);
+	if (WARN_ON_ONCE(!ts->idle_active))
+		return;
+
+	delta = ktime_sub(now, ts->idle_entrytime);
+
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
+	if (nr_iowait_cpu(smp_processor_id()) > 0)
+		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+	else
+		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+
+	ts->idle_entrytime = now;
 	ts->idle_active = 0;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
 
 	sched_clock_idle_wakeup_event();
 }
 
 static void tick_nohz_start_idle(struct tick_sched *ts)
 {
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	ts->idle_entrytime = ktime_get();
 	ts->idle_active = 1;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
+
 	sched_clock_idle_sleep_event();
 }
 
+static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
+				 bool compute_delta, u64 *last_update_time)
+{
+	ktime_t now, idle;
+	unsigned int seq;
+
+	if (!tick_nohz_active)
+		return -1;
+
+	now = ktime_get();
+	if (last_update_time)
+		*last_update_time = ktime_to_us(now);
+
+	do {
+		seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
+
+		if (ts->idle_active && compute_delta) {
+			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+
+			idle = ktime_add(*sleeptime, delta);
+		} else {
+			idle = *sleeptime;
+		}
+	} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
+
+	return ktime_to_us(idle);
+}
+
 /**
  * get_cpu_idle_time_us - get the total idle time of a CPU
  * @cpu: CPU number to query
@@ -691,7 +715,10 @@ static void tick_nohz_start_idle(struct tick_sched *ts)
  * counters if NULL.
  *
  * Return the cumulative idle time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
@@ -701,27 +728,9 @@ static void tick_nohz_start_idle(struct tick_sched *ts)
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t now, idle;
-
-	if (!tick_nohz_active)
-		return -1;
-
-	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		idle = ts->idle_sleeptime;
-	} else {
-		if (ts->idle_active && !nr_iowait_cpu(cpu)) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
-			idle = ktime_add(ts->idle_sleeptime, delta);
-		} else {
-			idle = ts->idle_sleeptime;
-		}
-	}
-
-	return ktime_to_us(idle);
 
+	return get_cpu_sleep_time_us(ts, &ts->idle_sleeptime,
+				     !nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
@@ -732,7 +741,10 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
  * counters if NULL.
  *
  * Return the cumulative iowait time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
@@ -742,26 +754,9 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t now, iowait;
 
-	if (!tick_nohz_active)
-		return -1;
-
-	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		iowait = ts->iowait_sleeptime;
-	} else {
-		if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
-			iowait = ktime_add(ts->iowait_sleeptime, delta);
-		} else {
-			iowait = ts->iowait_sleeptime;
-		}
-	}
-
-	return ktime_to_us(iowait);
+	return get_cpu_sleep_time_us(ts, &ts->iowait_sleeptime,
+				     nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
@@ -1094,10 +1089,16 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	return true;
 }
 
-static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
+/**
+ * tick_nohz_idle_stop_tick - stop the idle tick from the idle task
+ *
+ * When the next event is more than a tick into the future, stop the idle tick
+ */
+void tick_nohz_idle_stop_tick(void)
 {
-	ktime_t expires;
+	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 	int cpu = smp_processor_id();
+	ktime_t expires;
 
 	/*
 	 * If tick_nohz_get_sleep_length() ran tick_nohz_next_event(), the
@@ -1129,16 +1130,6 @@ static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
 	}
 }
 
-/**
- * tick_nohz_idle_stop_tick - stop the idle tick from the idle task
- *
- * When the next event is more than a tick into the future, stop the idle tick
- */
-void tick_nohz_idle_stop_tick(void)
-{
-	__tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched));
-}
-
 void tick_nohz_idle_retain_tick(void)
 {
 	tick_nohz_retain_tick(this_cpu_ptr(&tick_cpu_sched));
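
The idle_sleeptime_seq changes in the hunks above follow the usual sequence counter pattern: the writer bumps the counter before and after its update, and a reader retries when it saw an odd counter or the counter changed under it. What follows is a minimal user-space sketch of that pattern, not the kernel implementation. It assumes C11 atomics in place of seqcount_t and a single writer (in the kernel, the local CPU). struct seq_stats and the stats_*() helpers are hypothetical names, and the compute_delta parameter is left out for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the idle accounting fields of struct tick_sched. */
struct seq_stats {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint64_t entrytime;	/* time of the last idle entry */
	_Atomic uint64_t sleeptime;	/* accumulated idle time */
	atomic_bool idle_active;
};

/* Writer side, like tick_nohz_start_idle(): publish entry time and flag. */
static void stats_start_idle(struct seq_stats *st, uint64_t now)
{
	atomic_fetch_add(&st->seq, 1);		/* counter becomes odd */
	atomic_store(&st->entrytime, now);
	atomic_store(&st->idle_active, true);
	atomic_fetch_add(&st->seq, 1);		/* counter is even again */
}

/* Writer side, like tick_nohz_stop_idle(): fold the elapsed delta in. */
static void stats_stop_idle(struct seq_stats *st, uint64_t now)
{
	uint64_t delta;

	assert(atomic_load(&st->idle_active));
	delta = now - atomic_load(&st->entrytime);

	atomic_fetch_add(&st->seq, 1);
	atomic_store(&st->sleeptime, atomic_load(&st->sleeptime) + delta);
	atomic_store(&st->entrytime, now);
	atomic_store(&st->idle_active, false);
	atomic_fetch_add(&st->seq, 1);
}

/* Reader side: retry until a consistent snapshot has been observed. */
static uint64_t stats_read(struct seq_stats *st, uint64_t now)
{
	unsigned int seq;
	uint64_t idle;

	do {
		/* Wait for an even counter, i.e. no update in progress. */
		while ((seq = atomic_load(&st->seq)) & 1)
			;

		idle = atomic_load(&st->sleeptime);
		if (atomic_load(&st->idle_active))
			idle += now - atomic_load(&st->entrytime);
	} while (atomic_load(&st->seq) != seq);

	return idle;
}
```

The sketch uses sequentially consistent atomics throughout to keep the ordering argument simple. The retry loop only guarantees that the three fields are read as one consistent set; it does nothing for inputs that live outside the counter, which is the caveat the new kernel-doc comments raise about nr_iowait_cpu().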

@@ -22,65 +22,82 @@ enum tick_nohz_mode {
 /**
  * struct tick_sched - sched tick emulation and no idle tick control/stats
- * @sched_timer: hrtimer to schedule the periodic tick in high
- *			resolution mode
- * @check_clocks: Notification mechanism about clocksource changes
- * @nohz_mode: Mode - one state of tick_nohz_mode
+ *
  * @inidle: Indicator that the CPU is in the tick idle mode
  * @tick_stopped: Indicator that the idle tick has been stopped
  * @idle_active: Indicator that the CPU is actively in the tick idle mode;
  *			it is reset during irq handling phases.
- * @do_timer_lst: CPU was the last one doing do_timer before going idle
+ * @do_timer_last: CPU was the last one doing do_timer before going idle
  * @got_idle_tick: Tick timer function has run with @inidle set
+ * @stalled_jiffies: Number of stalled jiffies detected across ticks
+ * @last_tick_jiffies: Value of jiffies seen on last tick
+ * @sched_timer: hrtimer to schedule the periodic tick in high
+ *			resolution mode
  * @last_tick: Store the last tick expiry time when the tick
  *			timer is modified for nohz sleeps. This is necessary
  *			to resume the tick timer operation in the timeline
  *			when the CPU returns from nohz sleep.
  * @next_tick: Next tick to be fired when in dynticks mode.
  * @idle_jiffies: jiffies at the entry to idle for idle time accounting
+ * @idle_waketime: Time when the idle was interrupted
+ * @idle_entrytime: Time when the idle call was entered
+ * @nohz_mode: Mode - one state of tick_nohz_mode
+ * @last_jiffies: Base jiffies snapshot when next event was last computed
+ * @timer_expires_base: Base time clock monotonic for @timer_expires
+ * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped)
+ * @next_timer: Expiry time of next expiring timer for debugging purpose only
+ * @idle_expires: Next tick in idle, for debugging purpose only
  * @idle_calls: Total number of idle calls
  * @idle_sleeps: Number of idle calls, where the sched tick was stopped
- * @idle_entrytime: Time when the idle call was entered
- * @idle_waketime: Time when the idle was interrupted
  * @idle_exittime: Time when the idle state was left
  * @idle_sleeptime: Sum of the time slept in idle with sched tick stopped
  * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped)
- * @timer_expires_base: Base time clock monotonic for @timer_expires
- * @next_timer: Expiry time of next expiring timer for debugging purpose only
  * @tick_dep_mask: Tick dependency mask - is set, if someone needs the tick
- * @last_tick_jiffies: Value of jiffies seen on last tick
- * @stalled_jiffies: Number of stalled jiffies detected across ticks
+ * @check_clocks: Notification mechanism about clocksource changes
  */
 struct tick_sched {
-	struct hrtimer sched_timer;
-	unsigned long check_clocks;
-	enum tick_nohz_mode nohz_mode;
+	/* Common flags */
 	unsigned int inidle : 1;
 	unsigned int tick_stopped : 1;
 	unsigned int idle_active : 1;
 	unsigned int do_timer_last : 1;
 	unsigned int got_idle_tick : 1;
+
+	/* Tick handling: jiffies stall check */
+	unsigned int stalled_jiffies;
+	unsigned long last_tick_jiffies;
+
+	/* Tick handling */
+	struct hrtimer sched_timer;
 	ktime_t last_tick;
 	ktime_t next_tick;
 	unsigned long idle_jiffies;
+	ktime_t idle_waketime;
+
+	/* Idle entry */
+	seqcount_t idle_sleeptime_seq;
+	ktime_t idle_entrytime;
+
+	/* Tick stop */
+	enum tick_nohz_mode nohz_mode;
+	unsigned long last_jiffies;
+	u64 timer_expires_base;
+	u64 timer_expires;
+	u64 next_timer;
+	ktime_t idle_expires;
 	unsigned long idle_calls;
 	unsigned long idle_sleeps;
-	ktime_t idle_entrytime;
-	ktime_t idle_waketime;
+
+	/* Idle exit */
 	ktime_t idle_exittime;
 	ktime_t idle_sleeptime;
 	ktime_t iowait_sleeptime;
-	unsigned long last_jiffies;
-	u64 timer_expires;
-	u64 timer_expires_base;
-	u64 next_timer;
-	ktime_t idle_expires;
+
+	/* Full dynticks handling */
 	atomic_t tick_dep_mask;
-	unsigned long last_tick_jiffies;
-	unsigned int stalled_jiffies;
+
+	/* Clocksource changes */
+	unsigned long check_clocks;
 };
 
 extern struct tick_sched *tick_get_tick_sched(int cpu);

@@ -5,18 +5,13 @@ GENERIC_VDSO_DIR := $(dir $(GENERIC_VDSO_MK_PATH))
 c-gettimeofday-$(CONFIG_GENERIC_GETTIMEOFDAY) := $(addprefix $(GENERIC_VDSO_DIR), gettimeofday.c)
 
-# This cmd checks that the vdso library does not contain absolute relocation
+# This cmd checks that the vdso library does not contain dynamic relocations.
 # It has to be called after the linking of the vdso library and requires it
 # as a parameter.
 #
-# $(ARCH_REL_TYPE_ABS) is defined in the arch specific makefile and corresponds
-# to the absolute relocation types printed by "objdump -R" and accepted by the
-# dynamic linker.
-ifndef ARCH_REL_TYPE_ABS
-$(error ARCH_REL_TYPE_ABS is not set)
-endif
+# As a workaround for some GNU ld ports which produce unneeded R_*_NONE
+# dynamic relocations, ignore R_*_NONE.
 
 quiet_cmd_vdso_check = VDSOCHK $@
-      cmd_vdso_check = if $(OBJDUMP) -R $@ | grep -E -h "$(ARCH_REL_TYPE_ABS)"; \
+      cmd_vdso_check = if $(READELF) -rW $@ | grep -v _NONE | grep -q " R_\w*_"; \
 		       then (echo >&2 "$@: dynamic relocations are not supported"; \
 			     rm -f $@; /bin/false); fi
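
The new check is a two-stage text filter over the relocation dump: drop every line mentioning _NONE, then fail if any remaining line still names a relocation type. A rough sketch of that per-line decision in C, for illustration only: the function names are hypothetical, \w is taken to mean [A-Za-z0-9_] as in the C locale, and the sample lines used with it merely imitate the general shape of "readelf -rW" output rather than being captured from a real build.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True for [A-Za-z0-9_], i.e. what \w is assumed to match here. */
static bool is_word(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* Per-line equivalent of: grep -q " R_\w*_" */
static bool has_reloc_type(const char *line)
{
	const char *p = line;

	while ((p = strstr(p, " R_")) != NULL) {
		/* Skip " R_", then look for another '_' within the same word. */
		for (p += 3; is_word(*p); p++)
			if (*p == '_')
				return true;
	}
	return false;
}

/* A line fails the check unless "grep -v _NONE" would have dropped it. */
static bool line_is_forbidden_reloc(const char *line)
{
	assert(line != NULL);

	if (strstr(line, "_NONE"))
		return false;
	return has_reloc_type(line);
}
```

Because the pattern only requires a leading " R_" and one more underscore, it is not tied to any architecture prefix, which is what lets the per-architecture ARCH_REL_TYPE_ABS lists go away.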

@@ -13,7 +13,9 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that values in /proc/uptime increment monotonically.
+// Test that boottime value in /proc/uptime and CLOCK_BOOTTIME increment
+// monotonically. We don't test idle time monotonicity due to broken iowait
+// task counting, cf: comment above get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <stdint.h>
@@ -25,20 +27,31 @@
 int main(void)
 {
-	uint64_t start, u0, u1, i0, i1;
+	uint64_t start, u0, u1, c0, c1;
 	int fd;
 
 	fd = open("/proc/uptime", O_RDONLY);
 	assert(fd >= 0);
 
-	proc_uptime(fd, &u0, &i0);
+	u0 = proc_uptime(fd);
 	start = u0;
+	c0 = clock_boottime();
+
 	do {
-		proc_uptime(fd, &u1, &i1);
+		u1 = proc_uptime(fd);
+		c1 = clock_boottime();
+
+		/* Is /proc/uptime monotonic ? */
 		assert(u1 >= u0);
-		assert(i1 >= i0);
+
+		/* Is CLOCK_BOOTTIME monotonic ? */
+		assert(c1 >= c0);
+
+		/* Is CLOCK_BOOTTIME VS /proc/uptime monotonic ? */
+		assert(c0 >= u0);
+
 		u0 = u1;
-		i0 = i1;
+		c0 = c1;
 	} while (u1 - start < 100);
 
 	return 0;

@@ -13,8 +13,10 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that values in /proc/uptime increment monotonically
-// while shifting across CPUs.
+// Test that boottime value in /proc/uptime and CLOCK_BOOTTIME increment
+// monotonically while shifting across CPUs. We don't test idle time
+// monotonicity due to broken iowait task counting, cf: comment above
+// get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <errno.h>
@@ -42,10 +44,10 @@ static inline int sys_sched_setaffinity(pid_t pid, unsigned int len, unsigned lo
 int main(void)
 {
+	uint64_t u0, u1, c0, c1;
 	unsigned int len;
 	unsigned long *m;
 	unsigned int cpu;
-	uint64_t u0, u1, i0, i1;
 	int fd;
 
 	/* find out "nr_cpu_ids" */
@@ -60,7 +62,9 @@ int main(void)
 	fd = open("/proc/uptime", O_RDONLY);
 	assert(fd >= 0);
 
-	proc_uptime(fd, &u0, &i0);
+	u0 = proc_uptime(fd);
+	c0 = clock_boottime();
+
 	for (cpu = 0; cpu < len * 8; cpu++) {
 		memset(m, 0, len);
 		m[cpu / (8 * sizeof(unsigned long))] |= 1UL << (cpu % (8 * sizeof(unsigned long)));
@@ -68,11 +72,20 @@ int main(void)
 		/* CPU might not exist, ignore error */
 		sys_sched_setaffinity(0, len, m);
 
-		proc_uptime(fd, &u1, &i1);
+		u1 = proc_uptime(fd);
+		c1 = clock_boottime();
+
+		/* Is /proc/uptime monotonic ? */
 		assert(u1 >= u0);
-		assert(i1 >= i0);
+
+		/* Is CLOCK_BOOTTIME monotonic ? */
+		assert(c1 >= c0);
+
+		/* Is CLOCK_BOOTTIME VS /proc/uptime monotonic ? */
+		assert(c0 >= u0);
+
 		u0 = u1;
-		i0 = i1;
+		c0 = c1;
 	}
 
 	return 0;

@@ -19,10 +19,22 @@
 #include <string.h>
 #include <stdlib.h>
 #include <unistd.h>
+#include <time.h>
 
 #include "proc.h"
 
-static void proc_uptime(int fd, uint64_t *uptime, uint64_t *idle)
+static uint64_t clock_boottime(void)
+{
+	struct timespec ts;
+	int err;
+
+	err = clock_gettime(CLOCK_BOOTTIME, &ts);
+	assert(err >= 0);
+
+	return (ts.tv_sec * 100) + (ts.tv_nsec / 10000000);
+}
+
+static uint64_t proc_uptime(int fd)
 {
 	uint64_t val1, val2;
 	char buf[64], *p;
@@ -43,18 +55,6 @@ static void proc_uptime(int fd, uint64_t *uptime, uint64_t *idle)
 	assert(p[3] == ' ');
 
 	val2 = (p[1] - '0') * 10 + p[2] - '0';
-	*uptime = val1 * 100 + val2;
-
-	p += 4;
-
-	val1 = xstrtoull(p, &p);
-	assert(p[0] == '.');
-	assert('0' <= p[1] && p[1] <= '9');
-	assert('0' <= p[2] && p[2] <= '9');
-	assert(p[3] == '\n');
-
-	val2 = (p[1] - '0') * 10 + p[2] - '0';
-	*idle = val1 * 100 + val2;
-
-	assert(p + 4 == buf + rv);
+	return val1 * 100 + val2;
 }
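
Both helpers in this header reduce their input to the same unit, hundredths of a second, which is what allows the tests to compare CLOCK_BOOTTIME against the first field of /proc/uptime directly. A small standalone sketch of the two conversions, assuming plain strtoull() in place of the selftest's xstrtoull(); the function names are hypothetical and the inputs are passed in rather than read from the system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Seconds plus nanoseconds, truncated to centiseconds. */
static uint64_t timespec_to_centis(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 100 + (uint64_t)ts->tv_nsec / 10000000;
}

/* Parse a "SSS.CC" field, as found in /proc/uptime, into centiseconds. */
static uint64_t uptime_field_to_centis(const char *s)
{
	char *p;
	uint64_t sec = strtoull(s, &p, 10);

	/* Exactly two fractional digits are expected after the dot. */
	assert(p[0] == '.');
	assert('0' <= p[1] && p[1] <= '9');
	assert('0' <= p[2] && p[2] <= '9');

	return sec * 100 + (uint64_t)(p[1] - '0') * 10 + (uint64_t)(p[2] - '0');
}
```

Both conversions truncate rather than round, so a clock sample taken after an uptime sample cannot come out smaller merely because of the unit conversion.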

@@ -188,6 +188,80 @@ static int check_timer_create(int which)
 	return 0;
 }
 
+int remain;
+__thread int got_signal;
+
+static void *distribution_thread(void *arg)
+{
+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+	return NULL;
+}
+
+static void distribution_handler(int nr)
+{
+	if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
+		__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
+}
+
+/*
+ * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
+ * timer signals. This primarily tests that the kernel does not favour any one.
+ */
+static int check_timer_distribution(void)
+{
+	int err, i;
+	timer_t id;
+	const int nthreads = 10;
+	pthread_t threads[nthreads];
+	struct itimerspec val = {
+		.it_value.tv_sec = 0,
+		.it_value.tv_nsec = 1000 * 1000,
+		.it_interval.tv_sec = 0,
+		.it_interval.tv_nsec = 1000 * 1000,
+	};
+
+	printf("Check timer_create() per process signal distribution... ");
+	fflush(stdout);
+
+	remain = nthreads + 1;  /* worker threads + this thread */
+	signal(SIGALRM, distribution_handler);
+	err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
+	if (err < 0) {
+		perror("Can't create timer\n");
+		return -1;
+	}
+	err = timer_settime(id, 0, &val, NULL);
+	if (err < 0) {
+		perror("Can't set timer\n");
+		return -1;
+	}
+
+	for (i = 0; i < nthreads; i++) {
+		if (pthread_create(&threads[i], NULL, distribution_thread, NULL)) {
+			perror("Can't create thread\n");
+			return -1;
+		}
+	}
+
+	/* Wait for all threads to receive the signal. */
+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+
+	for (i = 0; i < nthreads; i++) {
+		if (pthread_join(threads[i], NULL)) {
+			perror("Can't join thread\n");
+			return -1;
+		}
+	}
+
+	if (timer_delete(id)) {
+		perror("Can't delete timer\n");
+		return -1;
+	}
+
+	printf("[OK]\n");
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
 	printf("Testing posix timers. False negative may happen on CPU execution \n");
@@ -217,5 +291,8 @@ int main(int argc, char **argv)
 	if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
 		return ksft_exit_fail();
 
+	if (check_timer_distribution() < 0)
+		return ksft_exit_fail();
+
 	return ksft_exit_pass();
 }