mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2025-01-06 05:06:29 +00:00
ed3b7923a8
41976 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
ed3b7923a8 |
Scheduler changes for v6.5:
- Scheduler SMP load-balancer improvements: - Avoid unnecessary migrations within SMT domains on hybrid systems. Problem: On hybrid CPU systems, (processors with a mixture of higher-frequency SMT cores and lower-frequency non-SMT cores), under the old code lower-priority CPUs pulled tasks from the higher-priority cores if more than one SMT sibling was busy - resulting in many unnecessary task migrations. Solution: The new code improves the load balancer to recognize SMT cores with more than one busy sibling and allows lower-priority CPUs to pull tasks, which avoids superfluous migrations and lets lower-priority cores inspect all SMT siblings for the busiest queue. - Implement the 'runnable boosting' feature in the EAS balancer: consider CPU contention in frequency, EAS max util & load-balance busiest CPU selection. This improves CPU utilization for certain workloads, while leaves other key workloads unchanged. - Scheduler infrastructure improvements: - Rewrite the scheduler topology setup code by consolidating it into the build_sched_topology() helper function and building it dynamically on the fly. - Resolve the local_clock() vs. noinstr complications by rewriting the code: provide separate sched_clock_noinstr() and local_clock_noinstr() functions to be used in instrumentation code, and make sure it is all instrumentation-safe. - Fixes: - Fix a kthread_park() race with wait_woken() - Fix misc wait_task_inactive() bugs unearthed by the -rt merge: - Fix UP PREEMPT bug by unifying the SMP and UP implementations. - Fix task_struct::saved_state handling. - Fix various rq clock update bugs, unearthed by turning on the rq clock debugging code. - Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger by creating enough cgroups, by removing the warnign and restricting window size triggers to PSI file write-permission or CAP_SYS_RESOURCE. - Propagate SMT flags in the topology when removing degenerate domain - Fix grub_reclaim() calculation bug in the deadline scheduler code - Avoid resetting the min update period when it is unnecessary, in psi_trigger_destroy(). - Don't balance a task to its current running CPU in load_balance(), which was possible on certain NUMA topologies with overlapping groups. - Fix the sched-debug printing of rq->nr_uninterruptible - Cleanups: - Address various -Wmissing-prototype warnings, as a preparation to (maybe) enable this warning in the future. - Remove unused code - Mark more functions __init - Fix shadow-variable warnings Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSatWQRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1j62xAAuGOx1LcDfRGC6WGQzp1zOdlsVQtnDvlS qL58zYSHgizprpVQ3j87SBaG4CHCdvd2Bo36yW0lNZS4nd203qdq7fkrMb3hPP/w egUQUzMegf5fF6BWldKeMjuHSt+twFQz/ZAKK8iSbAir6CHNAqbNst1oL0i/+Tyk o33hBs1hT5tnbFb1NSVZkX4k+qT3LzTW4K2QgjjGtkScr6yHh2BdEVefyigWOjdo 9s02d00ll9a2r+F5txlN7Dnw6TN7rmTXGMOJU5bZvBE90/anNiAorMXHJdEKCyUR u9+JtBdJWiCplGa/tSRcxT16ZW1VdtTnd9q66TDhXREd2UNDFqBEyg5Wl77K4Tlf vKFajmj/to+cTbuv6m6TVR+zyXpdEpdL6F04P44U3qiJvDobBqeDNKHHIqpmbHXl AXUXcPWTVAzXX1Ce5M+BeAgTBQ1T7C5tELILrTNQHJvO1s9VVBRFZ/l65Ps4vu7T wIZ781IFuopk0zWqHovNvgKrJ7oFmOQQZFttQEe8n6nafkjI7u+IZ8FayiGaUMRr 4GawFGUCEdYh8z9qyslGKe8Q/Rphfk6hxMFRYUJpDmubQ0PkMeDjDGq77jDGl1PF VqwSDEyOaBJs7Gqf/mem00JtzBmXhkhm1SEjggHMI2IQbr/eeBXoLQOn3CDapO/N PiDbtX760ic= =EWQA -----END PGP SIGNATURE----- Merge tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Scheduler SMP load-balancer improvements: - Avoid unnecessary migrations within SMT domains on hybrid systems. Problem: On hybrid CPU systems, (processors with a mixture of higher-frequency SMT cores and lower-frequency non-SMT cores), under the old code lower-priority CPUs pulled tasks from the higher-priority cores if more than one SMT sibling was busy - resulting in many unnecessary task migrations. Solution: The new code improves the load balancer to recognize SMT cores with more than one busy sibling and allows lower-priority CPUs to pull tasks, which avoids superfluous migrations and lets lower-priority cores inspect all SMT siblings for the busiest queue. - Implement the 'runnable boosting' feature in the EAS balancer: consider CPU contention in frequency, EAS max util & load-balance busiest CPU selection. This improves CPU utilization for certain workloads, while leaves other key workloads unchanged. Scheduler infrastructure improvements: - Rewrite the scheduler topology setup code by consolidating it into the build_sched_topology() helper function and building it dynamically on the fly. - Resolve the local_clock() vs. noinstr complications by rewriting the code: provide separate sched_clock_noinstr() and local_clock_noinstr() functions to be used in instrumentation code, and make sure it is all instrumentation-safe. Fixes: - Fix a kthread_park() race with wait_woken() - Fix misc wait_task_inactive() bugs unearthed by the -rt merge: - Fix UP PREEMPT bug by unifying the SMP and UP implementations - Fix task_struct::saved_state handling - Fix various rq clock update bugs, unearthed by turning on the rq clock debugging code. - Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger by creating enough cgroups, by removing the warnign and restricting window size triggers to PSI file write-permission or CAP_SYS_RESOURCE. - Propagate SMT flags in the topology when removing degenerate domain - Fix grub_reclaim() calculation bug in the deadline scheduler code - Avoid resetting the min update period when it is unnecessary, in psi_trigger_destroy(). - Don't balance a task to its current running CPU in load_balance(), which was possible on certain NUMA topologies with overlapping groups. - Fix the sched-debug printing of rq->nr_uninterruptible Cleanups: - Address various -Wmissing-prototype warnings, as a preparation to (maybe) enable this warning in the future. - Remove unused code - Mark more functions __init - Fix shadow-variable warnings" * tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle() sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop() sched/core: Fixed missing rq clock update before calling set_rq_offline() sched/deadline: Update GRUB description in the documentation sched/deadline: Fix bandwidth reclaim equation in GRUB sched/wait: Fix a kthread_park race with wait_woken() sched/topology: Mark set_sched_topology() __init sched/fair: Rename variable cpu_util eff_util arm64/arch_timer: Fix MMIO byteswap sched/fair, cpufreq: Introduce 'runnable boosting' sched/fair: Refactor CPU utilization functions cpuidle: Use local_clock_noinstr() sched/clock: Provide local_clock_noinstr() x86/tsc: Provide sched_clock_noinstr() clocksource: hyper-v: Provide noinstr sched_clock() clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX x86/vdso: Fix gettimeofday masking math64: Always inline u128 version of mul_u64_u64_shr() s390/time: Provide sched_clock_noinstr() loongarch: Provide noinstr sched_clock_read() ... |
||
Linus Torvalds
|
af96134dc8 |
RCU pull request for v6.5
This pull contains the following branches: doc.2023.05.10a: Documentation updates fixes.2023.05.11a: Miscellaneous fixes, perhaps most notably: o Remove RCU_NONIDLE(). The new visibility of most of the idle loop to RCU has obsoleted this API. o Make the RCU_SOFTIRQ callback-invocation time limit also apply to the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT. o Add a jiffies-based callback-invocation time limit to handle long-running callbacks. (The local_clock() function is only invoked once per 32 callbacks due to its high overhead.) o Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which fixes a bug that can occur on systems with non-contiguous CPU numbering. kvfree.2023.05.10a: kvfree_rcu updates o Eliminate the single-argument variant of k[v]free_rcu() now that all uses have been converted to k[v]free_rcu_mightsleep(). o Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too soon. Yes, this is closing the barn door after the horse has escaped, but Murphy says that there will be more horses. nocb.2023.05.11a: Callback-offloading updates o Fix a number of bugs involving the shrinker and lazy callbacks. rcu-tasks.2023.05.10a: Tasks RCU updates torture.2023.05.15a: Torture-test updates rcu-urgent.2023.06.06a: Urgent SRCU fix (already pulled) -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmSUuukTHHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jLB5EACWArBYSbXh9kx6RP3LRkOd//fQWuqx z/RmHjMx3a2uIQpsbeAj+jrgHYzSOi7Afdnx2s0gUIWGjpF4d+e31eco9xTQtWIs A3/pXUlcTyaPXEZh5ro763UyBF/K003TAdo7EZAScTfDNp2knqGdEOyXTOXiAULX GH922kIqg0chbYaWocLY3g5mXeEm+kGY8GrDAB7/B3jHgoyylXzmSULDP4GQV7hw DkM0GOlc3TSzHonnNS6j1xboqY4HhWIDkBrD4Oh5P//ttMpb1b6gs1zEyjCQcNBe a6fnNF+0dUwANIZKroPn/L1uTGsEUhmLFkVK+XIuAit97yWI6t+aRH6TzHHYmkpu wVmLxv/FbJohP7ArWaI8l0gNl0vkli3ZgQXnRvSpCqIFR93AWVMeZsDTGOcLUdry AZEnuGXHnc9UB0KGOIras0o/EQezKq57JUV2bBZjl/GIDc3qiaJKnBhHysPc1iuE UfP052vCaoZxO3U/FrObQhjLZnstKBYHj8WolxMjIyNMlRIvDro6O1WG4+mjeLDP xdrjKGstsJh80CYDei+vJBXsbszhxv8yV4hCQX9JcDl3RjEqOOxgKUnAaP2mm02O MX33P3MZvSsHGoxkJpXDSlkQlbNqDBMIjZXbZLRF4o8fPhVmQU/4QlJN0iFOoXaQ 1qqGrerEzfn0Jw== =3LCd -----END PGP SIGNATURE----- Merge tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: "Documentation updates Miscellaneous fixes, perhaps most notably: - Remove RCU_NONIDLE(). The new visibility of most of the idle loop to RCU has obsoleted this API. - Make the RCU_SOFTIRQ callback-invocation time limit also apply to the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT. - Add a jiffies-based callback-invocation time limit to handle long-running callbacks. (The local_clock() function is only invoked once per 32 callbacks due to its high overhead.) - Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which fixes a bug that can occur on systems with non-contiguous CPU numbering. kvfree_rcu updates: - Eliminate the single-argument variant of k[v]free_rcu() now that all uses have been converted to k[v]free_rcu_mightsleep(). - Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too soon. Yes, this is closing the barn door after the horse has escaped, but Murphy says that there will be more horses. Callback-offloading updates: - Fix a number of bugs involving the shrinker and lazy callbacks. Tasks RCU updates Torture-test updates" * tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits) torture: Remove duplicated argument -enable-kvm for ppc64 doc/rcutorture: Add description of rcutorture.stall_cpu_block rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup() rcutorture: Correct name of use_softirq module parameter locktorture: Add long_hold to adjust lock-hold delays rcu/nocb: Make shrinker iterate only over NOCB CPUs rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs rcu: Make rcu_cpu_starting() rely on interrupts being disabled rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp rcu: Employ jiffies-based backstop to callback time limit rcu: Check callback-invocation time limit for rcuc kthreads rcu: Remove RCU_NONIDLE() rcu: Add more RCU files to kernel-api.rst rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic() rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker rcu/nocb: Fix shrinker race against callback enqueuer rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading ... |
||
Linus Torvalds
|
40e8e98f51 |
Power management updates for 6.5-rc1
- Introduce power capping core support for Intel TPMI (Topology Aware Register and PM Capsule Interface) and a TPMI interface driver for Intel RAPL (Zhang Rui, Dan Carpenter). - Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping driver (Zhang Rui). - Fix invalid initialization for pl4_supported field in the Intel RAPL power capping driver (Sumeet Pawnikar). - Clean up the intel_idle driver, make it work with VM guests that cannot use the MWAIT instruction and address the case in which the host may enter a deep idle state when the guest is idle (Arjan van de Ven). - Prevent cpufreq drivers that provide the ->adjust_perf() callback without a ->fast_switch() one which is used as a fallback from the former in some cases (Wyes Karny). - Fix some issues related to the AMD P-state cpufreq driver (Mario Limonciello, Wyes Karny). - Fix the energy_performance_preference attribute handling in the intel_pstate driver in passive mode (Tero Kristo). - Fix the handling of pm_suspend_target_state when CONFIG_PM is unset (Kai-Heng Feng). - Correct spelling mistake in a comment in the hibernation code (Wang Honghui). - Add arch_resume_nosmt() prototype to avoid a "missing prototypes" build warning (Arnd Bergmann). - Restrict pm_pr_dbg() to system-wide power transitions and use it in a few additional places (Mario Limonciello). - Drop verification of in-params from genpd_add_device() and ensure that all of its callers will do it (Ulf Hansson). - Prevent possible integer overflows from occurring in genpd_parse_state() (Nikita Zhandarovich). - Reorder fieldls in 'struct devfreq_dev_status' to reduce its size somewhat (Christophe JAILLET). - Ensure that the Exynos PPMU driver is already loaded before the Exynos Bus driver starts probing so as to avoid a possible freeze loading of the kernel modules (Marek Szyprowski). - Fix variable deferencing before NULL check in the mtk-cci devfreq driver (Sukrut Bellary). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmSZwPASHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxu44P/AouvVFDMt+eE76nPfNc10X1lswS0vrT X7LmylSDPyWiuz6p8MWGXB6T2nQ+3DbvSfVyBJ960ymkBnE/F9me8o3wB8eGbd6z ZvOD8+wVPXS4Cq8gUxy2zV1ul+o5IwwT20cYC6mWjasvByl13vTevN5d6ZQ9o6hS 1hAQQDd6JjsdLIUyU0EbE4aD+l4h96o45IFxbV86qVH77ywa6VMNdulRKmDcONj3 kM7jHFYL4xl0TfMjHp4IhGWXK32qGYgX1zYTOU5kSc11IExJfVzQcL2uQ9A0KSLp RJ0c93loUsHdMhenNkN4nSBFWBIaftKDLbS+5Ubt0DBuNN7kxWivEVts4DM/wxuB 72PNl5h8YglcW7LHH2IXb/6HEerzbj42+6y459o+M0DcNTq18gu19OQTK5IGtRrQ Yf6+5BhgLR3R1REg0eaBg6njtGq0f5fmW7Iqo52eA8cXhHU0MTDJE1p6ytfN40gH ViA+T8HB6Mh91lWHVftbwo3wONHGcfJy+S2hGM45V5LKEGeKHILgmw0nUzO7epWE 7VIPKGzkVd7h/Drk7X3nQR3DJFA/x5eNhjxt5LZD83cVVg34SS3ST5oH13FI9S7Q 8zwG5KoHTDrmYug3sxQ+Q9pq8MnOl0ZbgqlVfwyiWjKYmNMbg4elsQG/2zD3t0kv y8zXbr7Kr4QB =SZV7 -----END PGP SIGNATURE----- Merge tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add Intel TPMI (Topology Aware Register and PM Capsule Interface) support to the power capping subsystem, extend the intel_idle driver to work in VM guests where MWAIT is not available, extend the system-wide power management diagnostics, fix bugs and clean up code. Specifics: - Introduce power capping core support for Intel TPMI (Topology Aware Register and PM Capsule Interface) and a TPMI interface driver for Intel RAPL (Zhang Rui, Dan Carpenter) - Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping driver (Zhang Rui) - Fix invalid initialization for pl4_supported field in the Intel RAPL power capping driver (Sumeet Pawnikar) - Clean up the intel_idle driver, make it work with VM guests that cannot use the MWAIT instruction and address the case in which the host may enter a deep idle state when the guest is idle (Arjan van de Ven) - Prevent cpufreq drivers that provide the ->adjust_perf() callback without a ->fast_switch() one which is used as a fallback from the former in some cases (Wyes Karny) - Fix some issues related to the AMD P-state cpufreq driver (Mario Limonciello, Wyes Karny) - Fix the energy_performance_preference attribute handling in the intel_pstate driver in passive mode (Tero Kristo) - Fix the handling of pm_suspend_target_state when CONFIG_PM is unset (Kai-Heng Feng) - Correct spelling mistake in a comment in the hibernation code (Wang Honghui) - Add arch_resume_nosmt() prototype to avoid a "missing prototypes" build warning (Arnd Bergmann) - Restrict pm_pr_dbg() to system-wide power transitions and use it in a few additional places (Mario Limonciello) - Drop verification of in-params from genpd_add_device() and ensure that all of its callers will do it (Ulf Hansson) - Prevent possible integer overflows from occurring in genpd_parse_state() (Nikita Zhandarovich) - Reorder fieldls in 'struct devfreq_dev_status' to reduce its size somewhat (Christophe JAILLET) - Ensure that the Exynos PPMU driver is already loaded before the Exynos Bus driver starts probing so as to avoid a possible freeze loading of the kernel modules (Marek Szyprowski) - Fix variable deferencing before NULL check in the mtk-cci devfreq driver (Sukrut Bellary)" * tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (42 commits) intel_idle: Add a "Long HLT" C1 state for the VM guest mode cpufreq: intel_pstate: Fix energy_performance_preference for passive cpufreq: amd-pstate: Add a kernel config option to set default mode cpufreq: amd-pstate: Set a fallback policy based on preferred_profile ACPI: CPPC: Add definition for undefined FADT preferred PM profile value cpufreq: amd-pstate: Set default governor to schedutil PM: domains: Move the verification of in-params from genpd_add_device() cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenated cpufreq: amd-pstate: Write CPPC enable bit per-socket intel_idle: Add support for using intel_idle in a VM guest using just hlt cpufreq: Fail driver register if it has adjust_perf without fast_switch intel_idle: clean up the (new) state_update_enter_method function intel_idle: refactor state->enter manipulation into its own function platform/x86/amd: pmc: Use pm_pr_dbg() for suspend related messages pinctrl: amd: Use pm_pr_dbg to show debugging messages ACPI: x86: Add pm_debug_messages for LPS0 _DSM state tracking include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume powercap: RAPL: Fix a NULL vs IS_ERR() bug powercap: RAPL: Fix CONFIG_IOSF_MBI dependency powercap: RAPL: fix invalid initialization for pl4_supported field ... |
||
Linus Torvalds
|
19300488c9 |
- Address -Wmissing-prototype warnings
- Remove repeated 'the' in comments - Remove unused current_untag_mask() - Document urgent tip branch timing - Clean up MSR kernel-doc notation - Clean up paravirt_ops doc - Update Srivatsa S. Bhat's maintained areas - Remove unused extern declaration acpi_copy_wakeup_routine() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmSZ6xIACgkQaDWVMHDJ krB9aQ/+NjB4CiWLbrnOYj9QYG6p1GE7lfu2dzIDdmcNuiai8htopXys54Igy3Rq BbIoW4E0SGK5E2OD7nLe4fBA/LpsYZTwDhGUu3SiovxLOoC5qkF0Q+6aVypPJE5o q7kn0Eo9IDL1dO0EbJptFDJRjk3K5caEoyXJRelarjIfPRbDEhUFaybVRykMZN9I 4AOxrlb9WFggT4gUE4+N0kWyEqdgI9/aguavmasaG4lBHZ5JAHNQPNIa8bkVSAPL wULAzsrGp96V3tVxdjDCzD9aumk4xlJq7gk+v7mfx013dg7Cjs074Xoi2Y+TmaC7 fdIZiGPJIkNToW+nENVO7BYtACSQhXeVTGxLQO/HNTDc//ZWiIUoJT2U4qu/6e6F aAIGoLwv68H4BghS2qx6Gz+BTIfl35mcPUb75MQhu+D84QZoZWrdamCYhsvHeZzc uC3nojrb6PBOth9nJsRae+j1zpRe/DT2LvHSWPJgK6EygOAi05ZfYUll/6sb0vze IXkUrVV1BvDDVpY9/HnE8RpDCDolP0/ezK9zsw48arZtkc+Qmw2WlD/2D98E+pSb MJPelbVmpzWTaoR4jDzXJCXkWe7CQJ5uPQj5azAE9l7YvnxgCQP5xnm5sLU9eyLu RsOwRzss0+3z44x5rJi9nSxQJ0LHfTAzW8/ZmNSZGHzi0ClszK0= =N82i -----END PGP SIGNATURE----- Merge tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Dave Hansen: "As usual, these are all over the map. The biggest cluster is work from Arnd to eliminate -Wmissing-prototype warnings: - Address -Wmissing-prototype warnings - Remove repeated 'the' in comments - Remove unused current_untag_mask() - Document urgent tip branch timing - Clean up MSR kernel-doc notation - Clean up paravirt_ops doc - Update Srivatsa S. Bhat's maintained areas - Remove unused extern declaration acpi_copy_wakeup_routine()" * tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86/acpi: Remove unused extern declaration acpi_copy_wakeup_routine() Documentation: virt: Clean up paravirt_ops doc x86/mm: Remove unused current_untag_mask() x86/mm: Remove repeated word in comments x86/lib/msr: Clean up kernel-doc notation x86/platform: Avoid missing-prototype warnings for OLPC x86/mm: Add early_memremap_pgprot_adjust() prototype x86/usercopy: Include arch_wb_cache_pmem() declaration x86/vdso: Include vdso/processor.h x86/mce: Add copy_mc_fragile_handle_tail() prototype x86/fbdev: Include asm/fb.h as needed x86/hibernate: Declare global functions in suspend.h x86/entry: Add do_SYSENTER_32() prototype x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled() x86/mm: Include asm/numa.h for set_highmem_pages_init() x86: Avoid missing-prototype warnings for doublefault code x86/fpu: Include asm/fpu/regset.h x86: Add dummy prototype for mk_early_pgtbl_32() x86/pci: Mark local functions as 'static' x86/ftrace: Move prepare_ftrace_return prototype to header ... |
||
Linus Torvalds
|
cd336f6562 |
Time, timekeeping and related device driver updates:
- Core: - A set of fixes, cleanups and enhancements to the posix timer code: - Prevent another possible live lock scenario in the exit() path, which affects POSIX_CPU_TIMERS_TASK_WORK enabled architectures. - Fix a loop termination issue which was reported syzcaller/KSAN in the posix timer ID allocation code. That triggered a deeper look into the posix-timer code which unearthed more small issues. - Add missing READ/WRITE_ONCE() annotations - Fix or remove completely outdated comments - Document places which are subtle and completely undocumented. - Add missing hrtimer modes to the trace event decoder - Small cleanups and enhancements all over the place - Drivers: - Rework the Hyper-V clocksource and sched clock setup code - Remove a deprecated clocksource driver - Small fixes and enhancements all over the place -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSZctYTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoQqpEACAzSDKH7lpWFXwXMR0j6GKi5erZEYg I0PtvK70+zV0Fk2DOXplxDIis3qtYPSinSEK5Kzycyf+MNOuWKaB8//4PsCbD6aR 3DWWi5xUGAOkmtFQMlmQBKahDcfFhSTN7GeYYcTd5TaQIwVPjb+Qh9XuOG5d/O0q 66jeiYRkiOqTwOM8jZqWOWeKOt56xd9BmCvSdNbnAbZZEjUNAFT7LN6Oux2I91BU VUh1luoKPPKRFQN07oWaBKg/V7Iib10SCejDmAd6QKZQg1A/UulJl0WBOtRYr3RG 81b05dG2Ulp2ygm5YuRWtkpIC6pcFKjhh6WzDio0do6aOtWHOn5oefqJqUmufM9K h6WRRmGecoSvon1euzciy/ArzzoI0fSHYtB2cgBaBS7ImGb+7hDk0RkNota4alLG gfn98Rufqx/FXHFUJeHxoZTQbW1PUoU0VIF1r/nmSwDRJsxmqPyCW+52/TOjnSo1 cvrTflAu/JYazhggsIpOCyVlnaiXZnfGUdbvnzlhaB1vQ8M4X+aq48b1sPU9XawN VB9WDdh8Ba6w8ebALjM0apNaLYLq71P9dzs5dHsmjMkqx2rA+Kafc/jIu37h6ZEp RBFDcI/WAPnp6lS6w2v0F852xBzIJe4zbTIrUivuVxcTo5Rh8iW0AexmHFN2PN4N MGyyJHu8bMdIww== =hRV9 -----END PGP SIGNATURE----- Merge tag 'timers-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Time, timekeeping and related device driver updates: Core: - A set of fixes, cleanups and enhancements to the posix timer code: - Prevent another possible live lock scenario in the exit() path, which affects POSIX_CPU_TIMERS_TASK_WORK enabled architectures. - Fix a loop termination issue which was reported syzcaller/KSAN in the posix timer ID allocation code. That triggered a deeper look into the posix-timer code which unearthed more small issues. - Add missing READ/WRITE_ONCE() annotations - Fix or remove completely outdated comments - Document places which are subtle and completely undocumented. - Add missing hrtimer modes to the trace event decoder - Small cleanups and enhancements all over the place Drivers: - Rework the Hyper-V clocksource and sched clock setup code - Remove a deprecated clocksource driver - Small fixes and enhancements all over the place" * tag 'timers-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe dt-bindings: timers: Add Ralink SoCs timer clocksource/drivers/hyper-v: Rework clocksource and sched clock setup dt-bindings: timer: brcm,kona-timer: convert to YAML clocksource/drivers/imx-gpt: Fold <soc/imx/timer.h> into its only user clk: imx: Drop inclusion of unused header <soc/imx/timer.h> hrtimer: Add missing sparse annotations to hrtimer locking clocksource/drivers/imx-gpt: Use only a single name for functions clocksource/drivers/loongson1: Move PWM timer to clocksource framework dt-bindings: timer: Add Loongson-1 clocksource MIPS: Loongson32: Remove deprecated PWM timer clocksource clocksource/drivers/ingenic-timer: Use pm_sleep_ptr() macro tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode(). posix-timers: Add sys_ni_posix_timers() prototype tick/rcu: Fix bogus ratelimit condition alarmtimer: Remove unnecessary (void *) cast alarmtimer: Remove unnecessary initialization of variable 'ret' posix-timers: Refer properly to CONFIG_HIGH_RES_TIMERS posix-timers: Polish coding style in a few places posix-timers: Remove pointless comments ... |
||
Linus Torvalds
|
9244724fbf |
A large update for SMP management:
- Parallel CPU bringup The reason why people are interested in parallel bringup is to shorten the (kexec) reboot time of cloud servers to reduce the downtime of the VM tenants. The current fully serialized bringup does the following per AP: 1) Prepare callbacks (allocate, intialize, create threads) 2) Kick the AP alive (e.g. INIT/SIPI on x86) 3) Wait for the AP to report alive state 4) Let the AP continue through the atomic bringup 5) Let the AP run the threaded bringup to full online state There are two significant delays: #3 The time for an AP to report alive state in start_secondary() on x86 has been measured in the range between 350us and 3.5ms depending on vendor and CPU type, BIOS microcode size etc. #4 The atomic bringup does the microcode update. This has been measured to take up to ~8ms on the primary threads depending on the microcode patch size to apply. On a two socket SKL server with 56 cores (112 threads) the boot CPU spends on current mainline about 800ms busy waiting for the APs to come up and apply microcode. That's more than 80% of the actual onlining procedure. This can be reduced significantly by splitting the bringup mechanism into two parts: 1) Run the prepare callbacks and kick the AP alive for each AP which needs to be brought up. The APs wake up, do their firmware initialization and run the low level kernel startup code including microcode loading in parallel up to the first synchronization point. (#1 and #2 above) 2) Run the rest of the bringup code strictly serialized per CPU (#3 - #5 above) as it's done today. Parallelizing that stage of the CPU bringup might be possible in theory, but it's questionable whether required surgery would be justified for a pretty small gain. If the system is large enough the first AP is already waiting at the first synchronization point when the boot CPU finished the wake-up of the last AP. That reduces the AP bringup time on that SKL from ~800ms to ~80ms, i.e. by a factor ~10x. The actual gain varies wildly depending on the system, CPU, microcode patch size and other factors. There are some opportunities to reduce the overhead further, but that needs some deep surgery in the x86 CPU bringup code. For now this is only enabled on x86, but the core functionality obviously works for all SMP capable architectures. - Enhancements for SMP function call tracing so it is possible to locate the scheduling and the actual execution points. That allows to measure IPI delivery time precisely. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSZb/YTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoRoOD/9vAiGI3IhGyZcX/RjXxauSHf8Pmqll 05jUubFi5Vi3tKI1ubMOsnMmJTw2yy5xDyS/iGj7AcbRLq9uQd3iMtsXXHNBzo/X FNxnuWTXYUj0vcOYJ+j4puBumFzzpRCprqccMInH0kUnSWzbnaQCeelicZORAf+w zUYrswK4HpBXHDOnvPw6Z7MYQe+zyDQSwjSftstLyROzu+lCEw/9KUaysY2epShJ wHClxS2XqMnpY4rJ/CmJAlRhD0Plb89zXyo6k9YZYVDWoAcmBZy6vaTO4qoR171L 37ApqrgsksMkjFycCMnmrFIlkeb7bkrYDQ5y+xqC3JPTlYDKOYmITV5fZ83HD77o K7FAhl/CgkPq2Ec+d82GFLVBKR1rijbwHf7a0nhfUy0yMeaJCxGp4uQ45uQ09asi a/VG2T38EgxVdseC92HRhcdd3pipwCb5wqjCH/XdhdlQrk9NfeIeP+TxF4QhADhg dApp3ifhHSnuEul7+HNUkC6U+Zc8UeDPdu5lvxSTp2ooQ0JwaGgC5PJq3nI9RUi2 Vv826NHOknEjFInOQcwvp6SJPfcuSTF75Yx6xKz8EZ3HHxpvlolxZLq+3ohSfOKn 2efOuZO5bEu4S/G2tRDYcy+CBvNVSrtZmCVqSOS039c8quBWQV7cj0334cjzf+5T TRiSzvssbYYmaw== =Y8if -----END PGP SIGNATURE----- Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP updates from Thomas Gleixner: "A large update for SMP management: - Parallel CPU bringup The reason why people are interested in parallel bringup is to shorten the (kexec) reboot time of cloud servers to reduce the downtime of the VM tenants. The current fully serialized bringup does the following per AP: 1) Prepare callbacks (allocate, intialize, create threads) 2) Kick the AP alive (e.g. INIT/SIPI on x86) 3) Wait for the AP to report alive state 4) Let the AP continue through the atomic bringup 5) Let the AP run the threaded bringup to full online state There are two significant delays: #3 The time for an AP to report alive state in start_secondary() on x86 has been measured in the range between 350us and 3.5ms depending on vendor and CPU type, BIOS microcode size etc. #4 The atomic bringup does the microcode update. This has been measured to take up to ~8ms on the primary threads depending on the microcode patch size to apply. On a two socket SKL server with 56 cores (112 threads) the boot CPU spends on current mainline about 800ms busy waiting for the APs to come up and apply microcode. That's more than 80% of the actual onlining procedure. This can be reduced significantly by splitting the bringup mechanism into two parts: 1) Run the prepare callbacks and kick the AP alive for each AP which needs to be brought up. The APs wake up, do their firmware initialization and run the low level kernel startup code including microcode loading in parallel up to the first synchronization point. (#1 and #2 above) 2) Run the rest of the bringup code strictly serialized per CPU (#3 - #5 above) as it's done today. Parallelizing that stage of the CPU bringup might be possible in theory, but it's questionable whether required surgery would be justified for a pretty small gain. If the system is large enough the first AP is already waiting at the first synchronization point when the boot CPU finished the wake-up of the last AP. That reduces the AP bringup time on that SKL from ~800ms to ~80ms, i.e. by a factor ~10x. The actual gain varies wildly depending on the system, CPU, microcode patch size and other factors. There are some opportunities to reduce the overhead further, but that needs some deep surgery in the x86 CPU bringup code. For now this is only enabled on x86, but the core functionality obviously works for all SMP capable architectures. - Enhancements for SMP function call tracing so it is possible to locate the scheduling and the actual execution points. That allows to measure IPI delivery time precisely" * tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) trace,smp: Add tracepoints for scheduling remotelly called functions trace,smp: Add tracepoints around remotelly called functions MAINTAINERS: Add CPU HOTPLUG entry x86/smpboot: Fix the parallel bringup decision x86/realmode: Make stack lock work in trampoline_compat() x86/smp: Initialize cpu_primary_thread_mask late cpu/hotplug: Fix off by one in cpuhp_bringup_mask() x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it x86/smpboot: Support parallel startup of secondary CPUs x86/smpboot: Implement a bit spinlock to protect the realmode stack x86/apic: Save the APIC virtual base address cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE x86/apic: Provide cpu_primary_thread mask x86/smpboot: Enable split CPU startup cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism cpu/hotplug: Reset task stack state in _cpu_up() cpu/hotplug: Remove unused state functions riscv: Switch to hotplug core state synchronization parisc: Switch to hotplug core state synchronization ... |
||
Linus Torvalds
|
0017387938 |
Updates for the interrupt subsystem:
- Core: - Convert the interrupt descriptor storage to a maple tree to overcome the limitations of the radixtree + fixed size bitmap. This allows to handle real large servers with a huge number of guests without imposing a huge memory overhead on everyone. - Implement optional retriggering of interrupts which utilize the fasteoi handler to work around a GICv3 architecture issue. - Drivers: - A set of fixes and updates for the Loongson/Loongarch related drivers. - Workaound for an ASR8601 integration hickup which ends up with CPU numbering which can't be represented in the GIC implementation. - The usual set of boring fixes and updates all over the place. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSZaf0THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoUgaD/9PwYvqeR12oJRz24gso6NNxlZ2nZMh KIApeIV4eoDPjM9Qdc38Tz+LbiClZuhiNRmxqzkaKmLsNObeYJhNvRg14bQA/Mfy t1kqO2rlNTSeRR5Y0XiQqFMIKCcpMQeKXzJ+ZQspiX08kCSl9UqBKpE5HgbTVFiB yTwdtagi8zrDr8KuETe+REKcwvoLippHrnz6evVMOXtN6Jdtz2maZT9dVDAvaVl7 pXgarzMScEFTfK8Q6wjH9ayC1UXPmSIIiirWZHYvtaAXh4/IY1U1LY4KqkVPQ1MB 7thv4CbE/Iyzw78FUMtrsMwqOV/fu71SfBh9uV6kFxoySFJ/gJ8QLOcAqkbNGyBf 9oRWuuY0LJZl1AKtmU6jNaS17JeOpdIdB44cAXBArYMbJUStZ2Mo2EDdw+/IHNzM tt32+Pjtg8BVrFLcR7gQ5rzAktz6678x9Qk6ys+KUCG3tuFyKx6RiD+f0DARe1Td DflNoJ6WTqwoimvTokAg6QGPUyHKJLe29ciSuUjHXaHJAE9xyeGtfJQWNLwpjejD KYYo5mb8cJc917Yx8LUOj02jVtebQtLezDtnUyGXrIR+ze4ZUQxhgvSKRDxX7E56 CjG3ghx6Ty1sTpjL4dHtXLJ1NgitFyjJ7VQlVqxWNQBNI+m3l2zmxj4zB9eI6v1R qyjKEgnFi60vSw== =qKCo -----END PGP SIGNATURE----- Merge tag 'irq-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core: - Convert the interrupt descriptor storage to a maple tree to overcome the limitations of the radixtree + fixed size bitmap. This allows us to handle very large servers with a huge number of guests without imposing a huge memory overhead on everyone - Implement optional retriggering of interrupts which utilize the fasteoi handler to work around a GICv3 architecture issue Drivers: - A set of fixes and updates for the Loongson/Loongarch related drivers - Workaound for an ASR8601 integration hickup which ends up with CPU numbering which can't be represented in the GIC implementation - The usual set of boring fixes and updates all over the place" * tag 'irq-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) Revert "irqchip/mxs: Include linux/irqchip/mxs.h" irqchip/jcore-aic: Fix missing allocation of IRQ descriptors irqchip/stm32-exti: Fix warning on initialized field overwritten irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype irqchip/mxs: Include linux/irqchip/mxs.h irqchip/clps711x: Remove unused clps711x_intc_init() function irqchip/mmp: Remove non-DT codepath irqchip/ftintc010: Mark all function static irqdomain: Include internals.h for function prototypes irqchip/loongson-eiointc: Add DT init support dt-bindings: interrupt-controller: Add Loongson EIOINTC irqchip/loongson-eiointc: Fix irq affinity setting during resume irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag irqchip/loongson-liointc: Fix IRQ trigger polarity irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment irqchip/loongson-pch-pic: Fix initialization of HT vector register irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs genirq: Allow fasteoi handler to resend interrupts on concurrent handling genirq: Expand doc for PENDING and REPLAY flags ... |
||
Linus Torvalds
|
a0433f8cae |
for-6.5/block-2023-06-23
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSV8dwQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpilGD/9Yys1oxIXJpRf00fzrylAlBthRxMjFQVWw zAut106hAQiBHvU8IkmGA3MvEFVHxtzwYhHI7IR8K3aZBIqscweCqmVI9JyogJw9 U9Twnzel47VmuKdM94FeoN+hbj1fP8EWTjzmy67/zEEfFCdmHvNlMi3lSrGYIpFy 39LxTB99Y4UarM5PtWbes37GYYljzMSWKuo4AfBkvq1eQa+sZ0Vq2xAABKq3UM7f apqhgHtkJooRePDP0eQp+kAyyVMgW2jIK+oIdJDxNF3CKTu2w40RzaYz6fp+jVSU H4R/xS59GW4/xql+VBJDh/qJg9K62DPPYjlW8BmSR8+IjvfFpsyH3/MacE50CD3P 20fs/Mnj49H79fDrQEHJI53cOOb2EmUitbwLbvOcColNTPpt8loBtdQxjF2RMU8R Nyort9DJPFclYCxky1LYg1CNEC2Ln4Zy/jD47wPvqRmOQphOoVlV/hPnOEqvjaZC 49Vn70W2DeE9cXvYI7ha+XIg6/oj+Gs3iusEbV08Ci7EAtXgI+ZUUsQ97K8UNiUh h2lqSJtuI7lBpYP9sf+BeCch5UCC+xGYyTdoM5f58lehWBBPtbs0g7S9RyRyOYxe n+yxEUo3dAGzJ/xsKAjinbZfeWIpr0b1TkAh4w3Cq/BKzRr9Bp8lBAxYuancbQ+Y 1ADPteUOTA== =zP4Y -----END PGP SIGNATURE----- Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - NVMe pull request via Keith: - Various cleanups all around (Irvin, Chaitanya, Christophe) - Better struct packing (Christophe JAILLET) - Reduce controller error logs for optional commands (Keith) - Support for >=64KiB block sizes (Daniel Gomez) - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner) - bcache updates via Coly: - Fix a race at init time (Mingzhe Zou) - Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye) - use page pinning in the block layer for dio (David) - convert old block dio code to page pinning (David, Christoph) - cleanups for pktcdvd (Andy) - cleanups for rnbd (Guoqing) - use the unchecked __bio_add_page() for the initial single page additions (Johannes) - fix overflows in the Amiga partition handling code (Michael) - improve mq-deadline zoned device support (Bart) - keep passthrough requests out of the IO schedulers (Christoph, Ming) - improve support for flush requests, making them less special to deal with (Christoph) - add bdev holder ops and shutdown methods (Christoph) - fix the name_to_dev_t() situation and use cases (Christoph) - decouple the block open flags from fmode_t (Christoph) - ublk updates and cleanups, including adding user copy support (Ming) - BFQ sanity checking (Bart) - convert brd from radix to xarray (Pankaj) - constify various structures (Thomas, Ivan) - more fine grained persistent reservation ioctl capability checks (Jingbo) - misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan, Jordy, Li, Min, Yu, Zhong, Waiman) * tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits) scsi/sg: don't grab scsi host module reference ext4: Fix warning in blkdev_put() block: don't return -EINVAL for not found names in devt_from_devname cdrom: Fix spectre-v1 gadget block: Improve kernel-doc headers blk-mq: don't insert passthrough request into sw queue bsg: make bsg_class a static const structure ublk: make ublk_chr_class a static const structure aoe: make aoe_class a static const structure block/rnbd: make all 'class' structures const block: fix the exclusive open mask in disk_scan_partitions block: add overflow checks for Amiga partition support block: change all __u32 annotations to __be32 in affs_hardblocks.h block: fix signed int overflow in Amiga partition support block: add capacity validation in bdev_add_partition() block: fine-granular CAP_SYS_ADMIN for Persistent Reservation block: disallow Persistent Reservation on partitions reiserfs: fix blkdev_put() warning from release_journal_dev() block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions() block: document the holder argument to blkdev_get_by_path ... |
||
Linus Torvalds
|
3eccc0c886 |
for-6.5/splice-2023-06-23
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSV8QgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpupIEADKEZvpxDyaxHjYZFFeoSJRkh+AEJHe0Xtr J5vUL8t8zmAV3F7i8XaoAEcR0dC0VQcoTc8fAOty71+5hsc7gvtyyNjqU/YWRVqK Xr+VJuSJ+OGx3MzpRWEkepagfPyqP5cyyCOK6gqIgqzc3IwqkR/3QHVRc6oR8YbY AQd7tqm2fQXK9WDHEy5hcaQeqb9uKZjQQoZejpPPerpJM+9RMgKxpCGtnLLIUhr/ sgl7KyLIQPBmveO2vfOR+dmsJBqsLqneqkXDKMAIfpeVEEkHHAlCH4E5Ne1XUS+s ie4If+reuyn1Ktt5Ry1t7w2wr8cX1fcay3K28tgwjE2Bvremc5YnYgb3pyUDW38f tXXkpg/eTXd/Pn0Crpagoa9zJ927tt5JXIO1/PagPEP1XOqUuthshDFsrVqfqbs+ 36gqX2JWB4NJTg9B9KBHA3+iVCJyZLjUqOqws7hOJOvhQytZVm/IwkGBg1Slhe1a J5WemBlqX8lTgXz0nM7cOhPYTZeKe6hazCcb5VwxTUTj9SGyYtsMfqqTwRJO9kiF j1VzbOAgExDYe+GvfqOFPh9VqZho66+DyOD/Xtca4eH7oYyHSmP66o8nhRyPBPZA maBxQhUkPQn4/V/0fL2TwIdWYKsbj8bUyINKPZ2L35YfeICiaYIctTwNJxtRmItB M3VxWD3GZQ== =KhW4 -----END PGP SIGNATURE----- Merge tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux Pull splice updates from Jens Axboe: "This kills off ITER_PIPE to avoid a race between truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory corruption. Instead, we either use (a) filemap_splice_read(), which invokes the buffered file reading code and splices from the pagecache into the pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads into it and then pushes the filled pages into the pipe; or (c) handle it in filesystem-specific code. Summary: - Rename direct_splice_read() to copy_splice_read() - Simplify the calculations for the number of pages to be reclaimed in copy_splice_read() - Turn do_splice_to() into a helper, vfs_splice_read(), so that it can be used by overlayfs and coda to perform the checks on the lower fs - Make vfs_splice_read() jump to copy_splice_read() to handle direct-I/O and DAX - Provide shmem with its own splice_read to handle non-existent pages in the pagecache. We don't want a ->read_folio() as we don't want to populate holes, but filemap_get_pages() requires it - Provide overlayfs with its own splice_read to call down to a lower layer as overlayfs doesn't provide ->read_folio() - Provide coda with its own splice_read to call down to a lower layer as coda doesn't provide ->read_folio() - Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs and random files as they just copy to the output buffer and don't splice pages - Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3, ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation - Make cifs use filemap_splice_read() - Replace pointers to generic_file_splice_read() with pointers to filemap_splice_read() as DIO and DAX are handled in the caller; filesystems can still provide their own alternate ->splice_read() op - Remove generic_file_splice_read() - Remove ITER_PIPE and its paraphernalia as generic_file_splice_read was the only user" * tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ... |
||
Linus Torvalds
|
64bf6ae93e |
v6.5/vfs.misc
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZJU4SwAKCRCRxhvAZXjc ojOTAP9gT/z1gasIf8OwDHb4inZGnVpHh2ApKLvgMXH6ICtwRgD+OBtOcf438Lx1 cpFSTVJlh21QXMOOXWHe/LRUV2kZ5wI= =zdfx -----END PGP SIGNATURE----- Merge tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Miscellaneous features, cleanups, and fixes for vfs and individual fs Features: - Use mode 0600 for file created by cachefilesd so it can be run by unprivileged users. This aligns them with directories which are already created with mode 0700 by cachefilesd - Reorder a few members in struct file to prevent some false sharing scenarios - Indicate that an eventfd is used a semaphore in the eventfd's fdinfo procfs file - Add a missing uapi header for eventfd exposing relevant uapi defines - Let the VFS protect transitions of a superblock from read-only to read-write in addition to the protection it already provides for transitions from read-write to read-only. Protecting read-only to read-write transitions allows filesystems such as ext4 to perform internal writes, keeping writers away until the transition is completed Cleanups: - Arnd removed the architecture specific arch_report_meminfo() prototypes and added a generic one into procfs.h. Note, we got a report about a warning in amdpgpu codepaths that suggested this was bisectable to this change but we concluded it was a false positive - Remove unused parameters from split_fs_names() - Rename put_and_unmap_page() to unmap_and_put_page() to let the name reflect the order of the cleanup operation that has to unmap before the actual put - Unexport buffer_check_dirty_writeback() as it is not used outside of block device aops - Stop allocating aio rings from highmem - Protecting read-{only,write} transitions in the VFS used open-coded barriers in various places. Replace them with proper little helpers and document both the helpers and all barrier interactions involved when transitioning between read-{only,write} states - Use flexible array members in old readdir codepaths Fixes: - Use the correct type __poll_t for epoll and eventfd - Replace all deprecated strlcpy() invocations, whose return value isn't checked with an equivalent strscpy() call - Fix some kernel-doc warnings in fs/open.c - Reduce the stack usage in jffs2's xattr codepaths finally getting rid of this: fs/jffs2/xattr.c:887:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] royally annoying compilation warning - Use __FMODE_NONOTIFY instead of FMODE_NONOTIFY where an int and not fmode_t is required to avoid fmode_t to integer degradation warnings - Create coredumps with O_WRONLY instead of O_RDWR. There's a long explanation in that commit how O_RDWR is actually a bug which we found out with the help of Linus and git archeology - Fix "no previous prototype" warnings in the pipe codepaths - Add overflow calculations for remap_verify_area() as a signed addition overflow could be triggered in xfstests - Fix a null pointer dereference in sysv - Use an unsigned variable for length calculations in jfs avoiding compilation warnings with gcc 13 - Fix a dangling pipe pointer in the watch queue codepath - The legacy mount option parser provided as a fallback by the VFS for filesystems not yet converted to the new mount api did prefix the generated mount option string with a leading ',' causing issues for some filesystems - Fix a repeated word in a comment in fs.h - autofs: Update the ctime when mtime is updated as mandated by POSIX" * tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) readdir: Replace one-element arrays with flexible-array members fs: Provide helpers for manipulating sb->s_readonly_remount fs: Protect reconfiguration of sb read-write from racing writes eventfd: add a uapi header for eventfd userspace APIs autofs: set ctime as well when mtime changes on a dir eventfd: show the EFD_SEMAPHORE flag in fdinfo fs/aio: Stop allocating aio rings from HIGHMEM fs: Fix comment typo fs: unexport buffer_check_dirty_writeback fs: avoid empty option when generating legacy mount string watch_queue: prevent dangling pipe pointer fs.h: Optimize file struct to prevent false sharing highmem: Rename put_and_unmap_page() to unmap_and_put_page() cachefiles: Allow the cache to be non-root init: remove unused names parameter in split_fs_names() jfs: Use unsigned variable for length calculations fs/sysv: Null check to prevent null-ptr-deref bug fs: use UB-safe check for signed addition overflow in remap_verify_area procfs: consolidate arch_report_meminfo declaration fs: pipe: reveal missing function protoypes ... |
||
Rafael J. Wysocki
|
9b8f36398e |
Merge branches 'pm-sleep' and 'pm-domains'
Merge updates related to system-wide power management and generic power domains (genpd) updates for 6.5-rc1: - Fix the handling of pm_suspend_target_state when CONFIG_PM is unset (Kai-Heng Feng). - Correct spelling mistake in a comment in the hibernation code (Wang Honghui). - Add arch_resume_nosmt() prototype to avoid a "missing prototypes" build warning (Arnd Bergmann). - Restrict pm_pr_dbg() to system-wide power transitions and use it in a few additional places (Mario Limonciello). - Drop verification of in-params from genpd_add_device() and ensure that all of its callers will do it (Ulf Hansson). - Prevent possible integer overflows from occurring in genpd_parse_state() (Nikita Zhandarovich). * pm-sleep: platform/x86/amd: pmc: Use pm_pr_dbg() for suspend related messages pinctrl: amd: Use pm_pr_dbg to show debugging messages ACPI: x86: Add pm_debug_messages for LPS0 _DSM state tracking include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume PM: suspend: add a arch_resume_nosmt() prototype PM: hibernate: Correct spelling mistake in a comment PM: suspend: Fix pm_suspend_target_state handling for !CONFIG_PM * pm-domains: PM: domains: Move the verification of in-params from genpd_add_device() PM: domains: fix integer overflow issues in genpd_parse_state() |
||
Thomas Gleixner
|
f121ab7f4a |
irqchip updates for 6.5
- A number of Loogson/Loogarch fixes - Allow the core code to retrigger an interrupt that has fired while the same interrupt is being handled on another CPU, papering over a GICv3 architecture issue - Work around an integration problem on ASR8601, where the CPU numbering isn't representable in the GIC implementation... - Add some missing interrupt to the STM32 irqchip - A bunch of warning squashing triggered by W=1 builds -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmSWHlEPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDgDcP/AyuNjjt9ybZ8LT5hGYRBKTfO3bsssiV/ZIE NRbfwK6nWTurA3tN9XypYouB3HfKvTvDYHzeo9Yi2y51GI20o7noSfVNHHld5dAE WTBYcfc6n+1KqGNA1tPVeE403jd50TnmOmFaXhN1Vn/KQVQKs+LmjoVdS7HXKmwu PLk1tbFSGh5+vgLQTlhze5Wz7nU59lJ5WfCJ8negYls/5l3FseGgxEuffWKytElJ I6PbnRyd9y7R/KYmkobFjS98cjeNznAxJEm/2InVpcb3IwUSEyJUHgNwbs6KSeYf 4BdVyxSAQlFPgxO17UKQxblYu/HuFY/NWghkg7dn6ZgVubj52rtrl9txUId57HGC pYbSVm0bMevqlgHDYHold+Q4I6DqltyHKgiA9UfU9IQj4MV/KXbDjtng/ih4ZkHe rw+BnckyrhZ5K5p3nDlzS/TNQT1baKqNFgx/O5HPV27NT8ARynqoFxNUzA0z37v3 x5dj34kiIW2kAGJDOP9+9jj2uF4yX0MKdNxdMs8PM9hYAsO0AqItNsXEQoUuzJF+ u1hhGZ1H6HYmY2WphXD57fnIR/7Hm6Iz00XC17+1jUP+wsefhAbreOyJIbbcpENF IIxrcIxAPl3PQPsj8n98/9fKQ6u/QfrvWUH+6aoFQH/Np7RTFWmDh8bqucjuXsGS 8gxR3REr =quHC -----END PGP SIGNATURE----- Merge tag 'irqchip-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - A number of Loogson/Loogarch fixes - Allow the core code to retrigger an interrupt that has fired while the same interrupt is being handled on another CPU, papering over a GICv3 architecture issue - Work around an integration problem on ASR8601, where the CPU numbering isn't representable in the GIC implementation... - Add some missing interrupt to the STM32 irqchip - A bunch of warning squashing triggered by W=1 builds Link: https://lore.kernel.org/r/20230623224345.3577134-1-maz@kernel.org |
||
Linus Torvalds
|
afa4bb778e |
workqueue: clean up WORK_* constant types, clarify masking
Dave Airlie reports that gcc-13.1.1 has started complaining about some of the workqueue code in 32-bit arm builds: kernel/workqueue.c: In function ‘get_work_pwq’: kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK); | ^ [ ... a couple of other cases ... ] and while it's not immediately clear exactly why gcc started complaining about it now, I suspect it's some C23-induced enum type handlign fixup in gcc-13 is the cause. Whatever the reason for starting to complain, the code and data types are indeed disgusting enough that the complaint is warranted. The wq code ends up creating various "helper constants" (like that WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of confused. The mask needs to be 'unsigned long', not some unspecified enum type. To make matters worse, the actual "mask and cast to a pointer" is repeated a couple of times, and the cast isn't even always done to the right pointer, but - as the error case above - to a 'void *' with then the compiler finishing the job. That's now how we roll in the kernel. So create the masks using the proper types rather than some ambiguous enumeration, and use a nice helper that actually does the type conversion in one well-defined place. Incidentally, this magically makes clang generate better code. That, admittedly, is really just a sign of clang having been seriously confused before, and cleaning up the typing unconfuses the compiler too. Reported-by: Dave Airlie <airlied@gmail.com> Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/ Cc: Arnd Bergmann <arnd@arndb.de> Cc: Tejun Heo <tj@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
8a28a0b6f1 |
Networking fixes for 6.4-rc8, including fixes from ipsec, bpf,
mptcp and netfilter. Current release - regressions: - netfilter: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain - eth: mlx5e: - fix scheduling of IPsec ASO query while in atomic - free IRQ rmap and notifier on kernel shutdown Current release - new code bugs: - phy: manual remove LEDs to ensure correct ordering Previous releases - regressions: - mptcp: fix possible divide by zero in recvmsg() - dsa: revert "net: phy: dp83867: perform soft reset and retain established link" Previous releases - always broken: - sched: netem: acquire qdisc lock in netem_change() - bpf: - fix verifier id tracking of scalars on spill - fix NULL dereference on exceptions - accept function names that contain dots - netfilter: disallow element updates of bound anonymous sets - mptcp: ensure listener is unhashed before updating the sk status - xfrm: - add missed call to delete offloaded policies - fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets - selftests: fixes for FIPS mode - dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling - eth: sfc: use budget for TX completions Misc: - wifi: iwlwifi: add support for SO-F device with PCI id 0x7AF0 Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmSUZO0SHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkBjAP/RfTUYdlPqz9jSvz0HmQt2Er39HyVb9I pzEpJSQGfO+eyIrlxmleu8cAaW5HdvyfMcBgr04uh+Jf06s+VJrD95IO9zDHHKoC 86itYNKMS3fSt1ivzg49i5uq66MhjtAcfIOB9HMOAQ2Jd+DYlzyWOOHw28ZAxsBZ Q6TU97YEMuU4FdLkoKob1aVswC5cPxNx2IH9NagfbtijaYZqeN9ZX9EI5yMUyH8f 5gboqOhXUQK0MQLM5TFySHeoayyQ+tRBz24nF0/6lWiRr+xzMTEKdkFpRza7Mxzj S8NxN3C+zOf96gic6kYOXmM6y0sOlbwC9JoeWTp8Tuh6DEYi6xLC2XkiYJ51idZg PElgRpkM1ddqvvFWFgZlNik5z0vbGnJH7pt0VuOSNntxE60cdQwvWEOr09vvPcS5 0nMVD0uc8pds2h4hit+sdLltcVnOgoNUYr1/sI6oydofa1BrLnhFPF7z/gUs9foD NuCchiaBF11yBGKufcNBNEB4w35g3Kcu6TGhHb168OJi+UnSnwlI0Ccw7iO10pkv RjefhR60+wZC6+leo57nZeYqaLQJuALY0QYFsyeM+T0MGSYkbH24CmbNdSmO4MRr +VX2CwIqeIds4Hx31o0Feu+FaJqXw46/2nrSDxel/hlCJnGSMXZTw+b/4pFEHLP+ l71ijZpJqV1S =GH2b -----END PGP SIGNATURE----- Merge tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from ipsec, bpf, mptcp and netfilter. Current release - regressions: - netfilter: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain - eth: mlx5e: - fix scheduling of IPsec ASO query while in atomic - free IRQ rmap and notifier on kernel shutdown Current release - new code bugs: - phy: manual remove LEDs to ensure correct ordering Previous releases - regressions: - mptcp: fix possible divide by zero in recvmsg() - dsa: revert "net: phy: dp83867: perform soft reset and retain established link" Previous releases - always broken: - sched: netem: acquire qdisc lock in netem_change() - bpf: - fix verifier id tracking of scalars on spill - fix NULL dereference on exceptions - accept function names that contain dots - netfilter: disallow element updates of bound anonymous sets - mptcp: ensure listener is unhashed before updating the sk status - xfrm: - add missed call to delete offloaded policies - fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets - selftests: fixes for FIPS mode - dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling - eth: sfc: use budget for TX completions Misc: - wifi: iwlwifi: add support for SO-F device with PCI id 0x7AF0" * tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits) revert "net: align SO_RCVMARK required privileges with SO_MARK" net: wwan: iosm: Convert single instance struct member to flexible array sch_netem: acquire qdisc lock in netem_change() selftests: forwarding: Fix race condition in mirror installation wifi: mac80211: report all unusable beacon frames mptcp: ensure listener is unhashed before updating the sk status mptcp: drop legacy code around RX EOF mptcp: consolidate fallback and non fallback state machine mptcp: fix possible list corruption on passive MPJ mptcp: fix possible divide by zero in recvmsg() mptcp: handle correctly disconnect() failures bpf: Force kprobe multi expected_attach_type for kprobe_multi link bpf/btf: Accept function names that contain dots Revert "net: phy: dp83867: perform soft reset and retain established link" net: mdio: fix the wrong parameters netfilter: nf_tables: Fix for deleting base chains with payload netfilter: nfnetlink_osf: fix module autoload netfilter: nf_tables: drop module reference after updating chain netfilter: nf_tables: disallow timeout for anonymous sets netfilter: nf_tables: disallow updates of anonymous sets ... |
||
Linus Torvalds
|
5950a0066f |
cgroup: Fixes for v6.4-rc7
It's late but here are two bug fixes. Both fix problems which can be severe but are very confined in scope. The risk to most use cases should be minimal. * Fix for an old bug which triggers if a cgroup subsystem is remounted to a different hierarchy while someone is reading its cgroup.procs/tasks file. The risk is pretty low given how seldom cgroup subsystems are moved across hierarchies. * We moved cpus_read_lock() outside of cgroup internal locks a while ago but forgot to update the legacy_freezer leading to lockdep triggers. Fixed. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJNz6g4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGS9zAP9lGszI1Zgvjz+qlU0dmE96yUEuqEg7Tfwcqxr3 Y+hHyAEArgNGnCoPfu4NAWQDZ31AgPUdL8EFqx6pY9Vq9R0oFg0= =uaB+ -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "It's late but here are two bug fixes. Both fix problems which can be severe but are very confined in scope. The risk to most use cases should be minimal. - Fix for an old bug which triggers if a cgroup subsystem is remounted to a different hierarchy while someone is reading its cgroup.procs/tasks file. The risk is pretty low given how seldom cgroup subsystems are moved across hierarchies. - We moved cpus_read_lock() outside of cgroup internal locks a while ago but forgot to update the legacy_freezer leading to lockdep triggers. Fixed" * tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Do not corrupt task iteration when rebinding subsystem cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}() |
||
Ben Dooks
|
ccaa4926c2 |
hrtimer: Add missing sparse annotations to hrtimer locking
Sparse warns about lock imbalance vs. the hrtimer_base lock due to missing sparse annotations: kernel/time/hrtimer.c:175:33: warning: context imbalance in 'lock_hrtimer_base' - wrong count at exit kernel/time/hrtimer.c:1301:28: warning: context imbalance in 'hrtimer_start_range_ns' - unexpected unlock kernel/time/hrtimer.c:1336:28: warning: context imbalance in 'hrtimer_try_to_cancel' - unexpected unlock kernel/time/hrtimer.c:1457:9: warning: context imbalance in '__hrtimer_get_remaining' - unexpected unlock Add the annotations to the relevant functions. Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230621075928.394481-1-ben.dooks@codethink.co.uk |
||
Jakub Kicinski
|
59bb14bda2 |
bpf-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZJK5DwAKCRDbK58LschI gyUtAQD4gT4BEVHRqvniw9yyqYo0BvElAznutDq7o9kFHFep2gEAoksEWS84OdZj 0L5mSKjXrpHKzmY/jlMrVIcTb3VzOw0= =gAYE -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-06-21 We've added 7 non-merge commits during the last 14 day(s) which contain a total of 7 files changed, 181 insertions(+), 15 deletions(-). The main changes are: 1) Fix a verifier id tracking issue with scalars upon spill, from Maxim Mikityanskiy. 2) Fix NULL dereference if an exception is generated while a BPF subprogram is running, from Krister Johansen. 3) Fix a BTF verification failure when compiling kernel with LLVM_IAS=0, from Florent Revest. 4) Fix expected_attach_type enforcement for kprobe_multi link, from Jiri Olsa. 5) Fix a bpf_jit_dump issue for x86_64 to pick the correct JITed image, from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Force kprobe multi expected_attach_type for kprobe_multi link bpf/btf: Accept function names that contain dots selftests/bpf: add a test for subprogram extables bpf: ensure main program has an extable bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable. selftests/bpf: Add test cases to assert proper ID tracking on spill bpf: Fix verifier id tracking of scalars on spill ==================== Link: https://lore.kernel.org/r/20230621101116.16122-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
dad9774dea |
A single regression fix for a regression fix:
For a long time the tick was aligned to clock MONOTONIC so that the tick event happened at a multiple of nanoseconds per tick starting from clock MONOTONIC = 0. At some point this changed as the refined jiffies clocksource which is used during boot before the TSC or other clocksources becomes usable, was adjusted with a boot offset, so that time 0 is closer to the point where the kernel starts. This broke the assumption in the tick code that when the tick setup happens early on ktime_get() will return a multiple of nanoseconds per tick. As a consequence applications which aligned their periodic execution so that it does not collide with the tick were not longer guaranteed that the tick period starts from time 0. The fix for this regression was to realign the tick when it is initially set up to a multiple of tick periods. That works as long as the underlying tick device supports periodic mode, but breaks under certain conditions when the tick device supports only one shot mode. Depending on the offset, the alignment delta to clock MONOTONIC can get in a range where the minimal programming delta of the underlying clock event device is larger than the calculated delta to the next tick. This results in a boot hang as the tick code tries to play catch up, but as the tick never fires jiffies are not advanced so it keeps trying for ever. Solve this by moving the tick alignement into the NOHZ / HIGHRES enablement code because at that point it is guaranteed that the underlying clocksource is high resolution capable and not longer depending on the tick. This is far before user space starts, so at the point where applications try to align their timers, the old behaviour of the tick happening at a multiple of nanoseconds per tick starting from clock MONOTONIC = 0 is restored. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSTUNQTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoQQjD/4+zHK7IxRK+1k1lIStxksEudVeNuqK 7RZKNTawZFSChrxGmv5uhmqYDK9/tudnZot+uyAXI81JHnnGSXbco+VrAA2DQ2v2 w/oxhAHIAobJWDzUuCqT2YsseDVNoTGMrTTPrL4klXqhVtShVOLTI/603+macEiw Olqmd7XxlyhCloNqoMIh18EwSNIs78kP3YAQHUmq3NoRkJT5c3wWKvZy38WOxSJ9 Jjpw75tePfcwWte9fGVq+oJ8hOWkPAKN5hFL8420RrEgdIkxKlhVmuyG48ieZKH1 LE1brB5iaU8+M91PevZMRlD7oio8u/d6xX1vT3Cqz9UBe1NLj1I0ToX6COhwSNxm 50U0HJ5wGlpodu4IyBHXTG45UAVwpOg/EgDlFigeP3Nxl6h45ugrA010r6FqE1zV KltjNGsk/OD5d8ywW+yManU2mySm9znqmWN2zbjRGm138Tuz7vglcBhFB6jEJFUF V8vYPLwI4J2gqifx8NoFBIfLuZp/+D4LOQiJ7XRc9CVm3JLK8Xi8fxnHGTBwJHIn 45WU4hzmY7fjR/gGhAyG7lH6tlo7VTC9/lrO+YtAy2hL2sK14+m/ZH7836I5bAhw EgSyVlcfR6tgO55I3JgkbNIyO5TUAZPZIAY1TlyR53xXvMkb9aZTh9UMm1xMJ3uX +sQaLAuDFNUQSg== =/uvx -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single regression fix for a regression fix: For a long time the tick was aligned to clock MONOTONIC so that the tick event happened at a multiple of nanoseconds per tick starting from clock MONOTONIC = 0. At some point this changed as the refined jiffies clocksource which is used during boot before the TSC or other clocksources becomes usable, was adjusted with a boot offset, so that time 0 is closer to the point where the kernel starts. This broke the assumption in the tick code that when the tick setup happens early on ktime_get() will return a multiple of nanoseconds per tick. As a consequence applications which aligned their periodic execution so that it does not collide with the tick were not longer guaranteed that the tick period starts from time 0. The fix for this regression was to realign the tick when it is initially set up to a multiple of tick periods. That works as long as the underlying tick device supports periodic mode, but breaks under certain conditions when the tick device supports only one shot mode. Depending on the offset, the alignment delta to clock MONOTONIC can get in a range where the minimal programming delta of the underlying clock event device is larger than the calculated delta to the next tick. This results in a boot hang as the tick code tries to play catch up, but as the tick never fires jiffies are not advanced so it keeps trying for ever. Solve this by moving the tick alignement into the NOHZ / HIGHRES enablement code because at that point it is guaranteed that the underlying clocksource is high resolution capable and not longer depending on the tick. This is far before user space starts, so at the point where applications try to align their timers, the old behaviour of the tick happening at a multiple of nanoseconds per tick starting from clock MONOTONIC = 0 is restored" * tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/common: Align tick period during sched_timer setup |
||
Marc Zyngier
|
a82f3119d5 |
Merge branch irq/misc-6.5 into irq/irqchip-next
* irq/misc-6.5: : . : Misc cleanups: : : - Add a number of missing prototypes : - Mark global symbol as static where needed : - Drop some now useless non-DT code paths : - Add a missing interrupt mapping to the STM32 irqchip : - Silence another STM32 warning when building with W=1 : - Fix the jcore-aic driver that actually never worked... : . Revert "irqchip/mxs: Include linux/irqchip/mxs.h" irqchip/jcore-aic: Fix missing allocation of IRQ descriptors irqchip/stm32-exti: Fix warning on initialized field overwritten irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype irqchip/mxs: Include linux/irqchip/mxs.h irqchip/clps711x: Remove unused clps711x_intc_init() function irqchip/mmp: Remove non-DT codepath irqchip/ftintc010: Mark all function static irqdomain: Include internals.h for function prototypes Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
Jiri Olsa
|
db8eae6bc5 |
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
We currently allow to create perf link for program with
expected_attach_type == BPF_TRACE_KPROBE_MULTI.
This will cause crash when we call helpers like get_attach_cookie or
get_func_ip in such program, because it will call the kprobe_multi's
version (current->bpf_ctx context setup) of those helpers while it
expects perf_link's current->bpf_ctx context setup.
Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type
only for programs attaching through kprobe_multi link.
Fixes:
|
||
Florent Revest
|
9724160b39 |
bpf/btf: Accept function names that contain dots
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM
leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn,
pahole creates BTF_KIND_FUNC entries for these and this makes the BTF
metadata validation fail because they contain a dot.
In a dramatic turn of event, this BTF verification failure can cause
the netfilter_bpf initialization to fail, causing netfilter_core to
free the netfilter_helper hashmap and netfilter_ftp to trigger a
use-after-free. The risk of u-a-f in netfilter will be addressed
separately but the existence of "asan.module_ctor" debug info under some
build conditions sounds like a good enough reason to accept functions
that contain dots in BTF.
Although using only LLVM=1 is the recommended way to compile clang-based
kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still
try to support that combination according to Nick. To clarify:
- > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended,
but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue
- <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in
which case GNU as will be used
Fixes:
|
||
Linus Torvalds
|
2e30b97343 |
Tracing fixes for 6.4:
- Fix MAINTAINERS file to point to proper mailing list for rtla and rv The mailing list pointed to linux-trace-devel instead of linux-trace-kernel. The former is for the tracing libraries and the latter is for anything in the Linux kernel tree. The wrong mailing list was used because linux-trace-kernel did not exist when rtla and rv were created. - User events: . Fix matching of dynamic events to their user events When user writes to dynamic_events file, a lookup of the registered dynamic events are made, but there were some cases that a match could be incorrectly made. . Add auto cleanup of user events Have the user events automatically get removed when the last reference (file descriptor) is closed. This was asked for to prevent leaks of user events hanging around needing admins to clean them up. . Add persistent logic (but not let user space use it yet) In some cases, having a persistent user event (one that does not get cleaned up automatically) is useful. But there's still debates about how to expose this to user space. The infrastructure is added, but the API is not. . Update the selftests Update the user event selftests to reflect the above changes. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZJGrABQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qgRDAQDvF8ktlcn+gqyUxt2OcTlbBh0jqS0b FKXYdq6FTgfWYQD/ctunFbPdzn4D6Kc/lG8p4QxpMmtA19BUOPwEt3CkAwM= =CDzr -----END PGP SIGNATURE----- Merge tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix MAINTAINERS file to point to proper mailing list for rtla and rv The mailing list pointed to linux-trace-devel instead of linux-trace-kernel. The former is for the tracing libraries and the latter is for anything in the Linux kernel tree. The wrong mailing list was used because linux-trace-kernel did not exist when rtla and rv were created. - User events: - Fix matching of dynamic events to their user events When user writes to dynamic_events file, a lookup of the registered dynamic events is made, but there were some cases that a match could be incorrectly made. - Add auto cleanup of user events Have the user events automatically get removed when the last reference (file descriptor) is closed. This was asked for to prevent leaks of user events hanging around needing admins to clean them up. - Add persistent logic (but not let user space use it yet) In some cases, having a persistent user event (one that does not get cleaned up automatically) is useful. But there's still debates about how to expose this to user space. The infrastructure is added, but the API is not. - Update the selftests Update the user event selftests to reflect the above changes" * tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/user_events: Document auto-cleanup and remove dyn_event refs selftests/user_events: Adapt dyn_test to non-persist events selftests/user_events: Ensure auto cleanup works as expected tracing/user_events: Add auto cleanup and future persist flag tracing/user_events: Track refcount consistently via put/get tracing/user_events: Store register flags on events tracing/user_events: Remove user_ns walk for groups selftests/user_events: Add perf self-test for empty arguments events selftests/user_events: Clear the events after perf self-test selftests/user_events: Add ftrace self-test for empty arguments events tracing/user_events: Fix the incorrect trace record for empty arguments events tracing: Modify print_fields() for fields output order tracing/user_events: Handle matching arguments that is null from dyn_events tracing/user_events: Prevent same name but different args event tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list |
||
Wen Yang
|
a7e282c777 |
tick/rcu: Fix bogus ratelimit condition
The ratelimit logic in report_idle_softirq() is broken because the
exit condition is always true:
static int ratelimit;
if (ratelimit < 10)
return false; ---> always returns here
ratelimit++; ---> no chance to run
Make it check for >= 10 instead.
Fixes:
|
||
Li zeming
|
fc4b4d96f7 |
alarmtimer: Remove unnecessary (void *) cast
Pointers of type void * do not require a type cast when they are assigned to a real pointer. Signed-off-by: Li zeming <zeming@nfschina.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230609182059.4509-1-zeming@nfschina.com |
||
Li zeming
|
986af8dc5a |
alarmtimer: Remove unnecessary initialization of variable 'ret'
ret is assigned before checked, so it does not need to initialize the variable Signed-off-by: Li zeming <zeming@nfschina.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230609182856.4660-1-zeming@nfschina.com |
||
Lukas Bulwahn
|
b9a40f24d8 |
posix-timers: Refer properly to CONFIG_HIGH_RES_TIMERS
Commit c78f261e5dcb ("posix-timers: Clarify posix_timer_fn() comments") turns an ifdef CONFIG_HIGH_RES_TIMERS into an conditional on "IS_ENABLED(CONFIG_HIGHRES_TIMERS)"; note that the new conditional refers to "HIGHRES_TIMERS" not "HIGH_RES_TIMERS" as before. Fix this typo introduced in that refactoring. Fixes: c78f261e5dcb ("posix-timers: Clarify posix_timer_fn() comments") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230609094643.26253-1-lukas.bulwahn@gmail.com |
||
Thomas Gleixner
|
b96ce4931f |
posix-timers: Polish coding style in a few places
Make it consistent with the TIP tree documentation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.888493625@linutronix.de |
||
Thomas Gleixner
|
200dbd6d14 |
posix-timers: Remove pointless comments
Documenting the obvious is just consuming space for no value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.832240451@linutronix.de |
||
Thomas Gleixner
|
84999b8bdb |
posix-timers: Clarify posix_timer_fn() comments
Make the issues vs. SIG_IGN understandable and remove the 15 years old promise that a proper solution is already on the horizon. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/874jnrdmrq.ffs@tglx |
||
Thomas Gleixner
|
02972d7955 |
posix-timers: Clarify posix_timer_rearm() comment
Yet another incomprehensible piece of art. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.724863461@linutronix.de |
||
Thomas Gleixner
|
c575689d66 |
posix-timers: Comment SIGEV_THREAD_ID properly
Replace the word salad. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.672220780@linutronix.de |
||
Thomas Gleixner
|
52f090b164 |
posix-timers: Add proper comments in do_timer_create()
The comment about timer lifetime at the end of the function is misplaced and uncomprehensible. Make it understandable and put it at the right place. Add a new comment about the visibility of the new timer ID to user space. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.619897296@linutronix.de |
||
Thomas Gleixner
|
640fe745d7 |
posix-timers: Document nanosleep() details
The descriptions for common_nsleep() is wrong and common_nsleep_timens() lacks any form of comment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.567072835@linutronix.de |
||
Thomas Gleixner
|
3561fcb402 |
posix-timers: Document sys_clock_settime() permissions in place
The documentation of sys_clock_settime() permissions is at a random place and mostly word salad. Remove it and add a concise comment into sys_clock_settime(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.514700292@linutronix.de |
||
Thomas Gleixner
|
65cade468d |
posix-timers: Document sys_clock_getoverrun()
Document the syscall in detail and with coherent sentences. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.462051641@linutronix.de |
||
Thomas Gleixner
|
a86e928433 |
posix-timers: Document common_clock_get() correctly
Replace another confusing and inaccurate set of comments. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.409169321@linutronix.de |
||
Thomas Gleixner
|
01679b5db7 |
posix-timers: Document sys_clock_getres() correctly
The decades old comment about Posix clock resolution is confusing at best. Remove it and add a proper explanation to sys_clock_getres(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.356427330@linutronix.de |
||
Thomas Gleixner
|
8cc96ca2c7 |
posix-timers: Split release_posix_timers()
release_posix_timers() is called for cleaning up both hashed and unhashed timers. The cases are differentiated by an argument and the usage is hideous. Seperate the actual free path out and use it for unhashed timers. Provide a function for hashed timers. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.301432503@linutronix.de |
||
Thomas Gleixner
|
11fbe6cd41 |
posix-timers: Remove pointless irqsafe from hash_lock
All usage of hash_lock is in thread context. No point in using spin_lock_irqsave()/irqrestore() for a single usage site. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.249063953@linutronix.de |
||
Thomas Gleixner
|
72786ff23d |
posix-timers: Set k_itimer:: It_signal to NULL on exit()
Technically it's not required to set k_itimer::it_signal to NULL on exit() because there is no other thread anymore which could lookup the timer concurrently. Set it to NULL for consistency sake and add a comment to that effect. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.196462644@linutronix.de |
||
Thomas Gleixner
|
028cf5eaa1 |
posix-timers: Annotate concurrent access to k_itimer:: It_signal
k_itimer::it_signal is read lockless in the RCU protected hash lookup, but it can be written concurrently in the timer_create() and timer_delete() path. Annotate these places with READ_ONCE() and WRITE_ONCE() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.143596887@linutronix.de |
||
Thomas Gleixner
|
ae88967d71 |
posix-timers: Add comments about timer lookup
Document how the timer ID validation in the hash table works. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.091081515@linutronix.de |
||
Thomas Gleixner
|
8d44b958a1 |
posix-timers: Cleanup comments about timer ID tracking
Describe the hash table properly and remove the IDR leftover comments. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183313.038444551@linutronix.de |
||
Thomas Gleixner
|
7d99090266 |
posix-timers: Clarify timer_wait_running() comment
Explain it better and add the CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y aspect for completeness. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230425183312.985681995@linutronix.de |
||
Thomas Gleixner
|
8ce8849dd1 |
posix-timers: Ensure timer ID search-loop limit is valid
posix_timer_add() tries to allocate a posix timer ID by starting from the cached ID which was stored by the last successful allocation. This is done in a loop searching the ID space for a free slot one by one. The loop has to terminate when the search wrapped around to the starting point. But that's racy vs. establishing the starting point. That is read out lockless, which leads to the following problem: CPU0 CPU1 posix_timer_add() start = sig->posix_timer_id; lock(hash_lock); ... posix_timer_add() if (++sig->posix_timer_id < 0) start = sig->posix_timer_id; sig->posix_timer_id = 0; So CPU1 can observe a negative start value, i.e. -1, and the loop break never happens because the condition can never be true: if (sig->posix_timer_id == start) break; While this is unlikely to ever turn into an endless loop as the ID space is huge (INT_MAX), the racy read of the start value caught the attention of KCSAN and Dmitry unearthed that incorrectness. Rewrite it so that all id operations are under the hash lock. Reported-by: syzbot+5c54bd3eb218bb595aa9@syzkaller.appspotmail.com Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/87bkhzdn6g.ffs@tglx |
||
Thomas Gleixner
|
9d9e522010 |
posix-timers: Prevent RT livelock in itimer_delete()
itimer_delete() has a retry loop when the timer is concurrently expired. On
non-RT kernels this just spin-waits until the timer callback has completed,
except for posix CPU timers which have HAVE_POSIX_CPU_TIMERS_TASK_WORK
enabled.
In that case and on RT kernels the existing task could live lock when
preempting the task which does the timer delivery.
Replace spin_unlock() with an invocation of timer_wait_running() to handle
it the same way as the other retry loops in the posix timer code.
Fixes:
|
||
Arnd Bergmann
|
8091f56ee4 |
irqdomain: Include internals.h for function prototypes
irq_domain_debugfs_init() is defined in irqdomain.c, but the declaration is in a header that is not included here: kernel/irq/irqdomain.c:1965:13: error: no previous prototype for 'irq_domain_debugfs_init' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230516200432.554240-1-arnd@kernel.org |
||
Hao Jia
|
ebb83d84e4 |
sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()
After commit |
||
Hao Jia
|
96500560f0 |
sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
There is a double update_rq_clock() invocation: __balance_push_cpu_stop() update_rq_clock() __migrate_task() update_rq_clock() Sadly select_fallback_rq() also needs update_rq_clock() for __do_set_cpus_allowed(), it is not possible to remove the update from __balance_push_cpu_stop(). So remove it from __migrate_task() and ensure all callers of this function call update_rq_clock() prior to calling it. Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20230613082012.49615-3-jiahao.os@bytedance.com |
||
Hao Jia
|
cab3ecaed5 |
sched/core: Fixed missing rq clock update before calling set_rq_offline()
When using a cpufreq governor that uses cpufreq_add_update_util_hook(), it is possible to trigger a missing update_rq_clock() warning for the CPU hotplug path: rq_attach_root() set_rq_offline() rq_offline_rt() __disable_runtime() sched_rt_rq_enqueue() enqueue_top_rt_rq() cpufreq_update_util() data->func(data, rq_clock(rq), flags) Move update_rq_clock() from sched_cpu_deactivate() (one of it's callers) into set_rq_offline() such that it covers all set_rq_offline() usage. Additionally change rq_attach_root() to use rq_lock_irqsave() so that it will properly manage the runqueue clock flags. Suggested-by: Ben Segall <bsegall@google.com> Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20230613082012.49615-2-jiahao.os@bytedance.com |