For systems on which the performance counter can expire early due to turbo
modes the watchdog handler has a safety net in place which validates that
since the last watchdog event there has at least 4/5th of the watchdog
period elapsed.
This works reliably only after the first watchdog event because the per
CPU variable which holds the timestamp of the last event is never
initialized.
So a first spurious event will validate against a timestamp of 0 which
results in a delta which is likely to be way over the 4/5 threshold of the
period. As this might happen before the first watchdog hrtimer event
increments the watchdog counter, this can lead to false positives.
Fix this by initializing the timestamp before enabling the hardware event.
Reset the rearm counter as well, as that might be non zero after the
watchdog was disabled and reenabled.
Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx
Fixes: 7edaeb6841 ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replace the deprecated[1] use of strncpy() in bacct_add_tsk(). Since this
is UAPI, include trailing padding in the copy.
Link: https://github.com/KSPP/linux/issues/90 [1]
Link: https://lkml.kernel.org/r/20240711171308.work.995-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
Cc: "Dr. Thomas Orgis" <thomas.orgis@uni-hamburg.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ismael Luceno <ismael@iodev.co.uk>
Cc: Peng Liu <liupeng256@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix the 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/resource_kunit.o
Link: https://lkml.kernel.org/r/20240529-md-kernel-resource_kunit-v1-1-bb719784b714@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If hung_task_panic is enabled, don't consider the value of
hung_task_warnings and display the information of the hung tasks.
In some cases, hung_task_panic might not be initially set up, after
several hung tasks occur, the hung_task_warnings count reaches zero. If
hung_task_panic is set up later, it may not display any helpful hung task
info in dmesg, only showing messages like:
Kernel panic - not syncing: hung_task: blocked tasks
CPU: 3 PID: 58 Comm: khungtaskd Not tainted 6.10.0-rc3 #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
panic+0x2f3/0x320
watchdog+0x2dd/0x510
? __pfx_watchdog+0x10/0x10
kthread+0xe0/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Link: https://lkml.kernel.org/r/20240613033159.3446265-1-leonylgao@gmail.com
Signed-off-by: Yongliang Gao <leonylgao@tencent.com>
Reviewed-by: Huang Cun <cunhuang@tencent.com>
Cc: Joel Granados <j.granados@samsung.com>
Cc: John Siddle <jsiddle@redhat.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Following warning is reported, so remove these duplicated header
including:
./kernel/crash_reserve.c: linux/kexec.h is included more than once.
This is just a clean code, no logic changed.
Link: https://lkml.kernel.org/r/20240606091427.3512314-1-haowenchao22@gmail.com
Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With nearly 20 taint flags and respective characters, it's getting a bit
difficult to remember what each taint flag character means. Add verbose
logging of the set taints in the format:
Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN
in dump_stack_print_info() when there are taints.
Note that the "negative flag" G is not included.
Link: https://lkml.kernel.org/r/7321e306166cb2ca2807ab8639e665baa2462e9c.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Convert print_tainted() to use struct seq_buf internally in order to be
more aware of the buffer constraints as well as make it easier to extend
in follow-up work.
Link: https://lkml.kernel.org/r/cb6006fa7c0f82a6b6885e8eea2920fcdc4fc9d0.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After adding min_heap_sift_up(), the naming convention has been adjusted
to maintain consistency with the min_heap_sift_up(). Consequently,
min_heapify() has been renamed to min_heap_sift_down().
Link: https://lkml.kernel.org/CAP-5=fVcBAxt8Mw72=NCJPRJfjDaJcqk4rjbadgouAEAHz_q1A@mail.gmail.com
Link: https://lkml.kernel.org/r/20240524152958.919343-13-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a third parameter 'args' for the 'less' and 'swp' functions in the
'struct min_heap_callbacks'. This additional parameter allows these
comparison and swap functions to handle extra arguments when necessary.
Link: https://lkml.kernel.org/r/20240524152958.919343-9-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement a type-safe interface for min_heap using strong type pointers
instead of void * in the data field. This change includes adding small
macro wrappers around functions, enabling the use of __minheap_cast and
__minheap_obj_size macros for type casting and obtaining element size.
This implementation removes the necessity of passing element size in
min_heap_callbacks. Additionally, introduce the MIN_HEAP_PREALLOCATED
macro for preallocating some elements.
Link: https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu
Link: https://lkml.kernel.org/r/20240524152958.919343-5-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "treewide: Refactor heap related implementation", v6.
This patch series focuses on several adjustments related to heap
implementation. Firstly, a type-safe interface has been added to the
min_heap, along with the introduction of several new functions to enhance
its functionality. Additionally, the heap implementation for bcache and
bcachefs has been replaced with the generic min_heap implementation from
include/linux. Furthermore, several typos have been corrected.
Previous discussion with Kent Overstreet:
https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu
This patch (of 16):
Replace 'artifically' with 'artificially'.
Replace 'irrespecive' with 'irrespective'.
Replace 'futher' with 'further'.
Replace 'sufficent' with 'sufficient'.
Link: https://lkml.kernel.org/r/20240524152958.919343-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20240524152958.919343-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use this_cpu_try_cmpxchg() instead of this_cpu_cmpxchg (*ptr, old, new) ==
old in try_release_thread_stack_to_cache. x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg (and
related move instruction in front of cmpxchg).
No functional change intended.
[ubizjak@gmail.com: simplify the for loop a bit]
Link: https://lkml.kernel.org/r/20240523214442.21102-1-ubizjak@gmail.com
Link: https://lkml.kernel.org/r/20240523073530.8128-1-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Current release - regressions:
- ipv6: bring NLM_DONE out to a separate recv() again
Current release - new code bugs:
- wifi: cfg80211: wext: set ssids=NULL for passive scans via old wext API
Previous releases - regressions:
- wifi: mac80211: fix monitor channel setting with chanctx emulation
(probably most awaited of the fixes in this PR, tracked by Thorsten)
- usb: ax88179_178a: bring back reset on init, if PHY is disconnected
- bpf: fix UML x86_64 compile failure with BPF
- bpf: avoid splat in pskb_pull_reason(), sanity check added can be hit
with malicious BPF
- eth: mvpp2: use slab_build_skb() for packets in slab, driver was
missed during API refactoring
- wifi: iwlwifi: add missing unlock of mvm mutex
Previous releases - always broken:
- ipv6: add a number of missing null-checks for in6_dev_get(), in case
IPv6 disabling races with the datapath
- bpf: fix reg_set_min_max corruption of fake_reg
- sched: act_ct: add netns as part of the key of tcf_ct_flow_table
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZ0VAAACgkQMUZtbf5S
IrtMnQ//b0YNnC2PduSn6fDnDamyZW3vjqwXQ6K0DsgSzEIiAtEd6LbkPN4vAcpp
k634dHseQjTuAcsTZxisIs32nC2up9q/t/+6XD8VSaQbSzKhB+rFDviUxfGJWjt4
MZRK0mDcmib2tXAEfYnMi+QjvC5S+ZSHLpemDdzTI3AyKcPynqLcM1PcC0CGS5GS
6MpvRAtEgTAkXd2rc4WAbOcmd8NLJN80f/srRDXFVqrXy8f6adaULvCvzSXSiQy8
peUaPhI6BYNBL2Tzjp3D+Nh54ks3Ol8MeqaGYsuJHtgd+/I+/YWzYc74an8BuEwR
C6fszbH7i64WaQUI5ZhX/1Da0CTesNxzsPgeAFP3qEe20r53vN0NiFjRrHpO02El
lew9Hrx27Zzt9k3eSdtC3GGj/S93PYjE5RRuSClQrW8fUqETZ8dFocbrNAraHGMv
rDOqIT3XMg/BIBw9ADxizAgsrFC0QbBShQPs2iMuuVwmrWj9DEC0GKlt3KxyPT36
fl4w3gGRdIDz/ZTXKQZtta3Z4ckaKiTw8jbNXxteBDEHErFYYND+4XDzK/uIqHCe
0IoVWVUnhVfKOuGBIDGIFDsAvbgqTcVd+wZTB4SxZsbXISzpfYLcrM4qXf4YQNNb
MeIQg0Zwjm+xdLGXVCt8wBBGmj4EK9uMa3wjYu3lGREgxyH42eI=
=Lb9b
-----END PGP SIGNATURE-----
Merge tag 'net-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, bpf and netfilter.
Happy summer solstice! The line count is a bit inflated by a selftest
and update to a driver's FW interface header, in reality this is
slightly below average for us. We are expecting one driver fix from
Intel, but there are no big known issues.
Current release - regressions:
- ipv6: bring NLM_DONE out to a separate recv() again
Current release - new code bugs:
- wifi: cfg80211: wext: set ssids=NULL for passive scans via old wext API
Previous releases - regressions:
- wifi: mac80211: fix monitor channel setting with chanctx emulation
(probably most awaited of the fixes in this PR, tracked by Thorsten)
- usb: ax88179_178a: bring back reset on init, if PHY is disconnected
- bpf: fix UML x86_64 compile failure with BPF
- bpf: avoid splat in pskb_pull_reason(), sanity check added can be hit
with malicious BPF
- eth: mvpp2: use slab_build_skb() for packets in slab, driver was
missed during API refactoring
- wifi: iwlwifi: add missing unlock of mvm mutex
Previous releases - always broken:
- ipv6: add a number of missing null-checks for in6_dev_get(), in case
IPv6 disabling races with the datapath
- bpf: fix reg_set_min_max corruption of fake_reg
- sched: act_ct: add netns as part of the key of tcf_ct_flow_table"
* tag 'net-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
net: usb: rtl8150 fix unintiatilzed variables in rtl8150_get_link_ksettings
selftests: virtio_net: add forgotten config options
bnxt_en: Restore PTP tx_avail count in case of skb_pad() error
bnxt_en: Set TSO max segs on devices with limits
bnxt_en: Update firmware interface to 1.10.3.44
net: stmmac: Assign configured channel value to EXTTS event
net: do not leave a dangling sk pointer, when socket creation fails
net/tcp_ao: Don't leak ao_info on error-path
ice: Fix VSI list rule with ICE_SW_LKUP_LAST type
ipv6: bring NLM_DONE out to a separate recv() again
selftests: add selftest for the SRv6 End.DX6 behavior with netfilter
selftests: add selftest for the SRv6 End.DX4 behavior with netfilter
netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core
seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 behaviors
netfilter: ipset: Fix suspicious rcu_dereference_protected()
selftests: openvswitch: Set value to nla flags.
octeontx2-pf: Fix linking objects into multiple modules
octeontx2-pf: Add error handling to VLAN unoffload handling
virtio_net: fixing XDP for fully checksummed packets handling
virtio_net: checksum offloading handling fix
...
- Restrict gen-API tests for synthetic and kprobe events to only be built as
modules, as they generate dynamic events that cannot be removed, causing
ftracetest and startup selftests to fail.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmZy6HobHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bqtYIAMLap5hV/w9Gh5b32hOF
/FS/oqGTIs8wfvZq2PBOruFmmvhrqjvpbZVTU9aNUr2lywYALM+jgO3ElSLIoZdz
5s8Wsnic5a2DvG23r/S5u80f85Gxy14e5fvCcCT/3Bvw1ip65XdMXqUwh9oM4zHh
i8rmeIIJmVspHD9bxTREsosB8/LKvSx6GNzLrHwHyL5UepDgj/r5hLvyEuY3fyuo
hazbvsZbHi+aduAS3it+BnhMoFLgLzqrYi8dl1fPY+xmnGI2LZZkds1mfD1JmjBB
AVm9gOWKpW+HHoxeMEMcAs8mhithR7VGA2V2zdsOmRzndytKhUghHWvgcrBZWvl6
D5Y=
=BNpD
-----END PGP SIGNATURE-----
Merge tag 'probes-fixes-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
- Restrict gen-API tests for synthetic and kprobe events to only be
built as modules, as they generate dynamic events that cannot be
removed, causing ftracetest and startup selftests to fail
* tag 'probes-fixes-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Build event generation tests only as modules
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmZwh44UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNRBA//Q09J/SADHi63fjpStx+Gvo5h6TbM
L4gqYsjxpi7CfXFwlBtFRjk9Q0osRDxbDWTuZ8gMcJONlRdHpZFil2gYSEacImsn
tAkrQpV32U1oNua+kgoIkQTHwNIKjA9odYZ4pyJ0AZvnB5Z62B841r8GAaTADg++
fGOuCBYZeuioCAjPUN2KZtkCKdhiu823Gwe2z9U6SJyCdPqRFjpBuumDoNvCTrCB
UJuc5DqWSNk2rZXZQG6RSLeOOZZwRf9s2ATU96T/9Lp0m6qqxPPisHkWscjhx5Ve
W7z2IWGFrNzJ8ABKwBK/NUMQbs3WzsepyPqZdoo//PkhMjQlfb+5iPitJWM6qmdM
6jgj2HkDzX2OtR9u6VOcOKKwz4NQnf4JcHRUDjq8vQ3eKYOTcDLx4VR8O/Ullmhf
pZL4klNXpBrw7DLYurTlpbm9jUmMCev9DvuSYJmyRjq7jA+8Cph6+clGriIbljqn
9hCqSnbufDxySwB0unYu9zwnC4bN+Yzcgr4qYFoA+zdj5eYloaJvPhwOh6MPsQaO
DJlCt6Wfw4SqD3afxaJnzw4/SBRuPA8ISoxTXVJUg7Q+NfUI8HBDO4YihiqJ7cm0
yvD0mFvweJVEpX2slDyob58xYgkmL8TaIPErJ9A/EO30W0nm+nQzXDR+cOa9VqAc
txcTscOv5YMLLMk=
=nYky
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20240617' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fix from Paul Moore:
"A single LSM/IMA patch to fix a problem caused by sleeping while in a
RCU critical section"
* tag 'lsm-pr-20240617' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
ima: Avoid blocking in RCU read-side critical section
Mainly MM singleton fixes. And a couple of ocfs2 regression fixes.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZnCEQAAKCRDdBJ7gKXxA
jmgSAQDk3BYs1n67cnwx/Zi04yMYDyfYTCYg2udPfT2a+GpmbwD+N5dJd/vCztXH
5eLpP11xd/yr2+I9FefyZeUuA80KtgQ=
=2agY
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"
* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
kcov: don't lose track of remote references during softirqs
mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
mm: fix possible OOB in numa_rebuild_large_mapping()
mm/migrate: fix kernel BUG at mm/compaction.c:2761!
selftests: mm: make map_fixed_noreplace test names stable
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
gcov: add support for GCC 14
zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
MAINTAINERS: remove Lorenzo as vmalloc reviewer
Revert "mm: init_mlocked_on_free_v3"
mm/page_table_check: fix crash on ZONE_DEVICE
gcc: disable '-Warray-bounds' for gcc-9
ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV
metadata of the current task into a per-CPU variable. However, the
kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV
coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote
KCOV objects.
If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens
to get interrupted and kcov_remote_start() is called, it ultimately leads
to kcov_remote_stop() NOT restoring the original KCOV reference. So when
the task exits, all registered remote KCOV handles remain active forever.
The most uncomfortable effect (at least for syzkaller) is that the bug
prevents the reuse of the same /sys/kernel/debug/kcov descriptor. If
we obtain it in the parent process and then e.g. drop some
capabilities and continuously fork to execute individual programs, at
some point current->kcov of the forked process is lost,
kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctls
calls from subsequent forks fail.
And, yes, the efficiency is also affected if we keep on losing remote
kcov objects.
a) kcov_remote_map keeps on growing forever.
b) (If I'm not mistaken), we're also not freeing the memory referenced
by kcov->area.
Fix it by introducing a special kcov_mode that is assigned to the task
that owns a KCOV remote object. It makes kcov_mode_enabled() return true
and yet does not trigger coverage collection in __sanitizer_cov_trace_pc()
and write_comp_data().
[nogikh@google.com: replace WRITE_ONCE() with an ordinary assignment]
Link: https://lkml.kernel.org/r/20240614171221.2837584-1-nogikh@google.com
Link: https://lkml.kernel.org/r/20240611133229.527822-1-nogikh@google.com
Fixes: 5ff3b30ab5 ("kcov: collect coverage from interrupts")
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Using gcov on kernels compiled with GCC 14 results in truncated 16-byte
long .gcda files with no usable data. To fix this, update GCOV_COUNTERS
to match the value defined by GCC 14.
Tested with GCC versions 14.1.0 and 13.2.0.
Link: https://lkml.kernel.org/r/20240610092743.1609845-1-oberpar@linux.ibm.com
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reported-by: Allison Henderson <allison.henderson@oracle.com>
Reported-by: Chuck Lever III <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kernel_wait4() doesn't sleep and returns -EINTR if there is no
eligible child and signal_pending() is true.
That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
return false and avoid a busy-wait loop.
Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com
Fixes: 12db8b6900 ("entry: Add support for TIF_NOTIFY_SIGNAL")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Rachel Menge <rachelmenge@linux.microsoft.com>
Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Tested-by: Wei Fu <fuweid89@gmail.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZmykPwAKCRDbK58LschI
g7LOAQDVPkJ9k50/xrWIBtgvkGq1jCrMlpwEh49QYO0xoqh1IgEA+6Xje9jCIsdp
AHz9WmZ6G0EpTuDgFq50K1NVZ7MgSQE=
=zKfv
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-06-14
We've added 8 non-merge commits during the last 2 day(s) which contain
a total of 9 files changed, 92 insertions(+), 11 deletions(-).
The main changes are:
1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason()
triggered via __bpf_try_make_writable(), from Florian Westphal.
2) Fix removal of kfuncs during linking phase which then throws a kernel
build warning via resolve_btfids about unresolved symbols,
from Tony Ambardar.
3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol
is not available on User Mode Linux, from Maciej Żenczykowski.
4) Fix a register corruption in reg_set_min_max triggering an invariant
violation in BPF verifier, from Daniel Borkmann.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Harden __bpf_kfunc tag against linker kfunc removal
compiler_types.h: Define __retain for __attribute__((__retain__))
bpf: Avoid splat in pskb_pull_reason
bpf: fix UML x86_64 compile failure
selftests/bpf: Add test coverage for reg_set_min_max handling
bpf: Reduce stack consumption in check_stack_write_fixed_off
bpf: Fix reg_set_min_max corruption of fake_reg
MAINTAINERS: mailmap: Update Stanislav's email address
====================
Link: https://lore.kernel.org/r/20240614203223.26500-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pcpu_hot (defined in arch/x86) is not available on user mode linux (ARCH=um)
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Fixes: 1ae6921009 ("bpf: inline bpf_get_smp_processor_id() helper")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20240613173146.2524647-1-maze@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The fake_reg moved into env->fake_reg given it consumes a lot of stack
space (120 bytes). Migrate the fake_reg in check_stack_write_fixed_off()
as well now that we have it.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240613115310.25383-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Juan reported that after doing some changes to buzzer [0] and implementing
a new fuzzing strategy guided by coverage, they noticed the following in
one of the probes:
[...]
13: (79) r6 = *(u64 *)(r0 +0) ; R0=map_value(ks=4,vs=8) R6_w=scalar()
14: (b7) r0 = 0 ; R0_w=0
15: (b4) w0 = -1 ; R0_w=0xffffffff
16: (74) w0 >>= 1 ; R0_w=0x7fffffff
17: (5c) w6 &= w0 ; R0_w=0x7fffffff R6_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0; 0x7fffffff))
18: (44) w6 |= 2 ; R6_w=scalar(smin=umin=smin32=umin32=2,smax=umax=umax32=0x7fffffff,var_off=(0x2; 0x7ffffffd))
19: (56) if w6 != 0x7ffffffd goto pc+1
REG INVARIANTS VIOLATION (true_reg2): range bounds violation u64=[0x7fffffff, 0x7ffffffd] s64=[0x7fffffff, 0x7ffffffd] u32=[0x7fffffff, 0x7ffffffd] s32=[0x7fffffff, 0x7ffffffd] var_off=(0x7fffffff, 0x0)
REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x7fffffff, 0x7ffffffd] s64=[0x7fffffff, 0x7ffffffd] u32=[0x7fffffff, 0x7ffffffd] s32=[0x7fffffff, 0x7ffffffd] var_off=(0x7fffffff, 0x0)
REG INVARIANTS VIOLATION (false_reg2): const tnum out of sync with range bounds u64=[0x0, 0xffffffffffffffff] s64=[0x8000000000000000, 0x7fffffffffffffff] u32=[0x0, 0xffffffff] s32=[0x80000000, 0x7fffffff] var_off=(0x7fffffff, 0x0)
19: R6_w=0x7fffffff
20: (95) exit
from 19 to 21: R0=0x7fffffff R6=scalar(smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x7ffffffe,var_off=(0x2; 0x7ffffffd)) R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
21: R0=0x7fffffff R6=scalar(smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x7ffffffe,var_off=(0x2; 0x7ffffffd)) R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
21: (14) w6 -= 2147483632 ; R6_w=scalar(smin=umin=umin32=2,smax=umax=0xffffffff,smin32=0x80000012,smax32=14,var_off=(0x2; 0xfffffffd))
22: (76) if w6 s>= 0xe goto pc+1 ; R6_w=scalar(smin=umin=umin32=2,smax=umax=0xffffffff,smin32=0x80000012,smax32=13,var_off=(0x2; 0xfffffffd))
23: (95) exit
from 22 to 24: R0=0x7fffffff R6_w=14 R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
24: R0=0x7fffffff R6_w=14 R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
24: (14) w6 -= 14 ; R6_w=0
[...]
What can be seen here is a register invariant violation on line 19. After
the binary-or in line 18, the verifier knows that bit 2 is set but knows
nothing about the rest of the content which was loaded from a map value,
meaning, range is [2,0x7fffffff] with var_off=(0x2; 0x7ffffffd). When in
line 19 the verifier analyzes the branch, it splits the register states
in reg_set_min_max() into the registers of the true branch (true_reg1,
true_reg2) and the registers of the false branch (false_reg1, false_reg2).
Since the test is w6 != 0x7ffffffd, the src_reg is a known constant.
Internally, the verifier creates a "fake" register initialized as scalar
to the value of 0x7ffffffd, and then passes it onto reg_set_min_max(). Now,
for line 19, it is mathematically impossible to take the false branch of
this program, yet the verifier analyzes it. It is impossible because the
second bit of r6 will be set due to the prior or operation and the
constant in the condition has that bit unset (hex(fd) == binary(1111 1101).
When the verifier first analyzes the false / fall-through branch, it will
compute an intersection between the var_off of r6 and of the constant. This
is because the verifier creates a "fake" register initialized to the value
of the constant. The intersection result later refines both registers in
regs_refine_cond_op():
[...]
t = tnum_intersect(tnum_subreg(reg1->var_off), tnum_subreg(reg2->var_off));
reg1->var_off = tnum_with_subreg(reg1->var_off, t);
reg2->var_off = tnum_with_subreg(reg2->var_off, t);
[...]
Since the verifier is analyzing the false branch of the conditional jump,
reg1 is equal to false_reg1 and reg2 is equal to false_reg2, i.e. the reg2
is the "fake" register that was meant to hold a constant value. The resulting
var_off of the intersection says that both registers now hold a known value
of var_off=(0x7fffffff, 0x0) or in other words: this operation manages to
make the verifier think that the "constant" value that was passed in the
jump operation now holds a different value.
Normally this would not be an issue since it should not influence the true
branch, however, false_reg2 and true_reg2 are pointers to the same "fake"
register. Meaning, the false branch can influence the results of the true
branch. In line 24, the verifier assumes R6_w=0, but the actual runtime
value in this case is 1. The fix is simply not passing in the same "fake"
register location as inputs to reg_set_min_max(), but instead making a
copy. Moving the fake_reg into the env also reduces stack consumption by
120 bytes. With this, the verifier successfully rejects invalid accesses
from the test program.
[0] https://github.com/google/buzzer
Fixes: 67420501e8 ("bpf: generalize reg_set_min_max() to handle non-const register comparisons")
Reported-by: Juan José López Jaimez <jjlopezjaimez@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20240613115310.25383-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
After the recent commit 5097cbcb38 ("sched/isolation: Prevent boot crash
when the boot CPU is nohz_full") the kernel no longer crashes, but there is
another problem.
In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
in smp_call_function_single().
Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
comment explains why this is safe (thanks Thomas!).
Fixes: 08ae95f4fd ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240528122019.GA28794@redhat.com
Link: https://lore.kernel.org/all/20240522151742.GA10400@redhat.com
that can result in missed wakeups and hung tasks.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZkCmARHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hrsBAAsOlZWV+VN0l6TGUqqVZZuuZO03Nk3TX3
kk4iq/xj9pRNmq1bizfsyCFEH3lK7fSYK2I0/iK145vHASzqyeXT1EbtGv61g7EJ
Z42oYdmAxhXcBOZRffZYI2t8zOcVBUV5s/aDmim11fftYdXkaX73pAHVJdhg/e0v
tfgFeJwihG4ynCPa8HrRzBUEXwtKngB8II8mmHyL7E5SsoRsO0ubZBU2MW6P5NA9
9f27ab9Uxj8YWAKa/1Yn6VKgAird8ao+kUu7hYDjZyZTEOvrXU3HrM+VXptnn1S8
2UPSRPNlV2MGDn7Uo3AskIFkF4QlZ3DZFGKCn2n/1Qf/OqeD3d39DoCKzg7lTx+9
gSd7cAtKkrsA1bwfz6uZVPEY9M4eVNqZ5aIo7K28q6zSvSFYy2LI0LCzFVwRgAyK
q8+XyR0aH9R9BYT1Tj0i6z9uZY/h8KROO/M/bU0egG/JRe4WX7X7mjVEQAapHi5F
xLvPH0I9wo3jDVuxr7hPaT30Xv67F8V9WSKQ6HlMDN/etXNNLysCyOxDmXSXJ6Z7
9IWaLrHKJG36vj+rKOpPn6Tm+fGAxlm2lDfXExZkYnyzAI59mmf6XEHQBdPly/Q6
mXZVi3APTOn0REUo3tqi6UiaLL56HD0yvzwSmqSAFke3Nv+za+SeerwgjS2JoYWr
hJPmkc9fLBY=
=uVD+
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
"Fix race between perf_event_free_task() and perf_event_release_kernel()
that can result in missed wakeups and hung tasks"
* tag 'perf-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix missing wakeup when waiting for context reference
and drivers.
Current release - regressions:
- vxlan: fix regression when dropping packets due to invalid src addresses
- bpf: fix a potential use-after-free in bpf_link_free()
- xdp: revert support for redirect to any xsk socket bound to the same
UMEM as it can result in a corruption
- virtio_net:
- add missing lock protection when reading return code from control_buf
- fix false-positive lockdep splat in DIM
- Revert "wifi: wilc1000: convert list management to RCU"
- wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config
Previous releases - regressions:
- rtnetlink: make the "split" NLM_DONE handling generic, restore the old
behavior for two cases where we started coalescing those messages with
normal messages, breaking sloppily-coded userspace
- wifi:
- cfg80211: validate HE operation element parsing
- cfg80211: fix 6 GHz scan request building
- mt76: mt7615: add missing chanctx ops
- ath11k: move power type check to ASSOC stage, fix connecting
to 6 GHz AP
- ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
- rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
- iwlwifi: mvm: fix a crash on 7265
Previous releases - always broken:
- ncsi: prevent multi-threaded channel probing, a spec violation
- vmxnet3: disable rx data ring on dma allocation failure
- ethtool: init tsinfo stats if requested, prevent unintentionally
reporting all-zero stats on devices which don't implement any
- dst_cache: fix possible races in less common IPv6 features
- tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED
- ax25: fix two refcounting bugs
- eth: ionic: fix kernel panic in XDP_TX action
Misc:
- tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZh3mUACgkQMUZtbf5S
IrvPwRAApv8X0ZIbPD5PuVEkiYuSkSE6QVou5GaVO7DzF4gj07zPNtCe6B/ZZdBu
RLdlppxjAmVwdCRmUo0plxSydYZcqFpQqV6lRH/rbWmktWIp0pGIOAcOG7ISRPCC
FAYJ4udSt4+wrq0hXTsE1KO1JZ0p7zE2bXxNC8uR8wgM9yonUjqhYdAUZhrl3yCY
zOCD/+kvWFLYtehDcmyNK0ANS3yNveTNkRhXDc1UrpOGMtza60lf5u3bWK+sU5VS
NGPe9cU60WKMQi6QnWFBZKIcp4Vgy2MukOLdNn9e8BRjFLh2dbY86LAmE4HWPA7I
ONZagOfEjeOcRSCMdFHxui/PUDZLBZNhrnqQ6x8uC2yKwwIMr+CgEt5sCmVFwH6n
3HTlWSjL38yuiVuYuhxGchmVnZfC4bLi2qAFF1oxhlDGViBDhAwi36MSCnjDpN8k
Jo0x6crQLS/uvwVXPKWAUcQhy7OE69A3FwwA1PtkxRX5EQPn1if2Z7yq7YfYb9aD
bChvCarlfuVDm+CBItphXg0ajVZc+im7+JK62Zn50A1cTbEK0lnYCOcmqzqiqrXI
Vr3XXt6gVVnvwY374JDO1vmB5ft2IYBn7sWnLcIvR2UlggqEfqMdKSSwm7pOprG9
YJ/LDAXVmG0kLN7rZUYUBLItnpuHAhYDrBOsV5HaFeksWauc1oY=
=mwEJ
-----END PGP SIGNATURE-----
Merge tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from BPF and big collection of fixes for WiFi core and
drivers.
Current release - regressions:
- vxlan: fix regression when dropping packets due to invalid src
addresses
- bpf: fix a potential use-after-free in bpf_link_free()
- xdp: revert support for redirect to any xsk socket bound to the
same UMEM as it can result in a corruption
- virtio_net:
- add missing lock protection when reading return code from
control_buf
- fix false-positive lockdep splat in DIM
- Revert "wifi: wilc1000: convert list management to RCU"
- wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config
Previous releases - regressions:
- rtnetlink: make the "split" NLM_DONE handling generic, restore the
old behavior for two cases where we started coalescing those
messages with normal messages, breaking sloppily-coded userspace
- wifi:
- cfg80211: validate HE operation element parsing
- cfg80211: fix 6 GHz scan request building
- mt76: mt7615: add missing chanctx ops
- ath11k: move power type check to ASSOC stage, fix connecting to
6 GHz AP
- ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
- rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
- iwlwifi: mvm: fix a crash on 7265
Previous releases - always broken:
- ncsi: prevent multi-threaded channel probing, a spec violation
- vmxnet3: disable rx data ring on dma allocation failure
- ethtool: init tsinfo stats if requested, prevent unintentionally
reporting all-zero stats on devices which don't implement any
- dst_cache: fix possible races in less common IPv6 features
- tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED
- ax25: fix two refcounting bugs
- eth: ionic: fix kernel panic in XDP_TX action
Misc:
- tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB"
* tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
selftests: net: lib: set 'i' as local
selftests: net: lib: avoid error removing empty netns name
selftests: net: lib: support errexit with busywait
net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
ipv6: fix possible race in __fib6_drop_pcpu_from()
af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
af_unix: Annotate data-races around sk->sk_sndbuf.
af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
af_unix: Annotate data-race of sk->sk_state in unix_accept().
af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
af_unix: Annodate data-races around sk->sk_state for writers.
af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
...
In our production environment, we found many hung tasks which are
blocked for more than 18 hours. Their call traces are like this:
[346278.191038] __schedule+0x2d8/0x890
[346278.191046] schedule+0x4e/0xb0
[346278.191049] perf_event_free_task+0x220/0x270
[346278.191056] ? init_wait_var_entry+0x50/0x50
[346278.191060] copy_process+0x663/0x18d0
[346278.191068] kernel_clone+0x9d/0x3d0
[346278.191072] __do_sys_clone+0x5d/0x80
[346278.191076] __x64_sys_clone+0x25/0x30
[346278.191079] do_syscall_64+0x5c/0xc0
[346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
[346278.191086] ? do_syscall_64+0x69/0xc0
[346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
[346278.191092] ? irqentry_exit+0x19/0x30
[346278.191095] ? exc_page_fault+0x89/0x160
[346278.191097] ? asm_exc_page_fault+0x8/0x30
[346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
The task was waiting for the refcount become to 1, but from the vmcore,
we found the refcount has already been 1. It seems that the task didn't
get woken up by perf_event_release_kernel() and got stuck forever. The
below scenario may cause the problem.
Thread A Thread B
... ...
perf_event_free_task perf_event_release_kernel
...
acquire event->child_mutex
...
get_ctx
... release event->child_mutex
acquire ctx->mutex
...
perf_free_event (acquire/release event->child_mutex)
...
release ctx->mutex
wait_var_event
acquire ctx->mutex
acquire event->child_mutex
# move existing events to free_list
release event->child_mutex
release ctx->mutex
put_ctx
... ...
In this case, all events of the ctx have been freed, so we couldn't
find the ctx in free_list and Thread A will miss the wakeup. It's thus
necessary to add a wakeup after dropping the reference.
Fixes: 1cf8dfe8a6 ("perf/core: Fix race between close() and fork()")
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240513103948.33570-1-haifeng.xu@shopee.com
After commit 1a80dbcb2d, bpf_link can be freed by
link->ops->dealloc_deferred, but the code still tests and uses
link->ops->dealloc afterward, which leads to a use-after-free as
reported by syzbot. Actually, one of them should be sufficient, so
just call one of them instead of both. Also add a WARN_ON() in case
of any problematic implementation.
Fixes: 1a80dbcb2d ("bpf: support deferring bpf_link dealloc to after RCU grace period")
Reported-by: syzbot+1989ee16d94720836244@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240602182703.207276-1-xiyou.wangcong@gmail.com
The iterator variable dst cannot be NULL and the if check can be removed.
Remove it and fix the following Coccinelle/coccicheck warning reported
by itnull.cocci:
ERROR: iterator variable bound on line 762 cannot be NULL
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240529101900.103913-2-thorsten.blum@toblux.com
The bpf_session_cookie is unavailable for !CONFIG_FPROBE as reported
by Sebastian [1].
To fix that we remove CONFIG_FPROBE ifdef for session kfuncs, which
is fine, because there's filter for session programs.
Then based on bpf_trace.o dependency:
obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
we add bpf_session_cookie BTF_ID in special_kfunc_set list dependency
on CONFIG_BPF_EVENTS.
[1] https://lore.kernel.org/bpf/20240531071557.MvfIqkn7@linutronix.de/T/#m71c6d5ec71db2967288cb79acedc15cc5dbfeec5
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 5c919acef8 ("bpf: Add support for kprobe session cookie")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240531194500.2967187-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Current release - regressions:
- gro: initialize network_offset in network layer
- tcp: reduce accepted window in NEW_SYN_RECV state
Current release - new code bugs:
- eth: mlx5e: do not use ptp structure for tx ts stats when not initialized
- eth: ice: check for unregistering correct number of devlink params
Previous releases - regressions:
- bpf: Allow delete from sockmap/sockhash only if update is allowed
- sched: taprio: extend minimum interval restriction to entire cycle too
- netfilter: ipset: add list flush to cancel_gc
- ipv4: fix address dump when IPv4 is disabled on an interface
- sock_map: avoid race between sock_map_close and sk_psock_put
- eth: mlx5: use mlx5_ipsec_rx_status_destroy to correctly delete status rules
Previous releases - always broken:
- core: fix __dst_negative_advice() race
- bpf:
- fix multi-uprobe PID filtering logic
- fix pkt_type override upon netkit pass verdict
- netfilter: tproxy: bail out if IP has been disabled on the device
- af_unix: annotate data-race around unix_sk(sk)->addr
- eth: mlx5e: fix UDP GSO for encapsulated packets
- eth: idpf: don't enable NAPI and interrupts prior to allocating Rx buffers
- eth: i40e: fully suspend and resume IO operations in EEH case
- eth: octeontx2-pf: free send queue buffers incase of leaf to inner
- eth: ipvlan: dont Use skb->sk in ipvlan_process_v{4,6}_outbound
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmZYaP0SHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOk5+QP/3wc2ktY/whZvLyJyM6NsVl1DYohnjua
H05bveXgUMd4NNxEfQ31IMGCct6d2fe+fAIJrefxdjxbjyY38SY5xd1zpXLQDxqB
ks6T9vZ4ITgwpqWT5Z1XafIgV/bYlf42+GHUIPuFFlBisoUqkAm7Wzw/T+Ap3rVX
7Y2p7ulvdh85GyMGsAi5Bz9EkyiSQUsMvbtGOA9a9WopIyqoxTgV5Unk1L/FXlEU
ZO8L7hrwZKWL1UDlaqnfESD9DBEbNc85WRoagFM4EdHl8vTwxwvTQ6+SDMtLO8jW
8DSeb9CCin/VagqPhrylj5u72QGz+i7gDUMZIZVU6mHJc8WB13tIflOq0qKLnfNE
n63/4zu9kWCznb7IKqg99mo1+bDcg1fyZusih+aguCGNYEQ/yrAf5ll2OMfjmZWa
FFOuaVoLmN0f6XMb4L38Wwd9obvC3EbpnNveco3lmTp+4kRk1H/Ox2UI2jaFbUnG
Nim4LZD4iGXJh1qnnQ0xkTjrltFAvnY9zUwo2Yv7TUQOi0JAXxsZwXwY6UjsiNrC
QWdKL5VcdI0N1Y1MrmpQQKpRE9Lu1dTvbIRvFtQHmWgV7gqwTmShoSARBL1IM+lp
tm+jfZOmznjYTaVnc1xnBCaIqs925gvnkniZpzru53xb5UegenadNXvQtYlaAokJ
j13QKA6NrZVI
=xkIZ
-----END PGP SIGNATURE-----
Merge tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- gro: initialize network_offset in network layer
- tcp: reduce accepted window in NEW_SYN_RECV state
Current release - new code bugs:
- eth: mlx5e: do not use ptp structure for tx ts stats when not
initialized
- eth: ice: check for unregistering correct number of devlink params
Previous releases - regressions:
- bpf: Allow delete from sockmap/sockhash only if update is allowed
- sched: taprio: extend minimum interval restriction to entire cycle
too
- netfilter: ipset: add list flush to cancel_gc
- ipv4: fix address dump when IPv4 is disabled on an interface
- sock_map: avoid race between sock_map_close and sk_psock_put
- eth: mlx5: use mlx5_ipsec_rx_status_destroy to correctly delete
status rules
Previous releases - always broken:
- core: fix __dst_negative_advice() race
- bpf:
- fix multi-uprobe PID filtering logic
- fix pkt_type override upon netkit pass verdict
- netfilter: tproxy: bail out if IP has been disabled on the device
- af_unix: annotate data-race around unix_sk(sk)->addr
- eth: mlx5e: fix UDP GSO for encapsulated packets
- eth: idpf: don't enable NAPI and interrupts prior to allocating Rx
buffers
- eth: i40e: fully suspend and resume IO operations in EEH case
- eth: octeontx2-pf: free send queue buffers incase of leaf to inner
- eth: ipvlan: dont Use skb->sk in ipvlan_process_v{4,6}_outbound"
* tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
netdev: add qstat for csum complete
ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
net: ena: Fix redundant device NUMA node override
ice: check for unregistering correct number of devlink params
ice: fix 200G PHY types to link speed mapping
i40e: Fully suspend and resume IO operations in EEH case
i40e: factoring out i40e_suspend/i40e_resume
e1000e: move force SMBUS near the end of enable_ulp function
net: dsa: microchip: fix RGMII error in KSZ DSA driver
ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
net: fix __dst_negative_advice() race
nfc/nci: Add the inconsistency check between the input data length and count
MAINTAINERS: dwmac: starfive: update Maintainer
net/sched: taprio: extend minimum interval restriction to entire cycle too
net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()
netfilter: nft_fib: allow from forward/input without iif selector
netfilter: tproxy: bail out if IP has been disabled on the device
netfilter: nft_payload: skbuff vlan metadata mangle support
net: ti: icssg-prueth: Fix start counter for ft1 filter
sock_map: avoid race between sock_map_close and sk_psock_put
...
Commit 13e1df0928 ("kheaders: explicitly validate existence of cpio
command") added an explicit check for `cpio` using `type`.
However, `type` in `dash` (which is used in some popular distributions
and base images as the shell script runner) prints the missing message
to standard output, and thus no error is printed:
$ bash -c 'type missing >/dev/null'
bash: line 1: type: missing: not found
$ dash -c 'type missing >/dev/null'
$
For instance, this issue may be seen by loongarch builders, given its
defconfig enables CONFIG_IKHEADERS since commit 9cc1df421f ("LoongArch:
Update Loongson-3 default config file").
Therefore, use `command -v` instead to have consistent behavior, and
take the chance to provide a more explicit error.
Fixes: 13e1df0928 ("kheaders: explicitly validate existence of cpio command")
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Build environments might be running with different umask settings
resulting in indeterministic file modes for the files contained in
kheaders.tar.xz. The file itself is served with 444, i.e. world
readable. Archive the files explicitly with 744,a+X to improve
reproducibility across build environments.
--mode=0444 is not suitable as directories need to be executable. Also,
444 makes it hard to delete all the readonly files after extraction.
Cc: stable@vger.kernel.org
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
- uprobes: prevent mutex_lock() under rcu_read_lock(). Recent changes moves
uprobe_cpu_buffer preparation which involves mutex_lock(), under
__uprobe_trace_func() which is called inside rcu_read_lock(). Fix it
by moving uprobe_cpu_buffer preparation outside of __uprobe_trace_func().
- kprobe-events: Fix to handle the error case of btf_find_struct_member().
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmZUglsbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bg/gH/3Y3dhxj8Wv16Wb8U4kA
DpYjXplo1y28Kp+hAgGF+T3PTR8KGxD60PuzeKVR7yUL5pLpTWrl1RUPGVwE2NTT
Hmat5TlHLWZYjkO2ySYiyeYgcdVh9gXcO77zMmhUHR7FvMy/1Krr+t7NYbvH5LXj
VWq6xKIVZi2o1e7DbMgBdR7T8O3Iz/JWmAWgZUiIiP9YX1cGsZkleadZaAl3Y6Oh
L0/FAwh/vNtR7gAadLA8OVZW9QBu+WF978VucgErNHMCgvEouxgtQDbsluE+PVhD
+aEcvELCG+3ZLDlwCmfSOX2+7jzy9FKbzjnEVQXIpYWjF3Ii0W5ZSFMQVK+ZeBGN
9vQ=
=Kwsv
-----END PGP SIGNATURE-----
Merge tag 'probes-fixes-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- uprobes: prevent mutex_lock() under rcu_read_lock().
Recent changes moved uprobe_cpu_buffer preparation which involves
mutex_lock(), under __uprobe_trace_func() which is called inside
rcu_read_lock().
Fix it by moving uprobe_cpu_buffer preparation outside of
__uprobe_trace_func()
- kprobe-events: handle the error case of btf_find_struct_member()
* tag 'probes-fixes-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: fix error check in parse_btf_field()
uprobes: prevent mutex_lock() under rcu_read_lock()
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZlTGFAAKCRDbK58LschI
g5NXAP0QRn8nBSxJHIswFSOwRiCyhOhR7YL2P0c+RGcRMA+ZSAD9E1cwsYXsPu3L
ummQ52AMaMfouHg6aW+rFIoupkGSnwc=
=QctA
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-05-27
We've added 15 non-merge commits during the last 7 day(s) which contain
a total of 18 files changed, 583 insertions(+), 55 deletions(-).
The main changes are:
1) Fix broken BPF multi-uprobe PID filtering logic which filtered by thread
while the promise was to filter by process, from Andrii Nakryiko.
2) Fix the recent influx of syzkaller reports to sockmap which triggered
a locking rule violation by performing a map_delete, from Jakub Sitnicki.
3) Fixes to netkit driver in particular on skb->pkt_type override upon pass
verdict, from Daniel Borkmann.
4) Fix an integer overflow in resolve_btfids which can wrongly trigger build
failures, from Friedrich Vock.
5) Follow-up fixes for ARC JIT reported by static analyzers,
from Shahab Vahedi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Cover verifier checks for mutating sockmap/sockhash
Revert "bpf, sockmap: Prevent lock inversion deadlock in map delete elem"
bpf: Allow delete from sockmap/sockhash only if update is allowed
selftests/bpf: Add netkit test for pkt_type
selftests/bpf: Add netkit tests for mac address
netkit: Fix pkt_type override upon netkit pass verdict
netkit: Fix setting mac address in l2 mode
ARC, bpf: Fix issues reported by the static analyzers
selftests/bpf: extend multi-uprobe tests with USDTs
selftests/bpf: extend multi-uprobe tests with child thread case
libbpf: detect broken PID filtering logic for multi-uprobe
bpf: remove unnecessary rcu_read_{lock,unlock}() in multi-uprobe attach logic
bpf: fix multi-uprobe PID filtering logic
bpf: Fix potential integer overflow in resolve_btfids
MAINTAINERS: Add myself as reviewer of ARM64 BPF JIT
====================
Link: https://lore.kernel.org/r/20240527203551.29712-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We have seen an influx of syzkaller reports where a BPF program attached to
a tracepoint triggers a locking rule violation by performing a map_delete
on a sockmap/sockhash.
We don't intend to support this artificial use scenario. Extend the
existing verifier allowed-program-type check for updating sockmap/sockhash
to also cover deleting from a map.
From now on only BPF programs which were previously allowed to update
sockmap/sockhash can delete from these map types.
Fixes: ff91059932 ("bpf, sockmap: Prevent lock inversion deadlock in map delete elem")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzbot+ec941d6e24f633a59172@syzkaller.appspotmail.com
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+ec941d6e24f633a59172@syzkaller.appspotmail.com
Acked-by: John Fastabend <john.fastabend@gmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=ec941d6e24f633a59172
Link: https://lore.kernel.org/bpf/20240527-sockmap-verify-deletes-v1-1-944b372f2101@cloudflare.com
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZlRqlgAKCRCRxhvAZXjc
os5tAQC6o3f2X39FooKv4bbbQkBXx5x8GqjUZyfnYjbm+Mak7wD/cf8tm4LLvVLt
1g7FbakWkEyQKhPRBMhtngX1GdKiuQI=
=Isax
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix io_uring based write-through after converting cifs to use the
netfs library
- Fix aio error handling when doing write-through via netfs library
- Fix performance regression in iomap when used with non-large folio
mappings
- Fix signalfd error code
- Remove obsolete comment in signalfd code
- Fix async request indication in netfs_perform_write() by raising
BDP_ASYNC when IOCB_NOWAIT is set
- Yield swap device immediately to prevent spurious EBUSY errors
- Don't cross a .backup mountpoint from backup volumes in afs to avoid
infinite loops
- Fix a race between umount and async request completion in 9p after 9p
was converted to use the netfs library
* tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs, 9p: Fix race between umount and async request completion
afs: Don't cross .backup mountpoint from backup volume
swap: yield device immediately
netfs: Fix setting of BDP_ASYNC from iocb flags
signalfd: drop an obsolete comment
signalfd: fix error return code
iomap: fault in smaller chunks for non-large folio mappings
filemap: add helper mapping_max_folio_size()
netfs: Fix AIO error handling when doing write-through
netfs: Fix io_uring based write-through
btf_find_struct_member() might return NULL or an error via the
ERR_PTR() macro. However, its caller in parse_btf_field() only checks
for the NULL condition. Fix this by using IS_ERR() and returning the
error up the stack.
Link: https://lore.kernel.org/all/20240527094351.15687-1-clopez@suse.de/
Fixes: c440adfbe3 ("tracing/probes: Support BTF based data structure field access")
Signed-off-by: Carlos López <clopez@suse.de>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
- Fix x86 IRQ vector leak caused by a CPU offlining race
- Fix build failure in the riscv-imsic irqchip driver
caused by an API-change semantic conflict
- Fix use-after-free in irq_find_at_or_after()
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZRwMURHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1h/zQ//TTrgyXi6+1xXY4R0LDU45j+wavMTMkq3
kM3eUeyXgy+FDtvLRVaYgEAYbtuR4LGFN9qmVuEHJPZQwpi3AFlnGFUFjFUvyE43
xJuOtHoxFv3mj09VgRGsjZvzp8bxYSkEn3h0ryTWGUHzR+QmoQmYWrU6HExgXw3R
+s8pvi14g6R/+PAy05cF0k1J7aeSsYaOfd38D/XnpyhuhXvPMS2eHgovV6I5Qhk4
5lV6rzJv8XlKxVr7bOYJkRePE3z0HMtx0G7eo8eYERBQapHede18V8imv4OpUiua
vmG8cFhF4Lq9KFdEtiVuf1X9/XH3PoEKTGA81oqQ9lLN9USx7ME/Peg6U5ezvEkp
YmQx2LS12DWqYp5PZQTN0CHnfmMLgksmyGELM3JE/dFFCVh4HdpMrh+2wLwWGRJ3
JLzAJh3YwcPhayLpNVgsSF9AtLKTkDoS0bHd43mHnB6VaEKkus8zbeuCxYAsUeMJ
5wCZw3xQjTZEaMMNd1hJN5O/9TX2of+T6Z4C4cacMBmwpD7vX5oXmDYLE/wUHw6m
9Z67fvOvTdIf3MkYSqjGXFKD1JobL/PmwCfaaGUQFVJkbX5WVNDk6C1zgs5FhmuY
U/AcYfadbNdLVXrN3VLnX6Gmb7gFPShOAE1GgXGeszSReI4pbOUy2zopRGAEWSZS
fRu8nyveGjw=
=vxJh
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
- Fix x86 IRQ vector leak caused by a CPU offlining race
- Fix build failure in the riscv-imsic irqchip driver
caused by an API-change semantic conflict
- Fix use-after-free in irq_find_at_or_after()
* tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
get_pid_task() internally already calls rcu_read_lock() and
rcu_read_unlock(), so there is no point to do this one extra time.
This is a drive-by improvement and has no correctness implications.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Current implementation of PID filtering logic for multi-uprobes in
uprobe_prog_run() is filtering down to exact *thread*, while the intent
for PID filtering it to filter by *process* instead. The check in
uprobe_prog_run() also differs from the analogous one in
uprobe_multi_link_filter() for some reason. The latter is correct,
checking task->mm, not the task itself.
Fix the check in uprobe_prog_run() to perform the same task->mm check.
While doing this, we also update get_pid_task() use to use PIDTYPE_TGID
type of lookup, given the intent is to get a representative task of an
entire process. This doesn't change behavior, but seems more logical. It
would hold task group leader task now, not any random thread task.
Last but not least, given multi-uprobe support is half-broken due to
this PID filtering logic (depending on whether PID filtering is
important or not), we need to make it easy for user space consumers
(including libbpf) to easily detect whether PID filtering logic was
already fixed.
We do it here by adding an early check on passed pid parameter. If it's
negative (and so has no chance of being a valid PID), we return -EINVAL.
Previous behavior would eventually return -ESRCH ("No process found"),
given there can't be any process with negative PID. This subtle change
won't make any practical change in behavior, but will allow applications
to detect PID filtering fixes easily. Libbpf fixes take advantage of
this in the next patch.
Cc: stable@vger.kernel.org
Acked-by: Jiri Olsa <jolsa@kernel.org>
Fixes: b733eeade4 ("bpf: Add pid filter support for uprobe_multi link")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Otherwise we can cause spurious EBUSY issues when trying to mount the
rootfs later on.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218845
Reported-by: Petri Kaukasoina <petri.kaukasoina@tuni.fi>
Signed-off-by: Christian Brauner <brauner@kernel.org>