The Hexagon-specific constant extender optimization in LLVM may crash on
Linux kernel code [1], such as fs/bcachefs/btree_io.c after
commit 32ed4a620c ("bcachefs: Btree path tracepoints") in 6.12:
clang: llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp:745: bool (anonymous namespace)::HexagonConstExtenders::ExtRoot::operator<(const HCE::ExtRoot &) const: Assertion `ThisB->getParent() == OtherB->getParent()' failed.
Stack dump:
0. Program arguments: clang --target=hexagon-linux-musl ... fs/bcachefs/btree_io.c
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 'fs/bcachefs/btree_io.c'.
4. Running pass 'Hexagon constant-extender optimization' on function '@__btree_node_lock_nopath'
Without assertions enabled, there is just a hang during compilation.
This has been resolved in LLVM main (20.0.0) [2] and backported to LLVM
19.1.0 [3], but the kernel supports LLVM 13.0.1 and newer, so disable the
constant extender optimization using the '-mllvm' option when using a
toolchain that is not fixed.
Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/issues/99714 [1]
Link: 68df06a0b2 [2]
Link: 2ab8d93061 [3]
Reviewed-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a race condition between exiting wb_on_itr and completion write
backs. For example, we are in wb_on_itr mode and a Tx completion is
generated by HW, ready to be written back, as we are re-enabling
interrupts:
HW                        SW
|                         |
|                         | idpf_tx_splitq_clean_all
|                         | napi_complete_done
|                         |
| tx_completion_wb        | idpf_vport_intr_update_itr_ena_irq
That tx_completion_wb happens before the vector is fully re-enabled.
Continuing with this example, it is a UDP stream and the
tx_completion_wb is the last one in the flow (there are no rx packets).
Because the HW generated the completion before the interrupt is fully
enabled, the HW will not fire the interrupt once the timer expires and
the write back will not happen. NAPI poll won't be called. We have
indicated we're back in interrupt mode but nothing else will trigger the
interrupt. Therefore, the completion goes unprocessed, triggering a Tx
timeout.
To mitigate this, fire a SW triggered interrupt upon exiting wb_on_itr.
This interrupt will catch the rogue completion and avoid the timeout.
Add logic to set the appropriate bits in the vector's dyn_ctl register.
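A minimal sketch of the idea; the register and bit-mask field names below
are illustrative, not taken verbatim from the driver:

static void idpf_trigger_sw_intr(struct idpf_q_vector *q_vector)
{
        struct idpf_intr_reg *reg = &q_vector->intr_reg;
        u32 val;

        val = reg->dyn_ctl_intena_m |           /* re-enable the interrupt */
              reg->dyn_ctl_swint_trig_m |       /* fire a SW triggered interrupt */
              reg->dyn_ctl_sw_itridx_ena_m;     /* use the SW ITR index, no ITR update */

        writel(val, reg->dyn_ctl);
}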
Fixes: 9c4a27da0e ("idpf: enable WB_ON_ITR")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently the pending_async_copies count is decremented just
before a struct nfsd4_copy is destroyed. After commit aa0ebd21df
("NFSD: Add nfsd4_copy time-to-live"), nfsd4_copy structures stick
around for 10 lease periods after the COPY itself has completed,
so the pending_async_copies count stays high for a long time. This
causes NFSD to avoid the use of background copy even though the
actual background copy workload might no longer be running.
In this patch, decrement pending_async_copies once the async copy
thread is done processing the copy work.
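A sketch of the change, assuming the counter is the atomic_t
pending_async_copies in the per-net nfsd_net structure; the helper shown
here is hypothetical and only marks where the decrement now happens:

static void nfsd4_async_copy_done(struct nfsd4_copy *copy, struct nfsd_net *nn)
{
        /* ... the async copy work has just finished ... */

        /*
         * No longer "pending", even though the nfsd4_copy structure
         * itself lives on for its time-to-live.
         */
        atomic_dec(&nn->pending_async_copies);
}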
Fixes: aa0ebd21df ("NFSD: Add nfsd4_copy time-to-live")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
SW triggered interrupts are guaranteed to fire after their timer
expires, unlike Tx and Rx interrupts which will only fire after the
timer expires _and_ a descriptor write back is available to be processed
by the driver.
Add the necessary fields, defines, and initializations to enable a SW
triggered interrupt in the vector's dyn_ctl register.
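The kind of additions being described, shown only as a sketch (the actual
idpf structure and field names may differ):

/* per-vector interrupt register description gains SW-interrupt bits */
struct idpf_intr_reg {
        void __iomem *dyn_ctl;
        u32 dyn_ctl_intena_m;
        u32 dyn_ctl_itridx_s;
        /* new fields for SW triggered interrupts (names illustrative) */
        u32 dyn_ctl_swint_trig_m;
        u32 dyn_ctl_sw_itridx_ena_m;
};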
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Set the cache-line-size parameter of the L2 cache for each core to the
correct value of 64 bytes.
Previously, the L2 cache line size was incorrectly set to 128 bytes
for the Broadcom BCM2712. This causes validation tests for the
Performance Application Programming Interface (PAPI) tool to fail as
they depend on sysfs accurately reporting cache line sizes.
The correct value of 64 bytes is stated in the official documentation of
the ARM Cortex-A72, which is linked in the comments of
arm64/boot/dts/broadcom/bcm2712.dtsi as the source for cache-line-size.
Fixes: faa3381267 ("arm64: dts: broadcom: Add minimal support for Raspberry Pi 5")
Signed-off-by: Willow Cunningham <willow.e.cunningham@maine.edu>
Link: https://lore.kernel.org/r/20241007212954.214724-1-willow.e.cunningham@maine.edu
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
[BUG]
There is a bug report in the mailing list where btrfs_run_delayed_refs()
failed to drop the ref count for logical 25870311358464 num_bytes
2113536.
The involved leaf dump looks like this:
item 166 key (25870311358464 168 2113536) itemoff 10091 itemsize 50
extent refs 1 gen 84178 flags 1
ref#0: shared data backref parent 32399126528000 count 0 <<<
ref#1: shared data backref parent 31808973717504 count 1
Notice the count number is 0.
[CAUSE]
There is no concrete evidence yet, but considering 0 -> 1 is also a
single bit flipped, it's possible that hardware memory bitflip is
involved, causing the on-disk extent tree to be corrupted.
[FIX]
To prevent us from reading such a corrupted extent item, or writing such
a damaged extent item back to disk, enhance the handling of
BTRFS_EXTENT_DATA_REF_KEY and BTRFS_SHARED_DATA_REF_KEY keys for both
inlined and key items, to detect such a 0 ref count and reject it.
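A simplified sketch of the added check, in the style of the tree-checker
(extent_err() and the btrfs_shared_data_ref_count() accessor exist; the
surrounding context is omitted and the helper name is illustrative):

static int check_shared_data_ref(struct extent_buffer *leaf, int slot,
                                 struct btrfs_shared_data_ref *sref)
{
        /* a backref with a zero count makes no sense and indicates corruption */
        if (btrfs_shared_data_ref_count(leaf, sref) == 0) {
                extent_err(leaf, slot,
                           "invalid shared data backref count, should have at least 1");
                return -EUCLEAN;
        }
        return 0;
}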
CC: stable@vger.kernel.org # 5.4+
Link: https://lore.kernel.org/linux-btrfs/7c69dd49-c346-4806-86e7-e6f863a66f48@app.fastmail.com/
Reported-by: Frankie Fisher <frankie@terrorise.me.uk>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs, like other file systems, can't really deal with I/O not aligned to
its internal block size (which strangely is called sector size in
btrfs, for historical reasons), but the block layer split helper doesn't
even know about that.
Round down the split boundary so that all I/Os are aligned.
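A minimal sketch of the rounding, assuming a helper that computes how many
bytes may go into one bio before it must be split (the helper name is
illustrative; ALIGN_DOWN() is the existing kernel macro):

/* round the split point down so every resulting bio stays sector aligned */
static u64 btrfs_append_split_limit(u64 max_bytes, u32 sectorsize)
{
        return ALIGN_DOWN(max_bytes, sectorsize);
}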
Fixes: d5e4377d50 ("btrfs: split zone append bios in btrfs_submit_bio")
CC: stable@vger.kernel.org # 6.12
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Otherwise it won't catch bios turned into regular writes by the block
level zone write plugging. The additional test it adds is for emulated
zone append.
Fixes: 9b1ce7f0c6 ("block: Implement zone append emulation")
CC: stable@vger.kernel.org # 6.12
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have been using the following check
if (generation <= root->root_key.offset)
to make decisions about whether or not to visit a node during snapshot
delete. This is because for normal subvolumes this is set to 0, and for
snapshots it's set to the creation generation. The idea being that if
the generation of the node is less than or equal to our creation
generation then we don't need to visit that node, because it doesn't
belong to us, we can simply drop our reference and move on.
However reloc roots don't have their generation stored in
root->root_key.offset, instead that is the objectid of their
corresponding fs root. This means we can incorrectly not walk into
nodes that need to be dropped when deleting a reloc root.
There are a variety of consequences to making the wrong choice in two
distinct areas.
visit_node_for_delete()
1. False positive. We think we are newer than the block when we really
aren't. We don't visit the node and drop our reference to the node
and carry on. This would result in leaked space.
2. False negative. We do decide to walk down into a block that we
should have just dropped our reference to. However this means that
the child node will have refs > 1, so we will switch to
UPDATE_BACKREF, and then the subsequent walk_down_proc() will notice
that btrfs_header_owner(node) != root->root_key.objectid and it'll
break out of the loop, and then walk_up_proc() will drop our reference,
so this appears to be ok.
do_walk_down()
1. False positive. We are in UPDATE_BACKREF and incorrectly decide that
we are done and don't need to update the backref for our lower nodes.
This is another case that simply won't happen with relocation, as we
only have to do UPDATE_BACKREF if the node below us was shared and
didn't have FULL_BACKREF set, and since we don't own that node
because we're a reloc root we actually won't end up in this case.
2. False negative. Again this is tricky because as described above, we
simply wouldn't be here from relocation, because we don't own any of
the nodes because we never set btrfs_header_owner() to the reloc root
objectid, and we always use FULL_BACKREF, we never actually need to
set FULL_BACKREF on any children.
Having spent a lot of time stressing relocation/snapshot delete recently
I've not seen this pop in practice. But this is objectively incorrect,
so fix this to get the correct starting generation based on the root
we're dropping to keep me from thinking there's a problem here.
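A sketch of the idea (helper name illustrative): derive the starting
generation from the root being dropped, and never trust root_key.offset
for reloc roots, so every node gets visited there:

static u64 drop_start_generation(const struct btrfs_root *root)
{
        /*
         * Reloc roots store the objectid of the corresponding fs root in
         * root_key.offset, not a creation generation. Returning 0 means
         * "generation <= start" is never true, so we always walk in.
         */
        if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
                return 0;
        return root->root_key.offset;
}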
CC: stable@vger.kernel.org
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The 'select FB_CORE' statement moved from CONFIG_DRM to DRM_CLIENT_LIB,
but there are now configurations in which code calling into the fbdev core
is built-in even though DRM_CLIENT_LIB itself is a loadable module:
x86_64-linux-ld: drivers/gpu/drm/drm_fb_helper.o: in function `drm_fb_helper_set_suspend':
drm_fb_helper.c:(.text+0x2c6): undefined reference to `fb_set_suspend'
x86_64-linux-ld: drivers/gpu/drm/drm_fb_helper.o: in function `drm_fb_helper_resume_worker':
drm_fb_helper.c:(.text+0x2e1): undefined reference to `fb_set_suspend'
In addition to DRM_CLIENT_LIB, the 'select' needs to be at least in
DRM_KMS_HELPER and DRM_GEM_SHMEM_HELPER, so add it here.
This patch is the KMS_HELPER part of [1].
Fixes: dadd28d414 ("drm/client: Add client-lib module")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/series/141411/ # 1
Link: https://patchwork.freedesktop.org/patch/msgid/20241216074450.8590-4-tzimmermann@suse.de
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Merge tag 'ftrace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:
- Always try to initialize the idle functions when graph tracer starts
A bug was found that when a CPU is offline when graph tracing starts
and then comes online, that CPU is not traced. The fix to that was to
move the initialization of the idle shadow stack over to the hot plug
online logic, which also handle onlined CPUs. The issue was that it
removed the initialization of the shadow stack when graph tracing
starts, but the callbacks to the hot plug logic do nothing if graph
tracing isn't currently running. Although that fix fixed the onlining
of a CPU during tracing, it broke the CPUs that were already online.
- Have microblaze not try to get the "true parent" in function tracing
If function tracing and graph tracing are both enabled at the same
time the parent of the functions traced by the function tracer may
sometimes be the graph tracing trampoline. The graph tracing hijacks
the return pointer of the function to trace it, but that can
interfere with the function tracing parent output.
This was fixed by using the ftrace_graph_ret_addr() function passing
in the kernel stack pointer using the ftrace_regs_get_stack_pointer()
function. But Al Viro reported that Microblaze does not implement the
kernel_stack_pointer(regs) helper function that
ftrace_regs_get_stack_pointer() uses and fails to compile when
function graph tracing is enabled.
It was first thought that this was a microblaze issue, but the real
cause is that this only works when an architecture implements
HAVE_DYNAMIC_FTRACE_WITH_ARGS, as a requirement for that config is to
have ftrace always pass a valid ftrace_regs to the callbacks. That
also means that the architecture supports
ftrace_regs_get_stack_pointer()
Microblaze does not set HAVE_DYNAMIC_FTRACE_WITH_ARGS nor does it
implement ftrace_regs_get_stack_pointer() which caused it to fail to
build. Only implement the "true parent" logic if an architecture has
that config set"
* tag 'ftrace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Do not find "true_parent" if HAVE_DYNAMIC_FTRACE_WITH_ARGS is not set
fgraph: Still initialize idle shadow stacks when starting
Merge tag 's390-6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix DirectMap accounting in /proc/meminfo file
- Fix strscpy() return code handling that led to "unsigned 'len' is
never less than zero" warning
- Fix the calculation determining whether to use three- or four-level
paging: account KMSAN modules metadata
* tag 's390-6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: Consider KMSAN modules metadata for paging levels
s390/ipl: Fix never less than zero warning
s390/mm: Fix DirectMap accounting
Do not select BACKLIGHT_CLASS_DEVICE from FB_BACKLIGHT. The latter
only controls backlight support within fbdev core code and data
structures.
Make fbdev drivers depend on BACKLIGHT_CLASS_DEVICE and let users
select it explicitly. Fixes warnings about recursive dependencies,
such as
error: recursive dependency detected!
symbol BACKLIGHT_CLASS_DEVICE is selected by FB_BACKLIGHT
symbol FB_BACKLIGHT is selected by FB_SH_MOBILE_LCDC
symbol FB_SH_MOBILE_LCDC depends on FB_DEVICE
symbol FB_DEVICE depends on FB_CORE
symbol FB_CORE is selected by DRM_GEM_DMA_HELPER
symbol DRM_GEM_DMA_HELPER is selected by DRM_PANEL_ILITEK_ILI9341
symbol DRM_PANEL_ILITEK_ILI9341 depends on BACKLIGHT_CLASS_DEVICE
BACKLIGHT_CLASS_DEVICE is user-selectable, so making drivers adapt to
it is the correct approach in any case. For most drivers, backlight
support is also configurable separately.
v3:
- Select BACKLIGHT_CLASS_DEVICE in PowerMac defconfigs (Christophe)
- Fix PMAC_BACKLIGHT module dependency corner cases (Christophe)
v2:
- s/BACKLIGHT_DEVICE_CLASS/BACKLIGHT_CLASS_DEVICE (Helge)
- Fix fbdev driver-dependency corner case (Arnd)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241216074450.8590-2-tzimmermann@suse.de
Merge tag 'erofs-for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"The first one fixes a syzbot UAF report caused by a commit introduced
in this cycle, but it also addresses a longstanding memory leak. The
second one resolves a PSI memstall mis-accounting issue.
The remaining patches switch file-backed mounts to use buffered I/Os
by default instead of direct I/Os, since the page cache of underlay
files is typically valid and maybe even dirty. This change also aligns
with the default policy of loopback devices. A mount option has been
added to try to use direct I/Os explicitly.
Summary:
- Fix (pcluster) memory leak and (sbi) UAF after umounting
- Fix a case of PSI memstall mis-accounting
- Use buffered I/Os by default for file-backed mounts"
* tag 'erofs-for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: use buffered I/O for file-backed mounts by default
erofs: reference `struct erofs_device_info` for erofs_map_dev
erofs: use `struct erofs_device_info` for the primary device
erofs: add erofs_sb_free() helper
MAINTAINERS: erofs: update Yue Hu's email address
erofs: fix PSI memstall accounting
erofs: fix rare pcluster memory leak after unmounting
- FORTIFY: Silence GCC value range warnings due to warning-only bounds checks
Merge tag 'hardening-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fix from Kees Cook:
"Silence a GCC value-range warning that is being ironically triggered
by bounds checking"
* tag 'hardening-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
fortify: Hide run-time copy size from value range tracking
The TP_printk() portion of a trace event is executed at the time an event
is read from the trace. This can happen seconds, minutes, hours, days,
months, or even years after the event was recorded. If the print
format contains a dereference to a string via "%s", and that string was
allocated, there's a chance that string could be freed before it is read
by the trace file.
To protect against such bugs, there are two functions that verify the
event. The first one is test_event_printk(), which is called when the
event is created. It reads the TP_printk() format as well as its arguments
to make sure nothing may be dereferencing a pointer that was not copied
into the ring buffer along with the event. If it is, it will trigger a
WARN_ON().
For strings that use "%s", it is not so easy. The string may not reside in
the ring buffer but may still be valid. Strings that are static and part
of the kernel proper, which will not be freed for the life of the running
system, are safe to dereference. But whether a pointer refers to a
static string or to something on the heap cannot be determined until the
event is triggered.
This brings us to the second function that tests for the bad dereferencing
of strings, trace_check_vprintf(). It would walk through the printf format
looking for "%s", and when it finds it, it would validate that the pointer
is safe to read. If not, it would produce a WARN_ON() as well and write
into the ring buffer "[UNSAFE-MEMORY]".
The problem with this is how it used va_list to have vsnprintf() handle
all the cases that it didn't need to check. Instead of re-implementing
vsnprintf(), it would make a copy of the format up to the %s part, and
call vsnprintf() with the current va_list ap variable, where the ap would
then be ready to point at the string in question.
For architectures that passed va_list by reference this was possible. For
architectures that passed it by copy it was not. A test_can_verify()
function was used to differentiate between the two, and if it wasn't
possible, it would disable it.
Even for architectures where this was feasible, it was a stretch to rely
on such a method that is undocumented, and could cause issues later on
with new optimizations of the compiler.
Instead, the first function test_event_printk() was updated to look at
"%s" as well. If the "%s" argument is a pointer outside the event in the
ring buffer, it would find the field type of the event that is the problem
and mark the structure with a new flag called "needs_test". The event
itself will be marked by TRACE_EVENT_FL_TEST_STR to let it be known that
this event has a field that needs to be verified before the event can be
printed using the printf format.
When the event fields are created from the field type structure, the
fields would copy the field type's "needs_test" value.
Finally, before being printed, a new function ignore_event() is called
which will check if the event has the TEST_STR flag set (if not, it
returns false). If the flag is set, it then iterates through the event's
fields looking for the ones that have the "needs_test" flag set.
Then it uses the offset field from the field structure to find the pointer
in the ring buffer event. It runs the tests to make sure that pointer is
safe to print and if not, it triggers the WARN_ON() and also adds to the
trace output that the event in question has an unsafe memory access.
The new ignore_event() check makes trace_check_vprintf() obsolete, so the
latter is removed.
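A simplified sketch of the runtime check described above; the structure,
flag and field names follow the description, and is_safe_string() stands
in for the actual pointer validity test:

static bool ignore_event(struct trace_event_call *call, void *record)
{
        struct ftrace_event_field *field;

        if (!(call->flags & TRACE_EVENT_FL_TEST_STR))
                return false;

        list_for_each_entry(field, trace_get_fields(call), link) {
                const char *str;

                if (!field->needs_test)
                        continue;

                /* the "%s" pointer lives at field->offset inside the record */
                str = *(const char **)(record + field->offset);
                if (WARN_ON_ONCE(!is_safe_string(str)))
                        return true;    /* flag the unsafe access instead of printing */
        }
        return false;
}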
Link: https://lore.kernel.org/all/CAHk-=wh3uOnqnZPpR0PeLZZtyWbZLboZ7cHLCKRWsocvs9Y7hQ@mail.gmail.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.848621576@goodmis.org
Fixes: 5013f454a3 ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The test_event_printk() code makes sure that when a trace event is
registered, any dereferenced pointers from the event's TP_printk() are
pointing to content in the ring buffer. But currently it does not handle
"%s", as there's cases where the string pointer saved in the ring buffer
points to a static string in the kernel that will never be freed. As that
is a valid case, the pointer needs to be checked at runtime.
Currently the runtime check is done via trace_check_vprintf(), but to not
have to replicate everything in vsnprintf() it does some logic with the
va_list that may not be reliable across architectures. In order to get rid
of that logic, more work in the test_event_printk() needs to be done. Some
of the strings can be validated at this time when it is obvious the string
is valid because the string will be saved in the ring buffer content.
Do all the validation of strings in the ring buffer at boot in
test_event_printk(), and make sure that the field of the strings that
point into the kernel are accessible. This will allow adding checks at
runtime that will validate the fields themselves and not rely on parsing
the TP_printk() format at runtime.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.685917008@goodmis.org
Fixes: 5013f454a3 ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The process_pointer() helper function looks to see if various trace event
macros are used. These macros are for storing data in the event. This
makes it safe to dereference as the dereference will then point into the
event on the ring buffer where the content of the data stays with the
event itself.
A few helper functions were missing. Those were:
__get_rel_dynamic_array()
__get_dynamic_array_len()
__get_rel_dynamic_array_len()
__get_rel_sockaddr()
Also add a helper function find_print_string() to not need to use a middle
man variable to test if the string exists.
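A sketch of what such a helper can look like, based on the description:
report whether "str" occurs in the argument before a given end pointer,
without needing a temporary at each call site:

static bool find_print_string(const char *arg, const char *str, const char *end)
{
        const char *r = strstr(arg, str);

        return r && r < end;
}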
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.521836792@goodmis.org
Fixes: 5013f454a3 ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The test_event_printk() analyzes print formats of trace events looking for
cases where it may dereference a pointer that is not in the ring buffer
which can possibly be a bug when the trace event is read from the ring
buffer and the content of that pointer no longer exists.
The function needs to accurately go from one print format argument to the
next. It handles quotes and parenthesis that may be included in an
argument. When it finds the start of the next argument, it uses a simple
"c = strstr(fmt + i, ',')" to find the end of that argument!
In order to include "%s" dereferencing, it needs to process the entire
content of the print format argument and not just the content up to the first
',' it finds. As there may be content like:
({ const char *saved_ptr = trace_seq_buffer_ptr(p); static const char
*access_str[] = { "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux"
}; union kvm_mmu_page_role role; role.word = REC->role;
trace_seq_printf(p, "sp gen %u gfn %llx l%u %u-byte q%u%s %s%s" " %snxe
%sad root %u %s%c", REC->mmu_valid_gen, REC->gfn, role.level,
role.has_4_byte_gpte ? 4 : 8, role.quadrant, role.direct ? " direct" : "",
access_str[role.access], role.invalid ? " invalid" : "", role.efer_nx ? ""
: "!", role.ad_disabled ? "!" : "", REC->root_count, REC->unsync ?
"unsync" : "sync", 0); saved_ptr; })
Which is an example of a full argument of an existing event. As the code
already handles finding the next print format argument, process the
argument at the end of it and not the start of it. This way it has both
the start of the argument as well as the end of it.
Add a helper function "process_pointer()" that will do the processing during
the loop as well as at the end. It also makes the code cleaner and easier
to read.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.362271189@goodmis.org
Fixes: 5013f454a3 ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Merge tag 'xsa465+xsa466-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Fix xen netfront crash (XSA-465) and avoid using the hypercall page
that doesn't do speculation mitigations (XSA-466)"
* tag 'xsa465+xsa466-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: remove hypercall page
x86/xen: use new hypercall functions instead of hypercall page
x86/xen: add central hypercall functions
x86/xen: don't do PV iret hypercall through hypercall page
x86/static-call: provide a way to do very early static-call updates
objtool/x86: allow syscall instruction
x86: make get_cpu_vendor() accessible from Xen code
xen/netfront: fix crash when removing device
fs/nfs/super.c should include fs/nfs/nfs4idmap.h for the
declaration of nfs_idmap_cache_timeout. This fixes the sparse warning:
fs/nfs/super.c:1397:14: warning: symbol 'nfs_idmap_cache_timeout' was not declared. Should it be static?
Signed-off-by: Zhang Kunbo <zhangkunbo@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When the server is recalling a layout, we should ignore the count of
outstanding layoutget calls, since the server is expected to return
either NFS4ERR_RECALLCONFLICT or NFS4ERR_RETURNCONFLICT for as long as
the recall is outstanding.
Currently, we may end up livelocking, causing the layout to eventually
be forcibly revoked.
Fixes: bf0291dd22 ("pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Chase reports that their tester complains about a locking context
mismatch:
=============================
[ BUG: Invalid wait context ]
6.13.0-rc1-gf137f14b7ccb-dirty #9 Not tainted
-----------------------------
syz.1.25198/182604 is trying to lock:
ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at: spin_lock_irq
include/linux/spinlock.h:376 [inline]
ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at:
io_match_task_safe io_uring/io_uring.c:218 [inline]
ffff88805e66a358 (&ctx->timeout_lock){-.-.}-{3:3}, at:
io_match_task_safe+0x187/0x250 io_uring/io_uring.c:204
other info that might help us debug this:
context-{5:5}
1 lock held by syz.1.25198/182604:
#0: ffff88802b7d48c0 (&acct->lock){+.+.}-{2:2}, at:
io_acct_cancel_pending_work+0x2d/0x6b0 io_uring/io-wq.c:1049
stack backtrace:
CPU: 0 UID: 0 PID: 182604 Comm: syz.1.25198 Not tainted
6.13.0-rc1-gf137f14b7ccb-dirty #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x82/0xd0 lib/dump_stack.c:120
print_lock_invalid_wait_context kernel/locking/lockdep.c:4826 [inline]
check_wait_context kernel/locking/lockdep.c:4898 [inline]
__lock_acquire+0x883/0x3c80 kernel/locking/lockdep.c:5176
lock_acquire.part.0+0x11b/0x370 kernel/locking/lockdep.c:5849
__raw_spin_lock_irq include/linux/spinlock_api_smp.h:119 [inline]
_raw_spin_lock_irq+0x36/0x50 kernel/locking/spinlock.c:170
spin_lock_irq include/linux/spinlock.h:376 [inline]
io_match_task_safe io_uring/io_uring.c:218 [inline]
io_match_task_safe+0x187/0x250 io_uring/io_uring.c:204
io_acct_cancel_pending_work+0xb8/0x6b0 io_uring/io-wq.c:1052
io_wq_cancel_pending_work io_uring/io-wq.c:1074 [inline]
io_wq_cancel_cb+0xb0/0x390 io_uring/io-wq.c:1112
io_uring_try_cancel_requests+0x15e/0xd70 io_uring/io_uring.c:3062
io_uring_cancel_generic+0x6ec/0x8c0 io_uring/io_uring.c:3140
io_uring_files_cancel include/linux/io_uring.h:20 [inline]
do_exit+0x494/0x27a0 kernel/exit.c:894
do_group_exit+0xb3/0x250 kernel/exit.c:1087
get_signal+0x1d77/0x1ef0 kernel/signal.c:3017
arch_do_signal_or_restart+0x79/0x5b0 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
do_syscall_64+0xd8/0x250 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
which is because io_uring has ctx->timeout_lock nesting inside the
io-wq acct lock, the latter of which is used from inside the scheduler
and hence is a raw spinlock, while the former is a "normal" spinlock
and hence can sleep on PREEMPT_RT.
Change ctx->timeout_lock to be a raw spinlock to solve this nesting
dependency on PREEMPT_RT=y.
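An illustration of the change, shown as isolated fragments rather than a
full diff:

/* in struct io_ring_ctx: the lock type changes */
raw_spinlock_t  timeout_lock;           /* was: spinlock_t timeout_lock; */

/* at the lock sites, e.g. in io_match_task_safe(): */
raw_spin_lock_irq(&ctx->timeout_lock);  /* was: spin_lock_irq() */
/* walk the timeout list */
raw_spin_unlock_irq(&ctx->timeout_lock);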
Reported-by: chase xd <sl1589472800@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit f8c989a0c8.
Before this commit, svc_export_put or expkey_put will call path_put with
sync mode. After this commit, path_put will be called with async mode.
And this can lead to the unexpected results shown below.
mkfs.xfs -f /dev/sda
echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports
exportfs -ra
service nfs-server start
mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1
mount /dev/sda /mnt/sda
touch /mnt1/sda/file
exportfs -r
umount /mnt/sda # fails unexpectedly
The touch will finally call nfsd_cross_mnt, add a refcount to the mount, and
then add cache_head. Before this commit, exportfs -r will call
cache_flush to cleanup all cache_head, and path_put in
svc_export_put/expkey_put will be finished with sync mode. So, the
latter umount will always succeed. However, after this commit, path_put
will be called with async mode, so the latter umount may fail, and if
we add some delay, umount will succeed too. Personally I think this is a
bug and should be fixed. First revert the earlier bugfix patch, and then
fix the original bug in a different way.
Fixes: f8c989a0c8 ("nfsd: release svc_expkey/svc_export with rcu_work")
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Coverity reports an uninit pointer read in qed_mcp_nvm_info_populate().
If EOPNOTSUPP is returned from qed_mcp_bist_nvm_get_num_images() ensure
nvm_info.num_images is set to 0 to avoid possible uninit assignment
to p_hwfn->nvm_info.image_att later on at the out label.
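A sketch of the described fix; the exact signature of
qed_mcp_bist_nvm_get_num_images() is assumed here for illustration:

rc = qed_mcp_bist_nvm_get_num_images(p_hwfn, p_ptt, &nvm_info.num_images);
if (rc == -EOPNOTSUPP)
        /* query not supported: keep num_images well defined for the out label */
        nvm_info.num_images = 0;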
Closes: https://scan5.scan.coverity.com/#/project-view/63204/10063?selectedIssue=1636666
Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Gianfranco Trad <gianf.trad@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241215011733.351325-2-gianf.trad@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The OF node obtained by of_parse_phandle() is not freed. Call
of_node_put() to balance the refcount.
This bug was found by an experimental static analysis tool that I am
developing.
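The general pattern being fixed (the property name here is illustrative):
every successful of_parse_phandle() must be balanced by of_node_put() once
the node is no longer needed:

struct device_node *phy_node;

phy_node = of_parse_phandle(np, "phy-handle", 0);
if (phy_node) {
        /* ... look up and connect the PHY ... */
        of_node_put(phy_node);  /* balance the refcount taken above */
}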
Fixes: 1676aba5ef ("net: ethernet: bgmac: device tree phy enablement")
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241214014912.2810315-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Parthiban Veerasooran says:
====================
Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib
This patch series contains the below fixes.
- Infinite loop error when tx credits becomes 0.
- Race condition between tx skb reference pointers.
v2:
- Added mutex lock to protect tx skb reference handling.
v3:
- Added mutex protection in assigning new tx skb to waiting_tx_skb
pointer.
- Explained the possible scenario for the race condition with the time
diagram in the commit message.
v4:
- Replaced mutex with spin_lock_bh() variants as the start_xmit runs in
BH/softirq context which can't take sleeping locks.
====================
Link: https://patch.msgid.link/20241213123159.439739-1-parthiban.veerasooran@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There are two skb pointers to manage tx skbs enqueued from the n/w stack.
waiting_tx_skb pointer points to the tx skb which needs to be processed
and ongoing_tx_skb pointer points to the tx skb which is being processed.
SPI thread prepares the tx data chunks from the tx skb pointed by the
ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
processed, the tx skb pointed by the waiting_tx_skb is assigned to
ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
Whenever there is a new tx skb from n/w stack, it will be assigned to
waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
are handled in two different threads.
Consider a scenario where the SPI thread processed an ongoing_tx_skb and
it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
without doing any NULL check. At this time, if the waiting_tx_skb pointer
is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
that, if a new tx skb is assigned to the waiting_tx_skb pointer by the n/w
stack, there is a chance that the SPI thread overwrites that tx skb pointer
with NULL. Finally one of the tx skbs will be left unhandled, resulting in
a missing packet and a memory leak.
- Consider the below scenario where the TXC reported from the previous
transfer is 10 and ongoing_tx_skb holds a tx ethernet frame which can be
transported in 20 TXCs and waiting_tx_skb is still NULL.
tx_credits = 10; /* 21 are filled in the previous transfer */
ongoing_tx_skb = 20;
waiting_tx_skb = NULL; /* Still NULL */
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true.
- After oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
ongoing_tx_skb = 10;
waiting_tx_skb = NULL; /* Still NULL */
- Perform SPI transfer.
- Process SPI rx buffer to get the TXC from footers.
- Now let's assume previously filled 21 TXCs are freed so we are good to
transport the next remaining 10 tx chunks from ongoing_tx_skb.
tx_credits = 21;
ongoing_tx_skb = 10;
waiting_tx_skb = NULL;
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true again.
- In the oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
ongoing_tx_skb = NULL;
waiting_tx_skb = NULL;
- Now the below bad case might happen,
Thread1 (oa_tc6_start_xmit)          Thread2 (oa_tc6_spi_thread_handler)
---------------------------          -----------------------------------
- if waiting_tx_skb is NULL
                                     - if ongoing_tx_skb is NULL
                                     - ongoing_tx_skb = waiting_tx_skb
- waiting_tx_skb = skb
                                     - waiting_tx_skb = NULL
...
                                     - ongoing_tx_skb = NULL
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb
To overcome the above issue, protect the moving of tx skb reference from
waiting_tx_skb pointer to ongoing_tx_skb pointer and assigning new tx skb
to waiting_tx_skb pointer, so that the other thread can't access the
waiting_tx_skb pointer until the current thread completes moving the tx
skb reference safely.
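A sketch of the locking described above; the lock name is illustrative,
but both sides take the same BH-safe spinlock around the skb handover:

/* n/w stack side, in ndo_start_xmit: */
spin_lock_bh(&tc6->tx_skb_lock);
if (tc6->waiting_tx_skb) {
        ret = NETDEV_TX_BUSY;           /* a frame is already waiting */
} else {
        tc6->waiting_tx_skb = skb;
        ret = NETDEV_TX_OK;
}
spin_unlock_bh(&tc6->tx_skb_lock);

/* SPI thread side, once the ongoing skb is fully processed: */
spin_lock_bh(&tc6->tx_skb_lock);
tc6->ongoing_tx_skb = tc6->waiting_tx_skb;
tc6->waiting_tx_skb = NULL;
spin_unlock_bh(&tc6->tx_skb_lock);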
Fixes: 53fbde8ab2 ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames")
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
SPI thread wakes up to perform SPI transfer whenever there is a TX skb
from the n/w stack or an interrupt from the MAC-PHY. The Ethernet frame
from the TX skb is transferred based on the availability of tx credits in
the MAC-PHY, which is reported from the previous SPI transfer. Sometimes
there is a possibility that a TX skb is available to transmit but there
are no tx credits from the MAC-PHY. In this case, there will not be any
SPI transfer but the thread will be running in an endless loop until tx
credits become available again.
So checking the availability of tx credits along with the TX skb will
prevent the above infinite loop. When tx credits become available again,
that will be notified through an interrupt which will trigger the SPI
transfer to get the available tx credits.
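A sketch of the resulting wake-up/transfer condition (field names
illustrative): a queued TX skb only counts as work when tx credits are
non-zero, otherwise the thread sleeps until the credit interrupt arrives:

wait_event_interruptible(tc6->spi_wq,
                         tc6->int_flag ||
                         (tc6->waiting_tx_skb && tc6->tx_credits) ||
                         kthread_should_stop());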
Fixes: 53fbde8ab2 ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The source and destination rings were incorrectly assigned during the ring
linking process. The "source" ring, which contains the new segments,
was not spliced into the "destination" ring, leading to incorrect ring
expansion.
Fixes: fe688e5006 ("usb: xhci: refactor xhci_link_rings() to use source and destination rings")
Reported-by: Jeff Chua <jeff.chua.linux@gmail.com>
Closes: https://lore.kernel.org/lkml/CAAJw_ZtppNqC9XA=-WVQDr+vaAS=di7jo15CzSqONeX48H75MA@mail.gmail.com/
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241217102122.2316814-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xHC hosts from several vendors have the same issue where endpoints start
so slowly that a later queued 'Stop Endpoint' command may complete before
the endpoint is up and running.
The 'Stop Endpoint' command fails with context state error as the endpoint
still appears as stopped.
See commit 42b7581376 ("usb: xhci: Limit Stop Endpoint retries") for
details.
CC: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241217102122.2316814-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On gt reset, if a context is running, then accumulate its active time
into the busyness counter since there will be no chance for the context
to switch out and update its run time.
v2: Move comment right above the if (John)
Fixes: 77cdd054dd ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241127174006.190128-4-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 7ed047da59cfa1acb558b95169d347acc8d85da1)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Active busyness of an engine is calculated using gt timestamp and the
context switch in time. While capturing the gt timestamp, it's possible
that the context switches out. This race could result in an active
busyness value that is greater than the actual context runtime value by a
small amount. This leads to a negative delta and throws off busyness
calculations for the user.
If a subsequent count is smaller than the previous one, just return the
previous one, since we expect the busyness to catch up.
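A sketch of the clamp being described (field names illustrative): never
report a busyness total smaller than the previously reported one, and let
the stored value catch up otherwise:

if (total < stats->last_reported_total)
        total = stats->last_reported_total;     /* ride out the race */
else
        stats->last_reported_total = total;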
Fixes: 77cdd054dd ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241127174006.190128-3-umesh.nerlige.ramappa@intel.com
(cherry picked from commit cf907f6d294217985e9dafd9985dce874e04ca37)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
On GT reset, we store total busyness counts for all engines and
re-register the utilization buffer with GuC. At that time we should
reset the buffer, so that we don't get spurious busyness counts on
subsequent queries.
To repro this issue, run igt@perf_pmu@busy-hang followed by
igt@perf_pmu@most-busy-idle-check-all for a couple iterations.
Fixes: 77cdd054dd ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241127174006.190128-2-umesh.nerlige.ramappa@intel.com
(cherry picked from commit abd318237fa6556c1e5225529af145ef15d5ff0d)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
The alias symbol name was renamed. Adjust module_phy_driver macro to
create the proper symbol name to fix module autoloading.
Fixes: 054a9cd395 ("modpost: rename alias symbol for MODULE_DEVICE_TABLE()")
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Link: https://patch.msgid.link/20241212130015.238863-1-fujita.tomonori@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The hypercall page is no longer needed. It can be removed, as from the
Xen perspective it is optional.
But, from Linux's perspective, it removes naked RET instructions that
escape the speculative protections that Call Depth Tracking and/or
Untrain Ret are trying to achieve.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Call the Xen hypervisor via the new xen_hypercall_func static-call
instead of the hypercall page.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Add generic hypercall functions usable for all normal (i.e. not iret)
hypercalls. Depending on the guest type and the processor vendor,
different functions need to be used due to the different instruction
used for entering the hypervisor:
- PV guests need to use syscall
- HVM/PVH guests on Intel need to use vmcall
- HVM/PVH guests on AMD and Hygon need to use vmmcall
As PVH guests need to issue hypercalls very early during boot, there
is a 4th hypercall function needed for HVM/PVH which can be used on
Intel and AMD processors. It will check the vendor type and then set
the Intel or AMD specific function to use via static_call().
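A conceptual sketch of the vendor selection; the static call and function
names follow the description above and are not necessarily the exact ones
used:

static void __init xen_setup_hvm_hypercall(void)
{
        if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
                static_call_update(xen_hypercall, xen_hypercall_intel);
        else    /* AMD and Hygon enter the hypervisor via vmmcall */
                static_call_update(xen_hypercall, xen_hypercall_amd);
}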
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
There is a check for NULL at the start of create_txqs() and
create_rxqs() which tests if "nic_dev->txqs" is non-NULL. The
intention is that if the device is already open and the queues
are already created then we don't create them a second time.
However, the bug is that if we have an error in the create_txqs()
then the pointer doesn't get set back to NULL. The NULL check
at the start of the function will say that it's already open when
it's not and the device can't be used.
Set ->txqs back to NULL on cleanup on error.
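The shape of the fix, as a sketch of the error path (allocation and free
helpers shown generically): clear the pointer so the "already created"
check stays meaningful:

/* error path of create_txqs(), sketched: */
err_init_txqs:
        devm_kfree(&netdev->dev, nic_dev->txqs);
        nic_dev->txqs = NULL;   /* don't leave a stale pointer behind */
        return err;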
Fixes: c3e79baf1b ("net-next/hinic: Add logical Txq and Rxq")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/0cc98faf-a0ed-4565-a55b-0fa2734bc205@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Small follow-up to align this with the equivalent behavior in the bond driver.
The change in 3625920b62 ("teaming: fix vlan_features computing") removed
the netdevice vlan_features when there is no team port attached, yet it
leaves the full set of enc_features intact.
Instead, leave the default features as pre 3625920b62, and recompute once
we do have ports attached. Also, similarly as in bonding case, call the
netdev_base_features() helper on the enc_features.
Fixes: 3625920b62 ("teaming: fix vlan_features computing")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241213123657.401868-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "gl->tot_len" variable is controlled by the user. It comes from
process_responses(). On 32bit systems, the "gl->tot_len +
sizeof(struct cpl_pass_accept_req) + sizeof(struct rss_header)" addition
could have an integer wrapping bug. Use size_add() to prevent this.
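The overflow-safe form of that addition, using size_add() from
<linux/overflow.h>, which saturates at SIZE_MAX instead of wrapping (the
destination variable name here is illustrative):

len = size_add(gl->tot_len,
               size_add(sizeof(struct cpl_pass_accept_req),
                        sizeof(struct rss_header)));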
Fixes: a089439478 ("crypto: chtls - Register chtls with net tls")
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/c6bfb23c-2db2-4e1b-b8ab-ba3925c82ef5@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski says:
====================
netdev: fix repeated netlink messages in queue dumps
Fix dump continuation for queues and queue stats in the netdev family.
Because we used post-increment when saving the id of the dumped queue,
the next skb would re-dump the already dumped queue.
====================
Link: https://patch.msgid.link/20241213152244.3080955-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This test already catches a netlink bug fixed by this series,
but only when running on HW with many queues. Make sure the
netdevsim instance created has a lot of queues, and constrain
the size of the recv_buffer used by netlink.
While at it test both rx and tx queues.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20241213152244.3080955-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The recv_size parameter allows constraining the buffer size for dumps.
It's useful in testing kernel handling of dump continuation,
IOW testing dumps which span multiple skbs.
Let the tests set this parameter when initializing the YNL family.
Keep the normal default, we don't want tests to unintentionally
behave very differently than normal code.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20241213152244.3080955-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The context is supposed to record the next queue to dump,
not last dumped. If the dump doesn't fit we will restart
from the already-dumped queue, duplicating the message.
Before this fix and with the selftest improvements later
in this series we see:
# ./run_kselftest.sh -t drivers/net:stats.py
timeout set to 45
selftests: drivers/net: stats.py
KTAP version 1
1..5
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
# Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex:
# Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])),
# Check failed 45 != 44 repeated queue keys
# Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex:
# Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1,
# Check failed 45 != 44 missing queue keys
# Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex:
# Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])),
# Check failed 45 != 44 repeated queue keys
# Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex:
# Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1,
# Check failed 45 != 44 missing queue keys
# Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex:
# Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])),
# Check failed 103 != 100 repeated queue keys
# Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex:
# Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1,
# Check failed 103 != 100 missing queue keys
# Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex:
# Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])),
# Check failed 102 != 100 repeated queue keys
# Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex:
# Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1,
# Check failed 102 != 100 missing queue keys
not ok 4 stats.qstat_by_ifindex
ok 5 stats.check_down
# Totals: pass:4 fail:1 xfail:0 xpass:0 skip:0 error:0
With the fix:
# ./ksft-net-drv/run_kselftest.sh -t drivers/net:stats.py
timeout set to 45
selftests: drivers/net: stats.py
KTAP version 1
1..5
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
ok 5 stats.check_down
# Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
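Conceptually, the bookkeeping change looks like this (variable names
illustrative); the saved index must refer to the next queue, not the one
just dumped:

/* before: a restarted dump repeats queue "i" */
ctx->rxq_idx = i++;

/* after: a restarted dump resumes at the following queue */
ctx->rxq_idx = i + 1;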
Fixes: ab63a2387c ("netdev: add per-queue statistics")
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241213152244.3080955-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The context is supposed to record the next queue to dump,
not last dumped. If the dump doesn't fit we will restart
from the already-dumped queue, duplicating the message.
Before this fix and with the selftest improvements later
in this series we see:
# ./run_kselftest.sh -t drivers/net:queues.py
timeout set to 45
selftests: drivers/net: queues.py
KTAP version 1
1..2
# Check| At /root/ksft-net-drv/drivers/net/./queues.py, line 32, in get_queues:
# Check| ksft_eq(queues, expected)
# Check failed 102 != 100
# Check| At /root/ksft-net-drv/drivers/net/./queues.py, line 32, in get_queues:
# Check| ksft_eq(queues, expected)
# Check failed 101 != 100
not ok 1 queues.get_queues
ok 2 queues.addremove_queues
# Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0
not ok 1 selftests: drivers/net: queues.py # exit=1
With the fix:
# ./ksft-net-drv/run_kselftest.sh -t drivers/net:queues.py
timeout set to 45
selftests: drivers/net: queues.py
KTAP version 1
1..2
ok 1 queues.get_queues
ok 2 queues.addremove_queues
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Fixes: 6b6171db7f ("netdev-genl: Add netlink framework functions for queue")
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241213152244.3080955-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
GCC performs value range tracking for variables as a way to provide better
diagnostics. One place this is regularly seen is with warnings associated
with bounds-checking, e.g. -Wstringop-overflow, -Wstringop-overread,
-Warray-bounds, etc. In order to keep the signal-to-noise ratio high,
warnings aren't emitted when a value range spans the entire value range
representable by a given variable. For example:
unsigned int len;
char dst[8];
...
memcpy(dst, src, len);
If len's value is unknown, it has the full "unsigned int" range of [0,
UINT_MAX], and GCC's compile-time bounds checks against memcpy() will
be ignored. However, when a code path has been able to narrow the range:
if (len > 16)
return;
memcpy(dst, src, len);
Then the range will be updated for the execution path. Above, len is
now [0, 16] when reaching the memcpy(), so depending on other optimizations,
we might see a -Wstringop-overflow warning like:
error: '__builtin_memcpy' writing between 9 and 16 bytes into region of size 8 [-Werror=stringop-overflow]
When building with CONFIG_FORTIFY_SOURCE, the fortified run-time bounds
checking can appear to narrow value ranges of lengths for memcpy(),
depending on how the compiler constructs the execution paths during
optimization passes, due to the checks against the field sizes. For
example:
if (p_size_field != SIZE_MAX &&
p_size != p_size_field && p_size_field < size)
As intentionally designed, these checks only affect the kernel warnings
emitted at run-time and do not block the potentially overflowing memcpy(),
so GCC thinks it needs to produce a warning about the resulting value
range that might be reaching the memcpy().
We have seen this manifest a few times now, with the most recent being
with cpumasks:
In function ‘bitmap_copy’,
inlined from ‘cpumask_copy’ at ./include/linux/cpumask.h:839:2,
inlined from ‘__padata_set_cpumasks’ at kernel/padata.c:730:2:
./include/linux/fortify-string.h:114:33: error: ‘__builtin_memcpy’ reading between 257 and 536870904 bytes from a region of size 256 [-Werror=stringop-overread]
114 | #define __underlying_memcpy __builtin_memcpy
| ^
./include/linux/fortify-string.h:633:9: note: in expansion of macro ‘__underlying_memcpy’
633 | __underlying_##op(p, q, __fortify_size); \
| ^~~~~~~~~~~~~
./include/linux/fortify-string.h:678:26: note: in expansion of macro ‘__fortify_memcpy_chk’
678 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \
| ^~~~~~~~~~~~~~~~~~~~
./include/linux/bitmap.h:259:17: note: in expansion of macro ‘memcpy’
259 | memcpy(dst, src, len);
| ^~~~~~
kernel/padata.c: In function ‘__padata_set_cpumasks’:
kernel/padata.c:713:48: note: source object ‘pcpumask’ of size [0, 256]
713 | cpumask_var_t pcpumask,
| ~~~~~~~~~~~~~~^~~~~~~~
This warning is _not_ emitted when CONFIG_FORTIFY_SOURCE is disabled,
and with the recent -fdiagnostics-details we can confirm the origin of
the warning is due to FORTIFY's bounds checking:
../include/linux/bitmap.h:259:17: note: in expansion of macro 'memcpy'
259 | memcpy(dst, src, len);
| ^~~~~~
'__padata_set_cpumasks': events 1-2
../include/linux/fortify-string.h:613:36:
612 | if (p_size_field != SIZE_MAX &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~
613 | p_size != p_size_field && p_size_field < size)
| ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
| |
| (1) when the condition is evaluated to false
| (2) when the condition is evaluated to true
'__padata_set_cpumasks': event 3
114 | #define __underlying_memcpy __builtin_memcpy
| ^
| |
| (3) out of array bounds here
Note that the cpumask warning started appearing since bitmap functions
were recently marked __always_inline in commit ed8cd2b3bd ("bitmap:
Switch from inline to __always_inline"), which allowed GCC to gain
visibility into the variables as they passed through the FORTIFY
implementation.
In order to silence these false positives but keep otherwise deterministic
compile-time warnings intact, hide the length variable from GCC with
OPTIMIZER_HIDE_VAR() before calling the builtin memcpy.
Additionally add a comment about why all the macro args have copies with
const storage.
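A minimal illustration of the technique: copy the size into a local, hide
it from GCC's value range tracking, and only then pass it to the
underlying memcpy:

size_t fortify_size = size;

/* make the value opaque to the optimizer so no narrowed range reaches
 * the builtin and triggers a compile-time warning */
OPTIMIZER_HIDE_VAR(fortify_size);
__underlying_memcpy(p, q, fortify_size);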
Reported-by: "Thomas Weißschuh" <linux@weissschuh.net>
Closes: https://lore.kernel.org/all/db7190c8-d17f-4a0d-bc2f-5903c79f36c2@t-8ch.de/
Reported-by: Nilay Shroff <nilay@linux.ibm.com>
Closes: https://lore.kernel.org/all/20241112124127.1666300-1-nilay@linux.ibm.com/
Tested-by: Nilay Shroff <nilay@linux.ibm.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kees Cook <kees@kernel.org>
The values returned by the driver after processing the contents of the
Temperature Result and the Temperature Limit Registers do not correspond to
the TMP512/TMP513 specifications. A raw register value is converted to a
signed integer value by a sign extension in accordance with the algorithm
provided in the specification, but due to the off-by-one error in the sign
bit index, the result is incorrect.
According to the TMP512 and TMP513 datasheets, the Temperature Result (08h
to 0Bh) and Limit (11h to 14h) Registers are 13-bit two's complement
integer values, shifted left by 3 bits. The value is scaled by 0.0625
degrees Celsius per bit. E.g., if regval = 1 1110 0111 0000 000, the
output should be -25 degrees, but the driver will return +487 degrees.
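For illustration, the conversion with the correct sign-bit index: after
dropping the three low bits, the 13-bit value's sign bit is bit 12, and
the off-by-one described above is in this index:

s32 raw = sign_extend32(regval >> 3, 12);   /* 13-bit two's complement value */
long temp_mC = raw * 625 / 10;              /* 0.0625 C/LSB == 62.5 mC/LSB */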
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 59dfa75e5d ("hwmon: Add driver for Texas Instruments TMP512/513 sensor chips.")
Signed-off-by: Murad Masimov <m.masimov@maxima.ru>
Link: https://lore.kernel.org/r/20241216173648.526-4-m.masimov@maxima.ru
[groeck: fixed description line length]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>