mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-01 10:43:43 +00:00
fe4bfa9b6d
1323404 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Paul Aurich
|
3fa640d035 |
smb: During unmount, ensure all cached dir instances drop their dentry
The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
race with various cached directory operations, which ultimately results
in dentries not being dropped and these kernel BUGs:
BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
VFS: Busy inodes after unmount of cifs (cifs)
------------[ cut here ]------------
kernel BUG at fs/super.c:661!
This happens when a cfid is in the process of being cleaned up when, and
has been removed from the cfids->entries list, including:
- Receiving a lease break from the server
- Server reconnection triggers invalidate_all_cached_dirs(), which
removes all the cfids from the list
- The laundromat thread decides to expire an old cfid.
To solve these problems, dropping the dentry is done in queued work done
in a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
flushes that workqueue after it drops all the dentries of which it's
aware. This is a global workqueue (rather than scoped to a mount), but
the queued work is minimal.
The final cleanup work for cleaning up a cfid is performed via work
queued in the serverclose_wq workqueue; this is done separate from
dropping the dentries so that close_all_cached_dirs() doesn't block on
any server operations.
Both of these queued works expect to invoked with a cfid reference and
a tcon reference to avoid those objects from being freed while the work
is ongoing.
While we're here, add proper locking to close_all_cached_dirs(), and
locking around the freeing of cfid->dentry.
Fixes:
|
||
Paulo Alcantara
|
796733054e |
smb: client: fix noisy message when mounting shares
When the client unconditionally attempts to get an DFS referral to check if share is DFS, some servers may return different errors that aren't handled in smb2_get_dfs_refer(), so the following will be logged in dmesg: CIFS: VFS: \\srv\IPC$ smb2_get_dfs_refer: ioctl error... which can confuse some users while mounting an SMB share. Fix this by logging such error with FYI. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
Paulo Alcantara
|
36008fe6e3 |
smb: client: don't try following DFS links in cifs_tree_connect()
We can't properly support chasing DFS links in cifs_tree_connect() because (1) We don't support creating new sessions while we're reconnecting, which would be required for DFS interlinks. (2) ->is_path_accessible() can't be called from cifs_tree_connect() as it would deadlock with smb2_reconnect(). This is required for checking if new DFS target is a nested DFS link. By unconditionally trying to get an DFS referral from new DFS target isn't correct because if the new DFS target (interlink) is an DFS standalone namespace, then we would end up getting -ELOOP and then potentially leaving tcon disconnected. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
Paulo Alcantara
|
e148107598 |
smb: client: allow reconnect when sending ioctl
cifs_tree_connect() no longer uses ioctl, so allow sessions to be reconnected when sending ioctls. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
Paulo Alcantara
|
b2fe4a8fa0 |
smb: client: get rid of @nlsc param in cifs_tree_connect()
We can access local_nls directly from @tcon->ses, so there is no need to pass it as parameter in cifs_tree_connect(). Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
Paulo Alcantara
|
28ec614f2f |
smb: client: allow more DFS referrals to be cached
In some DFS setups, a single DFS share may contain hundreds of DFS links and increasing the DFS cache to allow more referrals to be cached improves DFS failover as the client will likely find a cached DFS referral when reconnecting and then avoiding unnecessary remounts. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
Masahiro Yamada
|
d171136019 |
kbuild: use 'output' variable to create the output directory
$(KBUILD_OUTPUT) specifies the output directory of kernel builds. Use a more generic name, 'output', to better reflect this code hunk in the context of external module builds. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> |
||
Masahiro Yamada
|
5ea1721654 |
kbuild: rename abs_objtree to abs_output
'objtree' refers to the top of the output directory of kernel builds. Rename abs_objtree to a more generic name, to better reflect its use in the context of external module builds. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> |
||
Masahiro Yamada
|
214c0eea43 |
kbuild: add $(objtree)/ prefix to some in-kernel build artifacts
$(objtree) refers to the top of the output directory of kernel builds. This commit adds the explicit $(objtree)/ prefix to build artifacts needed for building external modules. This change has no immediate impact, as the top-level Makefile currently defines: objtree := . This commit prepares for supporting the building of external modules in a different directory. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> |
||
Masahiro Yamada
|
0afd73c5f5 |
kbuild: replace two $(abs_objtree) with $(CURDIR) in top Makefile
Kbuild changes the working directory until it matches $(abs_objtree). When $(need-sub-make) is empty, $(abs_objtree) is the same as $(CURDIR). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de> |
||
Matt Fleming
|
bcbbf493f2 |
kbuild: deb-pkg: Don't fail if modules.order is missing
Kernels built without CONFIG_MODULES might still want to create -dbg deb
packages but install_linux_image_dbg() assumes modules.order always
exists. This obviously isn't true if no modules were built, so we should
skip reading modules.order in that case.
Fixes:
|
||
Masahiro Yamada
|
dbefa1f31a |
Rename .data.once to .data..once to fix resetting WARN*_ONCE
Commit |
||
Masahiro Yamada
|
bb43a59944 |
Rename .data.unlikely to .data..unlikely
Commit |
||
Rong Xu
|
d63b852430 |
kbuild: Fix Propeller build option
The '-fbasic-block-sections=labels' option has been deprecated in tip of tree clang (20.0.0) [1]. While the option still works, a warning is emitted: clang: warning: argument '-fbasic-block-sections=labels' is deprecated, use '-fbasic-block-address-map' instead [-Wdeprecated] Add a version check to set the proper option. Link: https://github.com/llvm/llvm-project/pull/110039 [1] Signed-off-by: Rong Xu <xur@google.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Rong Xu
|
d5dc958361 |
kbuild: Add Propeller configuration for kernel build
Add the build support for using Clang's Propeller optimizer. Like AutoFDO, Propeller uses hardware sampling to gather information about the frequency of execution of different code paths within a binary. This information is then used to guide the compiler's optimization decisions, resulting in a more efficient binary. The support requires a Clang compiler LLVM 19 or later, and the create_llvm_prof tool (https://github.com/google/autofdo/releases/tag/v0.30.1). This commit is limited to x86 platforms that support PMU features like LBR on Intel machines and AMD Zen3 BRS. Here is an example workflow for building an AutoFDO+Propeller optimized kernel: 1) Build the kernel on the host machine, with AutoFDO and Propeller build config CONFIG_AUTOFDO_CLANG=y CONFIG_PROPELLER_CLANG=y then $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> “<autofdo_profile>” is the profile collected when doing a non-Propeller AutoFDO build. This step builds a kernel that has the same optimization level as AutoFDO, plus a metadata section that records basic block information. This kernel image runs as fast as an AutoFDO optimized kernel. 2) Install the kernel on test/production machines. 3) Run the load tests. The '-c' option in perf specifies the sample event period. We suggest using a suitable prime number, like 500009, for this purpose. For Intel platforms: $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \ -o <perf_file> -- <loadtest> For AMD platforms: The supported system are: Zen3 with BRS, or Zen4 with amd_lbr_v2 # To see if Zen3 support LBR: $ cat proc/cpuinfo | grep " brs" # To see if Zen4 support LBR: $ cat proc/cpuinfo | grep amd_lbr_v2 # If the result is yes, then collect the profile using: $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \ -N -b -c <count> -o <perf_file> -- <loadtest> 4) (Optional) Download the raw perf file to the host machine. 5) Generate Propeller profile: $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \ --format=propeller --propeller_output_module_name \ --out=<propeller_profile_prefix>_cc_profile.txt \ --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt “create_llvm_prof” is the profile conversion tool, and a prebuilt binary for linux can be found on https://github.com/google/autofdo/releases/tag/v0.30.1 (can also build from source). "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string". This command generates a pair of Propeller profiles: "<propeller_profile_prefix>_cc_profile.txt" and "<propeller_profile_prefix>_ld_profile.txt". 6) Rebuild the kernel using the AutoFDO and Propeller profile files. CONFIG_AUTOFDO_CLANG=y CONFIG_PROPELLER_CLANG=y and $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \ CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix> Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Stephane Eranian <eranian@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Rong Xu
|
2fd65f7afd |
AutoFDO: Enable machine function split optimization for AutoFDO
Enable the machine function split optimization for AutoFDO in Clang. Machine function split (MFS) is a pass in the Clang compiler that splits a function into hot and cold parts. The linker groups all cold blocks across functions together. This decreases hot code fragmentation and improves iCache and iTLB utilization. MFS requires a profile so this is enabled only for the AutoFDO builds. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Rong Xu
|
0847420f5e |
AutoFDO: Enable -ffunction-sections for the AutoFDO build
Enable -ffunction-sections by default for the AutoFDO build. With -ffunction-sections, the compiler places each function in its own section named .text.function_name instead of placing all functions in the .text section. In the AutoFDO build, this allows the linker to utilize profile information to reorganize functions for improved utilization of iCache and iTLB. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Rong Xu
|
db0b2991ae |
vmlinux.lds.h: Add markers for text_unlikely and text_hot sections
Add markers like __hot_text_start, __hot_text_end, __unlikely_text_start, and __unlikely_text_end which will be included in System.map. These markers indicate how the compiler groups functions, providing valuable information to developers about the layout and optimization of the code. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Rong Xu
|
0043ecea23 |
vmlinux.lds.h: Adjust symbol ordering in text output section
When the -ffunction-sections compiler option is enabled, each function is placed in a separate section named .text.function_name rather than putting all functions in a single .text section. However, using -function-sections can cause problems with the linker script. The comments included in include/asm-generic/vmlinux.lds.h note these issues.: “TEXT_MAIN here will match .text.fixup and .text.unlikely if dead code elimination is enabled, so these sections should be converted to use ".." first.” It is unclear whether there is a straightforward method for converting a suffix to "..". This patch modifies the order of subsections within the text output section. Specifically, it changes current order: .text.hot, .text, .text_unlikely, .text.unknown, .text.asan to the new order: .text.asan, .text.unknown, .text_unlikely, .text.hot, .text Here is the rationale behind the new layout: The majority of the code resides in three sections: .text.hot, .text, and .text.unlikely, with .text.unknown containing a negligible amount. .text.asan is only generated in ASAN builds. The primary goal is to group code segments based on their execution frequency (hotness). First, we want to place .text.hot adjacent to .text. Since we cannot put .text.hot after .text (Due to constraints with -ffunction-sections, placing .text.hot after .text is problematic), we need to put .text.hot before .text. Then it comes to .text.unlikely, we cannot put it after .text (same -ffunction-sections issue) . Therefore, we position .text.unlikely before .text.hot. .text.unknown and .tex.asan follow the same logic. This revised ordering effectively reverses the original arrangement (for .text.unlikely, .text.unknown, and .tex.asan), maintaining a similar level of affinity between sections. It also places .text.hot section at the beginning of a page to better utilize the TLB entry. Note that the limitation arises because the linker script employs glob patterns instead of regular expressions for string matching. While there is a method to maintain the current order using complex patterns, this significantly complicates the pattern and increases the likelihood of errors. This patch also changes vmlinux.lds.S for the sparc64 architecture to accommodate specific symbol placement requirements. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Rong Xu
|
52892ed6b0 |
MIPS: Place __kernel_entry at the beginning of text section
Mark __kernel_entry as ".head.text" and place HEAD_TEXT before TEXT_TEXT in the linker script. This ensures that __kernel_entry will be placed at the beginning of text section. Drop mips from scripts/head-object-list.txt. Signed-off-by: Rong Xu <xur@google.com> Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Closes: https://lore.kernel.org/lkml/c6719149-8531-4174-824e-a3caf4bc6d0e@alliedtelesis.co.nz/T/ Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Linus Torvalds
|
b50ecc5aca |
perf tools changes for v6.13
perf record ----------- * Enable leader sampling for inherited task events. It was supported only for system-wide events but the kernel started to support such a setup since v6.12. This is to reduce the number of PMU interrupts. The samples of the leader event will contain counts of other events and no samples will be generated for the other member events. $ perf record -e '{cycles,instructions}:S' ${MYPROG} perf report ----------- * Fix --branch-history option to display more branch-related information like prediction, abort and cycles which is available on Intel machines. $ perf record -bg -- perf test -w brstack $ perf report --branch-history ... # # Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage] # ........ ........................ .............. .................... ......... ..... ...... .................... # 8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - - | ---xas_load xarray.h:171 | |--5.68%--xas_load xarray.c:245 (cycles:1) | xas_load xarray.c:242 | xas_load xarray.h:1260 (cycles:1) | xas_descend xarray.c:146 | xas_load xarray.c:244 (cycles:2) | xas_load xarray.c:245 | xas_descend xarray.c:218 (cycles:10) ... perf stat --------- * Add HWMON PMU support. The HWMON provides various system information like CPU/GPU temperature, fan speed and so on. Expose them as PMU events so that users can see the values using perf stat commands. $ perf stat -e temp_cpu,fan1 true Performance counter stats for 'true': 60.00 'C temp_cpu 0 rpm fan1 0.000745382 seconds time elapsed 0.000883000 seconds user 0.000000000 seconds sys * Display metric threshold in JSON output. Some metrics define thresholds to classify value ranges. It used to be in a different color but it won't work for JSON. Add "metric-threshold" field to the JSON that can be one of "good", "less good", "nearly bad" and "bad". # perf stat -a -M TopdownL1 -j true {"counter-value" : "18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" : 5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"} {"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"} {"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"} {"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"} {"counter-value" : "3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" : 5552708, "pcnt-running" : 100.00, } ... perf sched ---------- * Add -P/--pre-migrations option for 'timehist' sub-command to track time a task waited on a run-queue before migrating to a different CPU. $ perf sched timehist -P time cpu task name wait time sch delay run time pre-mig time [tid/pid] (msec) (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- --------- 585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000 585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000 585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000 585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000 585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000 585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000 585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000 585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000 585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001 585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000 ... Build ----- * Make libunwind opt-in (LIBUNWIND=1) rather than opt-out. The perf tools are generally built with libelf and libdw which has unwinder functionality. The libunwind support predates it and no need to have duplicate unwinders by default. * Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify it's using libdw for handling DWARF information. Internals --------- * Do not set exclude_guest bit in the perf_event_attr by default. This was causing a trouble in AMD IBS PMU as it doesn't support the bit. The bit will be set when it's needed later by the fallback logic. Also update the missing feature detection logic to make sure not clear supported bits unnecessarily. * Run perf test in parallel by default and mark flaky tests "exclusive" to run them serially at the end. Some test numbers are changed but the test can complete in less than half the time. JSON vendor events ------------------ * Add AMD Zen 5 events and metrics. * Add i.MX91 and i.MX95 DDR metrics * Fix HiSilicon HIP08 Topdown metric name. * Support compat events on PowerPC. Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZ0Qi3gAKCRCMstVUGiXM g6NIAP49eoSmQF40u55sJN0J7RpYd+bTgXZkahv0IUCBX98TLwEA2NrK0oUcB84C xeanq28/3JxNM/oBpsEvvB8mb/0lGwI= =FAVF -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "perf record: - Enable leader sampling for inherited task events. It was supported only for system-wide events but the kernel started to support such a setup since v6.12. This is to reduce the number of PMU interrupts. The samples of the leader event will contain counts of other events and no samples will be generated for the other member events. $ perf record -e '{cycles,instructions}:S' ${MYPROG} perf report: - Fix --branch-history option to display more branch-related information like prediction, abort and cycles which is available on Intel machines. $ perf record -bg -- perf test -w brstack $ perf report --branch-history ... # # Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage] # ........ ........................ .............. .................... ......... ..... ...... .................... # 8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - - | ---xas_load xarray.h:171 | |--5.68%--xas_load xarray.c:245 (cycles:1) | xas_load xarray.c:242 | xas_load xarray.h:1260 (cycles:1) | xas_descend xarray.c:146 | xas_load xarray.c:244 (cycles:2) | xas_load xarray.c:245 | xas_descend xarray.c:218 (cycles:10) ... perf stat: - Add HWMON PMU support. The HWMON provides various system information like CPU/GPU temperature, fan speed and so on. Expose them as PMU events so that users can see the values using perf stat commands. $ perf stat -e temp_cpu,fan1 true Performance counter stats for 'true': 60.00 'C temp_cpu 0 rpm fan1 0.000745382 seconds time elapsed 0.000883000 seconds user 0.000000000 seconds sys - Display metric threshold in JSON output. Some metrics define thresholds to classify value ranges. It used to be in a different color but it won't work for JSON. Add "metric-threshold" field to the JSON that can be one of "good", "less good", "nearly bad" and "bad". # perf stat -a -M TopdownL1 -j true {"counter-value" : "18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" : 5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"} {"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"} {"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"} {"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"} {"counter-value" : "3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" : 5552708, "pcnt-running" : 100.00, } ... perf sched: - Add -P/--pre-migrations option for 'timehist' sub-command to track time a task waited on a run-queue before migrating to a different CPU. $ perf sched timehist -P time cpu task name wait time sch delay run time pre-mig time [tid/pid] (msec) (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- --------- 585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000 585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000 585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000 585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000 585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000 585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000 585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000 585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000 585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001 585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000 ... Build: - Make libunwind opt-in (LIBUNWIND=1) rather than opt-out. The perf tools are generally built with libelf and libdw which has unwinder functionality. The libunwind support predates it and no need to have duplicate unwinders by default. - Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify it's using libdw for handling DWARF information. Internals: - Do not set exclude_guest bit in the perf_event_attr by default. This was causing a trouble in AMD IBS PMU as it doesn't support the bit. The bit will be set when it's needed later by the fallback logic. Also update the missing feature detection logic to make sure not clear supported bits unnecessarily. - Run perf test in parallel by default and mark flaky tests "exclusive" to run them serially at the end. Some test numbers are changed but the test can complete in less than half the time. JSON vendor events: - Add AMD Zen 5 events and metrics. - Add i.MX91 and i.MX95 DDR metrics - Fix HiSilicon HIP08 Topdown metric name. - Support compat events on PowerPC" * tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (232 commits) perf tests: Fix hwmon parsing with PMU name test perf hwmon_pmu: Ensure hwmon key union is zeroed before use perf tests hwmon_pmu: Remove double evlist__delete() perf/test: fix perf ftrace test on s390 perf bpf-filter: Return -ENOMEM directly when pfi allocation fails perf test: Correct hwmon test PMU detection perf: Remove unused del_perf_probe_events() perf pmu: Move pmu_metrics_table__find and remove ARM override perf jevents: Add map_for_cpu() perf header: Pass a perf_cpu rather than a PMU to get_cpuid_str perf header: Avoid transitive PMU includes perf arm64 header: Use cpu argument in get_cpuid perf header: Refactor get_cpuid to take a CPU for ARM perf header: Move is_cpu_online to numa bench perf jevents: fix breakage when do perf stat on system metric perf test: Add missing __exit calls in tool/hwmon tests perf tests: Make leader sampling test work without branch event perf util: Remove kernel version deadcode perf test shell trace_exit_race: Use --no-comm to avoid cases where COMM isn't resolved perf test shell trace_exit_race: Show what went wrong in verbose mode ... |
||
Linus Torvalds
|
9160b68e0c |
parisc architecture fixes for kernel v6.13-rc1:
- parisc/ftrace: Fix function graph tracing disablement -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZ0X0gwAKCRD3ErUQojoP X8UfAQDUczrsLEWKsFgOjAsHfZPGYXuC3EwHGwkSirbYUhX65QEAmcUn/w63siIi 4kYWylQXDuTMzgYmuHlEDV3lvmQU8QU= =At7d -----END PGP SIGNATURE----- Merge tag 'parisc-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture update from Helge Deller: - Fix function graph tracing disablement on parisc * tag 'parisc-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc/ftrace: Fix function graph tracing disablement |
||
Linus Torvalds
|
7ebe7afed7 |
m68knommu: updates and fixes for v6.13
Fixes include: . only include FEC platform entries when hardware supports it . fix typo in ifdef config name -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEmsfM6tQwfNjBOxr3TiQVqaG9L4AFAmdFEe0QHGdlcmdAa2Vy bmVsLm9yZwAKCRBOJBWpob0vgHNwEACLr57yTIXryLYqWtx5J/OooIWS6GdYga2V Hgyfy79ADQWwQ9puNnc2NczlYllv82gy/l8zqMQRChbi5CKwXWyH2wPKQe5MrWH5 jz84pLcOVCo1SoT5Zhj/q1SyHy53fWMReK8dQiDjn+EcprsrO8D/0clmkFRfen9R JZ2YRaNHa6fpQ2u32wDE7FWEyHkjIc4NUh1jn+5uelOvRNNXdPNxmCmLFzJmBs96 zlrqOh5udKnsNNHUdr6Fz6te5DUdpObcmALtZ+2hvdaKjGeSA/3k9ZsnLxMEK4yu EsxzWjabK+Eb6toLScsmEXUDxxXSV4rWqQ9VpeISuntgpBC/S8AWvrPPPAzlRAUa Oudb7TjOsM3bucqadfBDHgjDtFk/RhYibghzkahARFD7EAKhXTrVeFU94dMm+vQe PeFr6BV87qFzAhs5VprrTgC3giZF/zG9XCmViBz7D3Fo+I0ttmqRhA9iAllY8PZU kJyU7zBeCWDRSeMnodufXmrbZ3Nmjz3z/Lt2EfZ7Q9wPcTPafQBus+pg3MP9BPXb 7sLBxjzikaFkg/jwD4WsNGoC3N2mRdbSBjf05nzgG2+RM2bgxA4iQgDFhzAsLEYX GNRSDUjiaKGLrsMztYX4EZbcDB8cyfVH+x+t06whgOnj4p1e8gV7MRlouAr3L35i A0I2bDwRdg== =zhGC -----END PGP SIGNATURE----- Merge tag 'm68knommu-for-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: - only include FEC platform entries when hardware supports it - fix typo in ifdef config name * tag 'm68knommu-for-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: coldfire/device.c: only build FEC when HW macros are defined m68k: mcfgpio: Fix incorrect register offset for CONFIG_M5441x |
||
Linus Torvalds
|
798bb342e0 |
Rust changes for v6.13
Toolchain and infrastructure: - Enable a series of lints, including safety-related ones, e.g. the compiler will now warn about missing safety comments, as well as unnecessary ones. How safety documentation is organized is a frequent source of review comments, thus having the compiler guide new developers on where they are expected (and where not) is very nice. - Start using '#[expect]': an interesting feature in Rust (stabilized in 1.81.0) that makes the compiler warn if an expected warning was _not_ emitted. This is useful to avoid forgetting cleaning up locally ignored diagnostics ('#[allow]'s). - Introduce '.clippy.toml' configuration file for Clippy, the Rust linter, which will allow us to tweak its behaviour. For instance, our first use cases are declaring a disallowed macro and, more importantly, enabling the checking of private items. - Lints-related fixes and cleanups related to the items above. - Migrate from 'receiver_trait' to 'arbitrary_self_types': to get the kernel into stable Rust, one of the major pieces of the puzzle is the support to write custom types that can be used as 'self', i.e. as receivers, since the kernel needs to write types such as 'Arc' that common userspace Rust would not. 'arbitrary_self_types' has been accepted to become stable, and this is one of the steps required to get there. - Remove usage of the 'new_uninit' unstable feature. - Use custom C FFI types. Includes a new 'ffi' crate to contain our custom mapping, instead of using the standard library 'core::ffi' one. The actual remapping will be introduced in a later cycle. - Map '__kernel_{size_t,ssize_t,ptrdiff_t}' to 'usize'/'isize' instead of 32/64-bit integers. - Fix 'size_t' in bindgen generated prototypes of C builtins. - Warn on bindgen < 0.69.5 and libclang >= 19.1 due to a double issue in the projects, which we managed to trigger with the upcoming tracepoint support. It includes a build test since some distributions backported the fix (e.g. Debian -- thanks!). All major distributions we list should be now OK except Ubuntu non-LTS. 'macros' crate: - Adapt the build system to be able run the doctests there too; and clean up and enable the corresponding doctests. 'kernel' crate: - Add 'alloc' module with generic kernel allocator support and remove the dependency on the Rust standard library 'alloc' and the extension traits we used to provide fallible methods with flags. Add the 'Allocator' trait and its implementations '{K,V,KV}malloc'. Add the 'Box' type (a heap allocation for a single value of type 'T' that is also generic over an allocator and considers the kernel's GFP flags) and its shorthand aliases '{K,V,KV}Box'. Add 'ArrayLayout' type. Add 'Vec' (a contiguous growable array type) and its shorthand aliases '{K,V,KV}Vec', including iterator support. For instance, now we may write code such as: let mut v = KVec::new(); v.push(1, GFP_KERNEL)?; assert_eq!(&v, &[1]); Treewide, move as well old users to these new types. - 'sync' module: add global lock support, including the 'GlobalLockBackend' trait; the 'Global{Lock,Guard,LockedBy}' types and the 'global_lock!' macro. Add the 'Lock::try_lock' method. - 'error' module: optimize 'Error' type to use 'NonZeroI32' and make conversion functions public. - 'page' module: add 'page_align' function. - Add 'transmute' module with the existing 'FromBytes' and 'AsBytes' traits. - 'block::mq::request' module: improve rendered documentation. - 'types' module: extend 'Opaque' type documentation and add simple examples for the 'Either' types. drm/panic: - Clean up a series of Clippy warnings. Documentation: - Add coding guidelines for lints and the '#[expect]' feature. - Add Ubuntu to the list of distributions in the Quick Start guide. MAINTAINERS: - Add Danilo Krummrich as maintainer of the new 'alloc' module. And a few other small cleanups and fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmdFMIgACgkQGXyLc2ht IW16hQ/+KX/jmdGoXtNXx7T6yG6SJ/txPOieGAWfhBf6C3bqkGrU9Gnw/O3VWrxf eyj1QLQaIVUkmumWCefeiy9u3xRXx5fpS0tWJOjUtxC5NcS7vCs0AHQs1skIa6H+ YD6UKDPOy7CB5fVYqo13B5xnFAlciU0dLo6IGB6bB/lSpCudGLE9+nukfn5H3/R1 DTc3/fbSoYQU6Ij/FKscB+D/A7ojdYaReodhbzNw1lChg1MrJlCpqoQvHPE8ijg+ UDljHFFvgKdhSQL9GTa3LC7X4DsnihMWzXt14m6mMOqBa6TqF47WUhhgC77pHEI2 v/Yy8MLq0pdIzT1wFjsqs6opuvXc7K5Yk5Y60HDsDyIyjk2xgOjh6ZlD0EV161gS 7w1NtaKd/Cn7hnL7Ua51yJDxJTMllne3fTWemhs3Zd63j7ham98yOoiw+6L2QaM4 C9nW48vfUuTwDuYJ5HU0uSugubuHW3Ng5JEvMcvd4QjmaI1bQNkgVzefR5j3dLw8 9kEOTzJoxHpu5B7PZVTEd/L95hlmk1csSQObxi7JYCCimMkusF1S+heBzV/SqWD5 5ioEhCnSKE05fhQs0Uxns1HkcFle8Bn6r3aSAWV6yaR8oF94yHcuaZRUKxKMHw+1 cmBE2X8Yvtldw+CYDwEGWjKDtwOStbqk+b/ZzP7f7/p56QH9lSg= =Kn7b -----END PGP SIGNATURE----- Merge tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux Pull rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Enable a series of lints, including safety-related ones, e.g. the compiler will now warn about missing safety comments, as well as unnecessary ones. How safety documentation is organized is a frequent source of review comments, thus having the compiler guide new developers on where they are expected (and where not) is very nice. - Start using '#[expect]': an interesting feature in Rust (stabilized in 1.81.0) that makes the compiler warn if an expected warning was _not_ emitted. This is useful to avoid forgetting cleaning up locally ignored diagnostics ('#[allow]'s). - Introduce '.clippy.toml' configuration file for Clippy, the Rust linter, which will allow us to tweak its behaviour. For instance, our first use cases are declaring a disallowed macro and, more importantly, enabling the checking of private items. - Lints-related fixes and cleanups related to the items above. - Migrate from 'receiver_trait' to 'arbitrary_self_types': to get the kernel into stable Rust, one of the major pieces of the puzzle is the support to write custom types that can be used as 'self', i.e. as receivers, since the kernel needs to write types such as 'Arc' that common userspace Rust would not. 'arbitrary_self_types' has been accepted to become stable, and this is one of the steps required to get there. - Remove usage of the 'new_uninit' unstable feature. - Use custom C FFI types. Includes a new 'ffi' crate to contain our custom mapping, instead of using the standard library 'core::ffi' one. The actual remapping will be introduced in a later cycle. - Map '__kernel_{size_t,ssize_t,ptrdiff_t}' to 'usize'/'isize' instead of 32/64-bit integers. - Fix 'size_t' in bindgen generated prototypes of C builtins. - Warn on bindgen < 0.69.5 and libclang >= 19.1 due to a double issue in the projects, which we managed to trigger with the upcoming tracepoint support. It includes a build test since some distributions backported the fix (e.g. Debian -- thanks!). All major distributions we list should be now OK except Ubuntu non-LTS. 'macros' crate: - Adapt the build system to be able run the doctests there too; and clean up and enable the corresponding doctests. 'kernel' crate: - Add 'alloc' module with generic kernel allocator support and remove the dependency on the Rust standard library 'alloc' and the extension traits we used to provide fallible methods with flags. Add the 'Allocator' trait and its implementations '{K,V,KV}malloc'. Add the 'Box' type (a heap allocation for a single value of type 'T' that is also generic over an allocator and considers the kernel's GFP flags) and its shorthand aliases '{K,V,KV}Box'. Add 'ArrayLayout' type. Add 'Vec' (a contiguous growable array type) and its shorthand aliases '{K,V,KV}Vec', including iterator support. For instance, now we may write code such as: let mut v = KVec::new(); v.push(1, GFP_KERNEL)?; assert_eq!(&v, &[1]); Treewide, move as well old users to these new types. - 'sync' module: add global lock support, including the 'GlobalLockBackend' trait; the 'Global{Lock,Guard,LockedBy}' types and the 'global_lock!' macro. Add the 'Lock::try_lock' method. - 'error' module: optimize 'Error' type to use 'NonZeroI32' and make conversion functions public. - 'page' module: add 'page_align' function. - Add 'transmute' module with the existing 'FromBytes' and 'AsBytes' traits. - 'block::mq::request' module: improve rendered documentation. - 'types' module: extend 'Opaque' type documentation and add simple examples for the 'Either' types. drm/panic: - Clean up a series of Clippy warnings. Documentation: - Add coding guidelines for lints and the '#[expect]' feature. - Add Ubuntu to the list of distributions in the Quick Start guide. MAINTAINERS: - Add Danilo Krummrich as maintainer of the new 'alloc' module. And a few other small cleanups and fixes" * tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux: (82 commits) rust: alloc: Fix `ArrayLayout` allocations docs: rust: remove spurious item in `expect` list rust: allow `clippy::needless_lifetimes` rust: warn on bindgen < 0.69.5 and libclang >= 19.1 rust: use custom FFI integer types rust: map `__kernel_size_t` and friends also to usize/isize rust: fix size_t in bindgen prototypes of C builtins rust: sync: add global lock support rust: macros: enable the rest of the tests rust: macros: enable paste! use from macro_rules! rust: enable macros::module! tests rust: kbuild: expand rusttest target for macros rust: types: extend `Opaque` documentation rust: block: fix formatting of `kernel::block::mq::request` module rust: macros: fix documentation of the paste! macro rust: kernel: fix THIS_MODULE header path in ThisModule doc comment rust: page: add Rust version of PAGE_ALIGN rust: helpers: remove unnecessary header includes rust: exports: improve grammar in commentary drm/panic: allow verbose version check ... |
||
Linus Torvalds
|
e68ce9474a |
A few late-arriving fixes, plus two more significant changes that were
*almost* ready at the beginning of the merge window: - A new document on debugging techniques from Sebastian Fricke - A clarification on MODULE_LICENSE terms meant to head off the sort of confusion that led to the recent Tuxedo Computers mess. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmdFEf0PHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YLCQH+wY0lGEF5BloFrNOcwKoB96rXQjLMlPVpccP lWVprapS+NrlhTq4RZ9b6qbQ1RAdu0JCppew1viwclO8g8SmUoXmqNnlYIFH+3MB HZETbPWUHK2BRQqV7h3VkgvO30hUa0kHL3WfmKpGEG1P6FsQQ5o3WDi3YN8GM6xk tfHSiR4rgBw40VLyeDtRi++aEgYa/DfWpdtco58poCiAS6soTDDEWCxSBdibeDOQ YDuj1NtqieMk963z8CoJm/Qbw/ZLfW2jd3A43cZ0h6g/oloVYSucFcMjXpePvoZr 9BSkU9OyX5BRfhU/6EbU8eWYjgu0BuBk5uvCwkQgHz1p05MGRDE= =MmbD -----END PGP SIGNATURE----- Merge tag 'docs-6.13-2' of git://git.lwn.net/linux Pull more documentation updates from Jonathan Corbet: "A few late-arriving fixes, plus two more significant changes that were *almost* ready at the beginning of the merge window: - A new document on debugging techniques from Sebastian Fricke - A clarification on MODULE_LICENSE terms meant to head off the sort of confusion that led to the recent Tuxedo Computers mess" * tag 'docs-6.13-2' of git://git.lwn.net/linux: docs: Add debugging guide for the media subsystem docs: Add debugging section to process docs/licensing: Clarify wording about "GPL" and "Proprietary" docs: core-api/gfp_mask-from-fs-io: indicate that vmalloc supports GFP_NOFS/GFP_NOIO Documentation: kernel-doc: enumerate identifier *type*s Documentation: pwrseq: Fix trivial misspellings Documentation: filesystems: update filename extensions |
||
Linus Torvalds
|
6daf0882c6 |
vfs-6.13.ecryptfs.mount.api
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcoSAAKCRCRxhvAZXjc otZlAP9h7WaciOu+BWvPJzAhCy+zfjaMdB3HffjKZ7m6Kg2YIAEApu/FftxXEyLC XCKVmuHQzb58X+bIyuS62YIBjOfA2gY= =EUT7 -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.ecryptfs.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull ecryptfs mount api conversion from Christian Brauner: "Convert ecryptfs to the new mount api" * tag 'vfs-6.13.ecryptfs.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ecryptfs: Fix spelling mistake "validationg" -> "validating" ecryptfs: Convert ecryptfs to use the new mount API ecryptfs: Factor out mount option validation |
||
Linus Torvalds
|
1675db5c42 |
vfs-6.13.exportfs
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzckSQAKCRCRxhvAZXjc oroxAQDqt3NN64UCM14rrmiC2rw48SYjaYj4Nu0sRaJgScOFLgEA3B2I5Lh+bw6b fVH/uUjOsm50eYuFbqjOmEAp2DNP/QY= =14eq -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs exportfs updates from Christian Brauner: "This contains work to bring NFS connectable file handles to userspace servers. The name_to_handle_at() system call is extended to encode connectable file handles. Such file handles can be resolved to an open file with a connected path. So far userspace NFS servers couldn't make use of this functionality even though the kernel does already support it. This is achieved by introducing a new flag for name_to_handle_at(). Similarly, the open_by_handle_at() system call is tought to understand connectable file handles explicitly created via name_to_handle_at()" * tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: open_by_handle_at() support for decoding "explicit connectable" file handles fs: name_to_handle_at() support for "explicit connectable" file handles fs: prepare for "explicit connectable" file handles |
||
Linus Torvalds
|
9ad8d22f2f |
vfs-6.13.rust.pid_namespace
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcWKgAKCRCRxhvAZXjc osPAAP9bLzOPIF51IgP9mQTBlKKrpCWCMQVss5xRDseyNEfCEQD/fR9TSSnX9Suw iad9oBkxkzCjyxWIH46rvbdnc38lRwo= =aawA -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pid_namespace rust bindings from Christian Brauner: "This contains my Rust bindings for pid namespaces needed for various rust drivers. Here's a description of the basic C semantics and how they are mapped to Rust. The pid namespace of a task doesn't ever change once the task is alive. A unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not have an effect on the calling task's pid namespace. It will only effect the pid namespace of children created by the calling task. This invariant guarantees that after having acquired a reference to a task's pid namespace it will remain unchanged. When a task has exited and been reaped release_task() will be called. This will set the pid namespace of the task to NULL. So retrieving the pid namespace of a task that is dead will return NULL. Note, that neither holding the RCU lock nor holding a reference count to the task will prevent release_task() from being called. In order to retrieve the pid namespace of a task the task_active_pid_ns() function can be used. There are two cases to consider: (1) retrieving the pid namespace of the current task (2) retrieving the pid namespace of a non-current task From system call context retrieving the pid namespace for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the pid namespace after release_task() for current will return NULL but no codepath like that is exposed to Rust. Retrieving the pid namespace from system call context for (2) requires RCU protection. Accessing a pid namespace outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-current task means NULL can be returned as the non-current task could have already passed through release_task(). To retrieve (1) the current_pid_ns!() macro should be used. It ensures that the returned pid namespace cannot outlive the calling scope. The associated current_pid_ns() function should not be called directly as it could be abused to created an unbounded lifetime for the pid namespace. The current_pid_ns!() macro allows Rust to handle the common case of accessing current's pid namespace without RCU protection and without having to acquire a reference count. For (2) the task_get_pid_ns() method must be used. This will always acquire a reference on the pid namespace and will return an Option to force the caller to explicitly handle the case where pid namespace is None. Something that tends to be forgotten when doing the equivalent operation in C. Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. But it is not important currently. But we do want it in the future. Note that for (2) the required RCU protection around calling task_active_pid_ns() synchronizes against putting the last reference of the associated struct pid of task->thread_pid. The struct pid stored in that field is used to retrieve the pid namespace of the caller. When release_task() is called task->thread_pid will be NULLed and put_pid() on said struct pid will be delayed in free_pid() via call_rcu() allowing everyone with an RCU protected access to the struct pid acquired from task->thread_pid to finish" * tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: rust: add PidNamespace |
||
Linus Torvalds
|
445d9f05fa |
NFSD 6.13 Release Notes
Jeff Layton contributed a scalability improvement to NFSD's NFSv4 backchannel session implementation. This improvement is intended to increase the rate at which NFSD can safely recall NFSv4 delegations from clients, to avoid the need to revoke them. Revoking requires a slow state recovery process. A wide variety of bug fixes and other incremental improvements make up the bulk of commits in this series. As always I am grateful to the NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmdEgLQACgkQM2qzM29m f5cwmg/9HcfG7blepU/2qNHopzSYRO5vZw1YNJQ5/Wi3bmqIea83lf8OcCY1G/aj 6K+jnenzHrwfhaA4u7N2FPXPVl8sPSMuOrJXY5zC4yE5QnIbranjcyEW5l5zlj3n ukkTYQgjUsKre3pHlvn3JmDHfUhNPEfzirsJeorP7DS3omne+OFA1LNncNP6emRu h0aEC6EJ43zUkYiz9nZYqPwIAwrUIA0WOrvVnq7vsi6gR4/Muk7nS+X/y4qFjli3 9enVskEv8sFmmOAIMK3CHJq+exEeKtKEKUuYkD23QgPt2R4+IwqS70o9IM/S1ypf APiv958BIhxm/SwUn1IjoxIckTB5EdksMxU5/4qGr1ZxprPG4/ruKO80BkrxLzW2 n1HmJ4ZNnpWPQvHN7RQ0WOsPNzL8byxJbGr1bpNgU4AGXnTFWPrAnB6juiyX4xb+ YNfgkQGDY79o7r1OJ5UUdCyx0QBSnaLNACTGm2u2FpI/ukMFPdrWIE99QbBgSe1p MgWaiPwSY+9crFfGPJeQ4t6/siRAec6L3RO9KT9Epcd2S7/Uts3NXYRdJfwZ+Qza TkPY2bm7T/WCcMhW7DN372hqgfRHPWOf4tacJ1Tob+As1d6p6qXEX2zi6piCCOLj dmTVDSVPClRXt8YigF9WqosyWv1jUzSnh9ne+eYPBpj93Ag2YBY= =wBvS -----END PGP SIGNATURE----- Merge tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "Jeff Layton contributed a scalability improvement to NFSD's NFSv4 backchannel session implementation. This improvement is intended to increase the rate at which NFSD can safely recall NFSv4 delegations from clients, to avoid the need to revoke them. Revoking requires a slow state recovery process. A wide variety of bug fixes and other incremental improvements make up the bulk of commits in this series. As always I am grateful to the NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle" * tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits) nfsd: allow for up to 32 callback session slots nfs_common: must not hold RCU while calling nfsd_file_put_local nfsd: get rid of include ../internal.h nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur NFSD: Add nfsd4_copy time-to-live NFSD: Add a laundromat reaper for async copy state NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD NFSD: Free async copy information in nfsd4_cb_offload_release() NFSD: Fix nfsd4_shutdown_copy() NFSD: Add a tracepoint to record canceled async COPY operations nfsd: make nfsd4_session->se_flags a bool nfsd: remove nfsd4_session->se_bchannel nfsd: make use of warning provided by refcount_t nfsd: Don't fail OP_SETCLIENTID when there are too many clients. svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init() xdrgen: Remove program_stat_to_errno() call sites xdrgen: Update the files included in client-side source code xdrgen: Remove check for "nfs_ok" in C templates xdrgen: Remove tracepoint call site ... |
||
Linus Torvalds
|
44b4d13b70 |
f2fs-for-6.13-rc1
This series introduces a device aliasing feature where user can carve out partitions but reclaim the space back by deleting aliased file in root dir. In addition to that, there're numerous minor bug fixes in zoned device support, checkpoint=disable, extent cache management, fiemap, and lazytime mount option. The full list of noticeable changes can be found below. Enhancement: - introduce device aliasing file - add stats in debugfs to show multiple devices - add a sysfs node to limit max read extent count per-inode - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable - decrease spare area for pinned files for zoned devices Bug fix: - Revert "f2fs: remove unreachable lazytime mount option parsing" - adjust unusable cap before checkpoint=disable mode - fix to drop all discards after creating snapshot on lvm device - fix to shrink read extent node in batches - fix changing cursegs if recovery fails on zoned device - fix to adjust appropriate length for fiemap - fix fiemap failure issue when page size is 16KB - fix to avoid forcing direct write to use buffered IO on inline_data inode - fix to map blocks correctly for direct write - fix to account dirty data in __get_secs_required() - fix null-ptr-deref in f2fs_submit_page_bio() - f2fs: compress: fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmdD25MACgkQQBSofoJI UNKgLhAAgr0Dy/VWDgRlMovckq0q5EyQu/Jospv6mJyErQ4pZwwidNn9FSf0yua9 O0Pofs1zMFWoe5R2UOvwOnahmvlwD1nnRMylA10/9hp+/aKlTRxOI7HrdL5wFgWG QRTb/k+mgoEQk8+9ElThzq/CkmQovPEUfhoxW7bE4zH9kVoxi2klFbkASZynqEFe a+TVQoDUnXvb1cbvr4zEVuD79QEmazD/bgc+gquxChCHfzX8ip4R0aCZM1ceTgm/ Vru0LUKGQTWXPPReugJbOOtoIJ/kgD9Sg5xa7Icg3nxukgiYUDdl3e7MTgfvHOK6 Fwwj+ZbM/yV/gpAQp+g+uOkKSFqfulyOb+nzX5tmebmiT2Vs6XSQ0Xo+fjm7N1QC j0G1vwz91xETK/gw2U/zL/HQVB3IU/2dtBT2ek4x6kmVL3rmHYoI6r2ofQcEFjGn 2YQ9yvvT/fY6fza88kWO0PjgIRDzw9D9ihfZVyH9MCy5n6adhWlFXIg0HbAoecDE 6xsVjb5BVYJfQvVz3FauGRXu6i3mePaURC1rrf5NKFfAWJP7pDfi9IvSL56u2aMt J+RJ7a2u1l1z/yhBxtr00KhMP586OZHVJwQvwNJV7mzBFhvOlm3a4jTzbG35dE+V MfbbjR628y/0IkqZiB7YVu1NIF2qdbZosv4nO7b584Q1h1NH/PU= =LOgM -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series introduces a device aliasing feature where user can carve out partitions but reclaim the space back by deleting aliased file in root dir. In addition to that, there're numerous minor bug fixes in zoned device support, checkpoint=disable, extent cache management, fiemap, and lazytime mount option. The full list of noticeable changes can be found below. Enhancements: - introduce device aliasing file - add stats in debugfs to show multiple devices - add a sysfs node to limit max read extent count per-inode - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable - decrease spare area for pinned files for zoned devices Fixes: - Revert "f2fs: remove unreachable lazytime mount option parsing" - adjust unusable cap before checkpoint=disable mode - fix to drop all discards after creating snapshot on lvm device - fix to shrink read extent node in batches - fix changing cursegs if recovery fails on zoned device - fix to adjust appropriate length for fiemap - fix fiemap failure issue when page size is 16KB - fix to avoid forcing direct write to use buffered IO on inline_data inode - fix to map blocks correctly for direct write - fix to account dirty data in __get_secs_required() - fix null-ptr-deref in f2fs_submit_page_bio() - fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks" * tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: fix to drop all discards after creating snapshot on lvm device f2fs: add a sysfs node to limit max read extent count per-inode f2fs: fix to shrink read extent node in batches f2fs: print message if fscorrupted was found in f2fs_new_node_page() f2fs: clear SBI_POR_DOING before initing inmem curseg f2fs: fix changing cursegs if recovery fails on zoned device f2fs: adjust unusable cap before checkpoint=disable mode f2fs: fix to requery extent which cross boundary of inquiry f2fs: fix to adjust appropriate length for fiemap f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK} f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow f2fs: replace deprecated strcpy with strscpy Revert "f2fs: remove unreachable lazytime mount option parsing" f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode f2fs: fix to map blocks correctly for direct write f2fs: fix race in concurrent f2fs_stop_gc_thread f2fs: fix fiemap failure issue when page size is 16KB f2fs: remove redundant atomic file check in defragment f2fs: fix to convert log type to segment data type correctly f2fs: clean up the unused variable additional_reserved_segments ... |
||
Jens Axboe
|
49c5c63d48 |
io_uring: fix task_work cap overshooting
A previous commit fixed task_work overrunning by a lot more than what
the user asked for, by adding a retry list. However, it didn't cap the
overall count, hence for multiple task_work runs inside the same wait
loop, it'd still overshoot the target by potentially a large amount.
Cap it generally inside the wait path. Note that this will still
overshoot the default limit of 20, but should overshoot by no more than
limit-1 in addition to the limit. That still provides a ceiling over how
much task_work will be run, rather than still having gaps where it was
uncapped essentially.
Fixes:
|
||
Linus Torvalds
|
fb527fc1f3 |
fuse update for 6.13
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ0Rb/wAKCRDh3BK/laaZ PK80AQDAUgA6S5SSrbJxwRFNOhbwtZxZqJ8fomJR5xuWIEQ9pwEAkpFqhBhBW0y1 0YaREow2aDANQQtSUrfPtgva1ZXFwQU= =Cyx5 -----END PGP SIGNATURE----- Merge tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add page -> folio conversions (Joanne Koong, Josef Bacik) - Allow max size of fuse requests to be configurable with a sysctl (Joanne Koong) - Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun) - Fix large kernel reads (like a module load) in virtio_fs (Hou Tao) - Fix attribute inconsistency in case readdirplus (and plain lookup in corner cases) is racing with inode eviction (Zhang Tianci) - Fix a WARN_ON triggered by virtio_fs (Asahi Lina) * tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits) virtiofs: dax: remove ->writepages() callback fuse: check attributes staleness on fuse_iget() fuse: remove pages for requests and exclusively use folios fuse: convert direct io to use folios mm/writeback: add folio_mark_dirty_lock() fuse: convert writebacks to use folios fuse: convert retrieves to use folios fuse: convert ioctls to use folios fuse: convert writes (non-writeback) to use folios fuse: convert reads to use folios fuse: convert readdir to use folios fuse: convert readlink to use folios fuse: convert cuse to use folios fuse: add support in virtio for requests using folios fuse: support folios in struct fuse_args_pages and fuse_copy_pages() fuse: convert fuse_notify_store to use folios fuse: convert fuse_retrieve to use folios fuse: use the folio based vmstat helpers fuse: convert fuse_writepage_need_send to take a folio fuse: convert fuse_do_readpage to use folios ... |
||
Linus Torvalds
|
ff2a7a064a |
gfs2 changes
- Fix the code that cleans up left-over unlinked files. Various fixes and minor improvements in deleting files cached or held open remotely. - Simplify the use of dlm's DLM_LKF_QUECVT flag. - A few other minor cleanups. -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmdESTYUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpdyA/9EWDxx2Y6JeVeAC+J138pSOYqHtwn wLtMeTdwbycW6M8V5kyW3vCh+lLLS6s0dZuwn2Xv8jx5QytrD4c51Wj3bRYjuidM Zt0L+wohOQISvL1+AViYuIns2pzQQvNZUC2aAVr9J3KGhdIFonbU6PdLOeEN0cZe R08Nseux9oJ/geaKJ3jh/ReX2VZehp2WAaQ4I+PoQkkNflBULPkyysxjkv9sc8tW 9hN1sK7dk/U5OLKr4H6SSi1Uu6N6Wek0x2zo4NxTRqyfBiRXYtZYnXPkdftuB+6N M7N2dAIuhnXiAhQdo7OOe9hZZVXTFhmeQK1tyTsw/FZkQJNMX+bdBn4g7NV94drz CpTliqm+Z5dTnkSdS4cIozkQZ7zID1eibX8uF7QsnozBWm7bjbW6fi7a+z+u5ykN hsWanoMKhH1524oNKaiSjIxT0b1oda114DJQVpdU68HjkyHf5l0GXUTcVpg0dxs3 peXhpZ+CjHbaTMXl5xqGOucD+ACPhMOGXPAX1lF2bIcfbLqgbTVn0fMMUYWeb8j1 medJtQ0itwpiCHZTl62xUOLEOCqCiS5J1/TjrwNuJ1HLJ5JP1UePNl5kjT9nDfsA KXB31sKFfPX99rPVYJgjLXPgRLwslcniSHOg9p+bWwq7ZI1PSxPSauUgtKZpSe6A E3YfnIxjPxRMKks= =BP41 -----END PGP SIGNATURE----- Merge tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix the code that cleans up left-over unlinked files. Various fixes and minor improvements in deleting files cached or held open remotely. - Simplify the use of dlm's DLM_LKF_QUECVT flag. - A few other minor cleanups. * tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (21 commits) gfs2: Prevent inode creation race gfs2: Only defer deletes when we have an iopen glock gfs2: Simplify DLM_LKF_QUECVT use gfs2: gfs2_evict_inode clarification gfs2: Make gfs2_inode_refresh static gfs2: Use get_random_u32 in gfs2_orlov_skip gfs2: Randomize GLF_VERIFY_DELETE work delay gfs2: Use mod_delayed_work in gfs2_queue_try_to_evict gfs2: Update to the evict / remote delete documentation gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode gfs2: Clean up delete work processing gfs2: Minor delete_work_func cleanup gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glock gfs2: Rename dinode_demise to evict_behavior gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETE gfs2: Faster gfs2_upgrade_iopen_glock wakeups KMSAN: uninit-value in inode_go_dump (5) gfs2: Fix unlinked inode cleanup gfs2: Allow immediate GLF_VERIFY_DELETE work gfs2: Initialize gl_no_formal_ino earlier ... |
||
Palmer Dabbelt
|
8d4f1e05ff
|
RISC-V: Remove unnecessary include from compat.h
Without this I get a bunch of build errors like In file included from ./include/linux/sched/task_stack.h:12, from ./arch/riscv/include/asm/compat.h:12, from ./arch/riscv/include/asm/pgtable.h:115, from ./include/linux/pgtable.h:6, from ./include/linux/mm.h:30, from arch/riscv/kernel/asm-offsets.c:8: ./include/linux/kasan.h:50:37: error: ‘MAX_PTRS_PER_PTE’ undeclared here (not in a function); did you mean ‘PTRS_PER_PTE’? 50 | extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PTE ./include/linux/kasan.h:51:8: error: unknown type name ‘pmd_t’; did you mean ‘pgd_t’? 51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD]; | ^~~~~ | pgd_t ./include/linux/kasan.h:51:37: error: ‘MAX_PTRS_PER_PMD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’? 51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PGD ./include/linux/kasan.h:52:8: error: unknown type name ‘pud_t’; did you mean ‘pgd_t’? 52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD]; | ^~~~~ | pgd_t ./include/linux/kasan.h:52:37: error: ‘MAX_PTRS_PER_PUD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’? 52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PGD ./include/linux/kasan.h:53:8: error: unknown type name ‘p4d_t’; did you mean ‘pgd_t’? 53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D]; | ^~~~~ | pgd_t ./include/linux/kasan.h:53:37: error: ‘MAX_PTRS_PER_P4D’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’? 53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PGD Link: https://lore.kernel.org/r/20241126143250.29708-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Uwe Kleine-König
|
cc47268cb4 |
irqchip: Switch back to struct platform_driver::remove()
After commit
|
||
Zhou Wang
|
f82e62d470 |
irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801
When enabling GICv4.1 in hip09, VMAPP fails to clear some caches during the unmap operation, which can causes vSGIs to be lost. To fix the issue, invalidate the related vPE cache through GICR_INVALLR after VMOVP. Suggested-by: Marc Zyngier <maz@kernel.org> Co-developed-by: Nianyao Tang <tangnianyao@huawei.com> Signed-off-by: Nianyao Tang <tangnianyao@huawei.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> |
||
Russell King (Oracle)
|
12aaf67584 |
irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
Commit |
||
Christian Brauner
|
cf87766dd6
|
Merge branch 'ovl.fixes'
Bring in an overlayfs fix for v6.13-rc1 that fixes a bug introduced by the overlayfs changes merged for v6.13. Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
Amir Goldstein
|
2957fa4931
|
fs/backing_file: fix wrong argument in callback
Commit |
||
Edward Adam Davis
|
ed95885549 |
Bluetooth: SCO: remove the redundant sco_conn_put
When adding conn, it is necessary to increase and retain the conn reference
count at the same time.
Another problem was fixed along the way, conn_put is missing when hcon is NULL
in the timeout routine.
Fixes:
|
||
Luiz Augusto von Dentz
|
a66dfaf18f |
Bluetooth: MGMT: Fix possible deadlocks
This fixes possible deadlocks like the following caused by
hci_cmd_sync_dequeue causing the destroy function to run:
INFO: task kworker/u19:0:143 blocked for more than 120 seconds.
Tainted: G W O 6.8.0-2024-03-19-intel-next-iLS-24ww14 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u19:0 state:D stack:0 pid:143 tgid:143 ppid:2 flags:0x00004000
Workqueue: hci0 hci_cmd_sync_work [bluetooth]
Call Trace:
<TASK>
__schedule+0x374/0xaf0
schedule+0x3c/0xf0
schedule_preempt_disabled+0x1c/0x30
__mutex_lock.constprop.0+0x3ef/0x7a0
__mutex_lock_slowpath+0x13/0x20
mutex_lock+0x3c/0x50
mgmt_set_connectable_complete+0xa4/0x150 [bluetooth]
? kfree+0x211/0x2a0
hci_cmd_sync_dequeue+0xae/0x130 [bluetooth]
? __pfx_cmd_complete_rsp+0x10/0x10 [bluetooth]
cmd_complete_rsp+0x26/0x80 [bluetooth]
mgmt_pending_foreach+0x4d/0x70 [bluetooth]
__mgmt_power_off+0x8d/0x180 [bluetooth]
? _raw_spin_unlock_irq+0x23/0x40
hci_dev_close_sync+0x445/0x5b0 [bluetooth]
hci_set_powered_sync+0x149/0x250 [bluetooth]
set_powered_sync+0x24/0x60 [bluetooth]
hci_cmd_sync_work+0x90/0x150 [bluetooth]
process_one_work+0x13e/0x300
worker_thread+0x2f7/0x420
? __pfx_worker_thread+0x10/0x10
kthread+0x107/0x140
? __pfx_kthread+0x10/0x10
ret_from_fork+0x3d/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Tested-by: Kiran K <kiran.k@intel.com>
Fixes:
|
||
Luiz Augusto von Dentz
|
0b88294066 |
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
This fixes the following crash:
==================================================================
BUG: KASAN: slab-use-after-free in set_powered_sync+0x3a/0xc0 net/bluetooth/mgmt.c:1353
Read of size 8 at addr ffff888029b4dd18 by task kworker/u9:0/54
CPU: 1 UID: 0 PID: 54 Comm: kworker/u9:0 Not tainted 6.11.0-rc6-syzkaller-01155-gf723224742fc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
q kasan_report+0x143/0x180 mm/kasan/report.c:601
set_powered_sync+0x3a/0xc0 net/bluetooth/mgmt.c:1353
hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:328
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
worker_thread+0x86d/0xd10 kernel/workqueue.c:3389
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Allocated by task 5247:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
kasan_kmalloc include/linux/kasan.h:211 [inline]
__kmalloc_cache_noprof+0x19c/0x2c0 mm/slub.c:4193
kmalloc_noprof include/linux/slab.h:681 [inline]
kzalloc_noprof include/linux/slab.h:807 [inline]
mgmt_pending_new+0x65/0x250 net/bluetooth/mgmt_util.c:269
mgmt_pending_add+0x36/0x120 net/bluetooth/mgmt_util.c:296
set_powered+0x3cd/0x5e0 net/bluetooth/mgmt.c:1394
hci_mgmt_cmd+0xc47/0x11d0 net/bluetooth/hci_sock.c:1712
hci_sock_sendmsg+0x7b8/0x11c0 net/bluetooth/hci_sock.c:1832
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
sock_write_iter+0x2dd/0x400 net/socket.c:1160
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xa72/0xc90 fs/read_write.c:590
ksys_write+0x1a0/0x2c0 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5246:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
__kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2256 [inline]
slab_free mm/slub.c:4477 [inline]
kfree+0x149/0x360 mm/slub.c:4598
settings_rsp+0x2bc/0x390 net/bluetooth/mgmt.c:1443
mgmt_pending_foreach+0xd1/0x130 net/bluetooth/mgmt_util.c:259
__mgmt_power_off+0x112/0x420 net/bluetooth/mgmt.c:9455
hci_dev_close_sync+0x665/0x11a0 net/bluetooth/hci_sync.c:5191
hci_dev_do_close net/bluetooth/hci_core.c:483 [inline]
hci_dev_close+0x112/0x210 net/bluetooth/hci_core.c:508
sock_do_ioctl+0x158/0x460 net/socket.c:1222
sock_ioctl+0x629/0x8e0 net/socket.c:1341
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83gv
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Reported-by: syzbot+03d6270b6425df1605bf@syzkaller.appspotmail.com
Tested-by: syzbot+03d6270b6425df1605bf@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=03d6270b6425df1605bf
Fixes:
|
||
Pavel Begunkov
|
0c0a4eae26 |
io_uring: check for overflows in io_pin_pages
WARNING: CPU: 0 PID: 5834 at io_uring/memmap.c:144 io_pin_pages+0x149/0x180 io_uring/memmap.c:144 CPU: 0 UID: 0 PID: 5834 Comm: syz-executor825 Not tainted 6.12.0-next-20241118-syzkaller #0 Call Trace: <TASK> __io_uaddr_map+0xfb/0x2d0 io_uring/memmap.c:183 io_rings_map io_uring/io_uring.c:2611 [inline] io_allocate_scq_urings+0x1c0/0x650 io_uring/io_uring.c:3470 io_uring_create+0x5b5/0xc00 io_uring/io_uring.c:3692 io_uring_setup io_uring/io_uring.c:3781 [inline] ... </TASK> io_pin_pages()'s uaddr parameter came directly from the user and can be garbage. Don't just add size to it as it can overflow. Cc: stable@vger.kernel.org Reported-by: syzbot+2159cbb522b02847c053@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1b7520ddb168e1d537d64be47414a0629d0d8f8f.1732581026.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Christoph Hellwig
|
1b0cab327e |
mq-deadline: don't call req_get_ioprio from the I/O completion handler
req_get_ioprio looks at req->bio to find the I/O priority, which is not
set when completing bios that the driver fully iterated through.
Stash away the dd_per_prio in the elevator private data instead of looking
it up again to optimize the code a bit while fixing the regression from
removing the per-request ioprio value.
Fixes:
|
||
Damien Le Moal
|
0b83c86b44 |
block: Prevent potential deadlock in blk_revalidate_disk_zones()
The function blk_revalidate_disk_zones() calls the function
disk_update_zone_resources() after freezing the device queue. In turn,
disk_update_zone_resources() calls queue_limits_start_update() which
takes a queue limits mutex lock, resulting in the ordering:
q->q_usage_counter check -> q->limits_lock. However, the usual ordering
is to always take a queue limit lock before freezing the queue to commit
the limits updates, e.g., the code pattern:
lim = queue_limits_start_update(q);
...
blk_mq_freeze_queue(q);
ret = queue_limits_commit_update(q, &lim);
blk_mq_unfreeze_queue(q);
Thus, blk_revalidate_disk_zones() introduces a potential circular
locking dependency deadlock that lockdep sometimes catches with the
splat:
[ 51.934109] ======================================================
[ 51.935916] WARNING: possible circular locking dependency detected
[ 51.937561] 6.12.0+ #2107 Not tainted
[ 51.938648] ------------------------------------------------------
[ 51.940351] kworker/u16:4/157 is trying to acquire lock:
[ 51.941805] ffff9fff0aa0bea8 (&q->limits_lock){+.+.}-{4:4}, at: disk_update_zone_resources+0x86/0x170
[ 51.944314]
but task is already holding lock:
[ 51.945688] ffff9fff0aa0b890 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_revalidate_disk_zones+0x15f/0x340
[ 51.948527]
which lock already depends on the new lock.
[ 51.951296]
the existing dependency chain (in reverse order) is:
[ 51.953708]
-> #1 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
[ 51.956131] blk_queue_enter+0x1c9/0x1e0
[ 51.957290] blk_mq_alloc_request+0x187/0x2a0
[ 51.958365] scsi_execute_cmd+0x78/0x490 [scsi_mod]
[ 51.959514] read_capacity_16+0x111/0x410 [sd_mod]
[ 51.960693] sd_revalidate_disk.isra.0+0x872/0x3240 [sd_mod]
[ 51.962004] sd_probe+0x2d7/0x520 [sd_mod]
[ 51.962993] really_probe+0xd5/0x330
[ 51.963898] __driver_probe_device+0x78/0x110
[ 51.964925] driver_probe_device+0x1f/0xa0
[ 51.965916] __driver_attach_async_helper+0x60/0xe0
[ 51.967017] async_run_entry_fn+0x2e/0x140
[ 51.968004] process_one_work+0x21f/0x5a0
[ 51.968987] worker_thread+0x1dc/0x3c0
[ 51.969868] kthread+0xe0/0x110
[ 51.970377] ret_from_fork+0x31/0x50
[ 51.970983] ret_from_fork_asm+0x11/0x20
[ 51.971587]
-> #0 (&q->limits_lock){+.+.}-{4:4}:
[ 51.972479] __lock_acquire+0x1337/0x2130
[ 51.973133] lock_acquire+0xc5/0x2d0
[ 51.973691] __mutex_lock+0xda/0xcf0
[ 51.974300] disk_update_zone_resources+0x86/0x170
[ 51.975032] blk_revalidate_disk_zones+0x16c/0x340
[ 51.975740] sd_zbc_revalidate_zones+0x73/0x160 [sd_mod]
[ 51.976524] sd_revalidate_disk.isra.0+0x465/0x3240 [sd_mod]
[ 51.977824] sd_probe+0x2d7/0x520 [sd_mod]
[ 51.978917] really_probe+0xd5/0x330
[ 51.979915] __driver_probe_device+0x78/0x110
[ 51.981047] driver_probe_device+0x1f/0xa0
[ 51.982143] __driver_attach_async_helper+0x60/0xe0
[ 51.983282] async_run_entry_fn+0x2e/0x140
[ 51.984319] process_one_work+0x21f/0x5a0
[ 51.985873] worker_thread+0x1dc/0x3c0
[ 51.987289] kthread+0xe0/0x110
[ 51.988546] ret_from_fork+0x31/0x50
[ 51.989926] ret_from_fork_asm+0x11/0x20
[ 51.991376]
other info that might help us debug this:
[ 51.994127] Possible unsafe locking scenario:
[ 51.995651] CPU0 CPU1
[ 51.996694] ---- ----
[ 51.997716] lock(&q->q_usage_counter(queue)#3);
[ 51.998817] lock(&q->limits_lock);
[ 52.000043] lock(&q->q_usage_counter(queue)#3);
[ 52.001638] lock(&q->limits_lock);
[ 52.002485]
*** DEADLOCK ***
Prevent this issue by moving the calls to blk_mq_freeze_queue() and
blk_mq_unfreeze_queue() around the call to queue_limits_commit_update()
in disk_update_zone_resources(). In case of revalidation failure, the
call to disk_free_zone_resources() in blk_revalidate_disk_zones()
is still done with the queue frozen as before.
Fixes:
|
||
Takashi Iwai
|
db2eee6143 |
ALSA: hda: Show the codec quirk info at probing
Lots of HD-audio devices need the device-specific quirk, and it's often helpful to know which quirk is applied. But currently the driver shows it only as a debug output, hence we'd have to enable the debug option at each time we want to see it (and the output becomes too messy due to other debug messages). This patch changes the messages to the info level, so that they appear at probing normally. Link: https://patch.msgid.link/20241126141010.12567-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> |
||
Paolo Abeni
|
5dfd7d9400 |
Merge branch 'bnxt_en-bug-fixes'
Michael Chan says: ==================== bnxt_en: Bug fixes This patchset fixes several things: 1. AER recovery for RoCE when NIC interface is down. 2. Set ethtool backplane link modes correctly. 3. Update RSS ring ID during RX queue restart. 4. Crash with XDP and MTU change. 5. PCIe completion timeout when reading PHC after shutdown. ==================== Link: https://patch.msgid.link/20241122224547.984808-1-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Michael Chan
|
3661c05c54 |
bnxt_en: Unregister PTP during PCI shutdown and suspend
If we go through the PCI shutdown or suspend path, we shutdown the
NIC but PTP remains registered. If the kernel continues to run for
a little bit, the periodic PTP .do_aux_work() function may be called
and it will read the PHC from the BAR register. Since the device
has already been disabled, it will cause a PCIe completion timeout.
Fix it by calling bnxt_ptp_clear() in the PCI shutdown/suspend
handlers. bnxt_ptp_clear() will unregister from PTP and
.do_aux_work() will be canceled.
In bnxt_resume(), we need to re-initialize PTP.
Fixes:
|
||
Michael Chan
|
1e9614cd95 |
bnxt_en: Refactor bnxt_ptp_init()
Instead of passing the 2nd parameter phc_cfg to bnxt_ptp_init(). Store it in bp->ptp_cfg so that the caller doesn't need to know what the value should be. In the next patch, we'll need to call bnxt_ptp_init() in bnxt_resume() and this will make it easier. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Shravya KN
|
3051a77a09 |
bnxt_en: Fix receive ring space parameters when XDP is active
The MTU setting at the time an XDP multi-buffer is attached
determines whether the aggregation ring will be used and the
rx_skb_func handler. This is done in bnxt_set_rx_skb_mode().
If the MTU is later changed, the aggregation ring setting may need
to be changed and it may become out-of-sync with the settings
initially done in bnxt_set_rx_skb_mode(). This may result in
random memory corruption and crashes as the HW may DMA data larger
than the allocated buffer size, such as:
BUG: kernel NULL pointer dereference, address: 00000000000003c0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G S OE 6.1.0-226bf9805506 #1
Hardware name: Wiwynn Delta Lake PVT BZA.02601.0150/Delta Lake-Class1, BIOS F0E_3A12 08/26/2021
RIP: 0010:bnxt_rx_pkt+0xe97/0x1ae0 [bnxt_en]
Code: 8b 95 70 ff ff ff 4c 8b 9d 48 ff ff ff 66 41 89 87 b4 00 00 00 e9 0b f7 ff ff 0f b7 43 0a 49 8b 95 a8 04 00 00 25 ff 0f 00 00 <0f> b7 14 42 48 c1 e2 06 49 03 95 a0 04 00 00 0f b6 42 33f
RSP: 0018:ffffa19f40cc0d18 EFLAGS: 00010202
RAX: 00000000000001e0 RBX: ffff8e2c805c6100 RCX: 00000000000007ff
RDX: 0000000000000000 RSI: ffff8e2c271ab990 RDI: ffff8e2c84f12380
RBP: ffffa19f40cc0e48 R08: 000000000001000d R09: 974ea2fcddfa4cbf
R10: 0000000000000000 R11: ffffa19f40cc0ff8 R12: ffff8e2c94b58980
R13: ffff8e2c952d6600 R14: 0000000000000016 R15: ffff8e2c271ab990
FS: 0000000000000000(0000) GS:ffff8e3b3f840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000003c0 CR3: 0000000e8580a004 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
__bnxt_poll_work+0x1c2/0x3e0 [bnxt_en]
To address the issue, we now call bnxt_set_rx_skb_mode() within
bnxt_change_mtu() to properly set the AGG rings configuration and
update rx_skb_func based on the new MTU value.
Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of
bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on
the current MTU.
Fixes:
|