One of the paths quota writeback is called from is:
freeze_super()
sync_filesystem()
ext4_sync_fs()
dquot_writeback_dquots()
Since we currently don't always flush the quota_release_work queue in
this path, we can end up with the following race:
1. dquot are added to releasing_dquots list during regular operations.
2. FS Freeze starts, however, this does not flush the quota_release_work queue.
3. Freeze completes.
4. Kernel eventually tries to flush the workqueue while FS is frozen which
hits a WARN_ON since transaction gets started during frozen state:
ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
__ext4_journal_start_sb+0x64/0x1c0 [ext4]
ext4_release_dquot+0x90/0x1d0 [ext4]
quota_release_workfn+0x43c/0x4d0
Which is the following line:
WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
Which ultimately results in generic/390 failing due to dmesg
noise. This was detected on powerpc machine 15 cores.
To avoid this, make sure to flush the workqueue during
dquot_writeback_dquots() so we dont have any pending workitems after
freeze.
Reported-by: Disha Goel <disgoel@linux.ibm.com>
CC: stable@vger.kernel.org
Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241121123855.645335-2-ojaswin@linux.ibm.com
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcoSAAKCRCRxhvAZXjc
otZlAP9h7WaciOu+BWvPJzAhCy+zfjaMdB3HffjKZ7m6Kg2YIAEApu/FftxXEyLC
XCKVmuHQzb58X+bIyuS62YIBjOfA2gY=
=EUT7
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.13.ecryptfs.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull ecryptfs mount api conversion from Christian Brauner:
"Convert ecryptfs to the new mount api"
* tag 'vfs-6.13.ecryptfs.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ecryptfs: Fix spelling mistake "validationg" -> "validating"
ecryptfs: Convert ecryptfs to use the new mount API
ecryptfs: Factor out mount option validation
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzckSQAKCRCRxhvAZXjc
oroxAQDqt3NN64UCM14rrmiC2rw48SYjaYj4Nu0sRaJgScOFLgEA3B2I5Lh+bw6b
fVH/uUjOsm50eYuFbqjOmEAp2DNP/QY=
=14eq
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs exportfs updates from Christian Brauner:
"This contains work to bring NFS connectable file handles to userspace
servers.
The name_to_handle_at() system call is extended to encode connectable
file handles. Such file handles can be resolved to an open file with a
connected path. So far userspace NFS servers couldn't make use of this
functionality even though the kernel does already support it. This is
achieved by introducing a new flag for name_to_handle_at().
Similarly, the open_by_handle_at() system call is tought to understand
connectable file handles explicitly created via name_to_handle_at()"
* tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: open_by_handle_at() support for decoding "explicit connectable" file handles
fs: name_to_handle_at() support for "explicit connectable" file handles
fs: prepare for "explicit connectable" file handles
Jeff Layton contributed a scalability improvement to NFSD's NFSv4
backchannel session implementation. This improvement is intended to
increase the rate at which NFSD can safely recall NFSv4 delegations
from clients, to avoid the need to revoke them. Revoking requires
a slow state recovery process.
A wide variety of bug fixes and other incremental improvements make
up the bulk of commits in this series. As always I am grateful to
the NFSD contributors, reviewers, testers, and bug reporters who
participated during this cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmdEgLQACgkQM2qzM29m
f5cwmg/9HcfG7blepU/2qNHopzSYRO5vZw1YNJQ5/Wi3bmqIea83lf8OcCY1G/aj
6K+jnenzHrwfhaA4u7N2FPXPVl8sPSMuOrJXY5zC4yE5QnIbranjcyEW5l5zlj3n
ukkTYQgjUsKre3pHlvn3JmDHfUhNPEfzirsJeorP7DS3omne+OFA1LNncNP6emRu
h0aEC6EJ43zUkYiz9nZYqPwIAwrUIA0WOrvVnq7vsi6gR4/Muk7nS+X/y4qFjli3
9enVskEv8sFmmOAIMK3CHJq+exEeKtKEKUuYkD23QgPt2R4+IwqS70o9IM/S1ypf
APiv958BIhxm/SwUn1IjoxIckTB5EdksMxU5/4qGr1ZxprPG4/ruKO80BkrxLzW2
n1HmJ4ZNnpWPQvHN7RQ0WOsPNzL8byxJbGr1bpNgU4AGXnTFWPrAnB6juiyX4xb+
YNfgkQGDY79o7r1OJ5UUdCyx0QBSnaLNACTGm2u2FpI/ukMFPdrWIE99QbBgSe1p
MgWaiPwSY+9crFfGPJeQ4t6/siRAec6L3RO9KT9Epcd2S7/Uts3NXYRdJfwZ+Qza
TkPY2bm7T/WCcMhW7DN372hqgfRHPWOf4tacJ1Tob+As1d6p6qXEX2zi6piCCOLj
dmTVDSVPClRXt8YigF9WqosyWv1jUzSnh9ne+eYPBpj93Ag2YBY=
=wBvS
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Jeff Layton contributed a scalability improvement to NFSD's NFSv4
backchannel session implementation. This improvement is intended to
increase the rate at which NFSD can safely recall NFSv4 delegations
from clients, to avoid the need to revoke them. Revoking requires a
slow state recovery process.
A wide variety of bug fixes and other incremental improvements make up
the bulk of commits in this series. As always I am grateful to the
NFSD contributors, reviewers, testers, and bug reporters who
participated during this cycle"
* tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits)
nfsd: allow for up to 32 callback session slots
nfs_common: must not hold RCU while calling nfsd_file_put_local
nfsd: get rid of include ../internal.h
nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur
NFSD: Add nfsd4_copy time-to-live
NFSD: Add a laundromat reaper for async copy state
NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations
NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD
NFSD: Free async copy information in nfsd4_cb_offload_release()
NFSD: Fix nfsd4_shutdown_copy()
NFSD: Add a tracepoint to record canceled async COPY operations
nfsd: make nfsd4_session->se_flags a bool
nfsd: remove nfsd4_session->se_bchannel
nfsd: make use of warning provided by refcount_t
nfsd: Don't fail OP_SETCLIENTID when there are too many clients.
svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init()
xdrgen: Remove program_stat_to_errno() call sites
xdrgen: Update the files included in client-side source code
xdrgen: Remove check for "nfs_ok" in C templates
xdrgen: Remove tracepoint call site
...
This series introduces a device aliasing feature where user can carve out
partitions but reclaim the space back by deleting aliased file in root dir.
In addition to that, there're numerous minor bug fixes in zoned device support,
checkpoint=disable, extent cache management, fiemap, and lazytime mount option.
The full list of noticeable changes can be found below.
Enhancement:
- introduce device aliasing file
- add stats in debugfs to show multiple devices
- add a sysfs node to limit max read extent count per-inode
- modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable
- decrease spare area for pinned files for zoned devices
Bug fix:
- Revert "f2fs: remove unreachable lazytime mount option parsing"
- adjust unusable cap before checkpoint=disable mode
- fix to drop all discards after creating snapshot on lvm device
- fix to shrink read extent node in batches
- fix changing cursegs if recovery fails on zoned device
- fix to adjust appropriate length for fiemap
- fix fiemap failure issue when page size is 16KB
- fix to avoid forcing direct write to use buffered IO on inline_data inode
- fix to map blocks correctly for direct write
- fix to account dirty data in __get_secs_required()
- fix null-ptr-deref in f2fs_submit_page_bio()
- f2fs: compress: fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmdD25MACgkQQBSofoJI
UNKgLhAAgr0Dy/VWDgRlMovckq0q5EyQu/Jospv6mJyErQ4pZwwidNn9FSf0yua9
O0Pofs1zMFWoe5R2UOvwOnahmvlwD1nnRMylA10/9hp+/aKlTRxOI7HrdL5wFgWG
QRTb/k+mgoEQk8+9ElThzq/CkmQovPEUfhoxW7bE4zH9kVoxi2klFbkASZynqEFe
a+TVQoDUnXvb1cbvr4zEVuD79QEmazD/bgc+gquxChCHfzX8ip4R0aCZM1ceTgm/
Vru0LUKGQTWXPPReugJbOOtoIJ/kgD9Sg5xa7Icg3nxukgiYUDdl3e7MTgfvHOK6
Fwwj+ZbM/yV/gpAQp+g+uOkKSFqfulyOb+nzX5tmebmiT2Vs6XSQ0Xo+fjm7N1QC
j0G1vwz91xETK/gw2U/zL/HQVB3IU/2dtBT2ek4x6kmVL3rmHYoI6r2ofQcEFjGn
2YQ9yvvT/fY6fza88kWO0PjgIRDzw9D9ihfZVyH9MCy5n6adhWlFXIg0HbAoecDE
6xsVjb5BVYJfQvVz3FauGRXu6i3mePaURC1rrf5NKFfAWJP7pDfi9IvSL56u2aMt
J+RJ7a2u1l1z/yhBxtr00KhMP586OZHVJwQvwNJV7mzBFhvOlm3a4jTzbG35dE+V
MfbbjR628y/0IkqZiB7YVu1NIF2qdbZosv4nO7b584Q1h1NH/PU=
=LOgM
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This series introduces a device aliasing feature where user can carve
out partitions but reclaim the space back by deleting aliased file in
root dir.
In addition to that, there're numerous minor bug fixes in zoned device
support, checkpoint=disable, extent cache management, fiemap, and
lazytime mount option. The full list of noticeable changes can be
found below.
Enhancements:
- introduce device aliasing file
- add stats in debugfs to show multiple devices
- add a sysfs node to limit max read extent count per-inode
- modify f2fs_is_checkpoint_ready logic to allow more data to be
written with the CP disable
- decrease spare area for pinned files for zoned devices
Fixes:
- Revert "f2fs: remove unreachable lazytime mount option parsing"
- adjust unusable cap before checkpoint=disable mode
- fix to drop all discards after creating snapshot on lvm device
- fix to shrink read extent node in batches
- fix changing cursegs if recovery fails on zoned device
- fix to adjust appropriate length for fiemap
- fix fiemap failure issue when page size is 16KB
- fix to avoid forcing direct write to use buffered IO on inline_data
inode
- fix to map blocks correctly for direct write
- fix to account dirty data in __get_secs_required()
- fix null-ptr-deref in f2fs_submit_page_bio()
- fix inconsistent update of i_blocks in release_compress_blocks and
reserve_compress_blocks"
* tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
f2fs: fix to drop all discards after creating snapshot on lvm device
f2fs: add a sysfs node to limit max read extent count per-inode
f2fs: fix to shrink read extent node in batches
f2fs: print message if fscorrupted was found in f2fs_new_node_page()
f2fs: clear SBI_POR_DOING before initing inmem curseg
f2fs: fix changing cursegs if recovery fails on zoned device
f2fs: adjust unusable cap before checkpoint=disable mode
f2fs: fix to requery extent which cross boundary of inquiry
f2fs: fix to adjust appropriate length for fiemap
f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}
f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow
f2fs: replace deprecated strcpy with strscpy
Revert "f2fs: remove unreachable lazytime mount option parsing"
f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode
f2fs: fix to map blocks correctly for direct write
f2fs: fix race in concurrent f2fs_stop_gc_thread
f2fs: fix fiemap failure issue when page size is 16KB
f2fs: remove redundant atomic file check in defragment
f2fs: fix to convert log type to segment data type correctly
f2fs: clean up the unused variable additional_reserved_segments
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ0Rb/wAKCRDh3BK/laaZ
PK80AQDAUgA6S5SSrbJxwRFNOhbwtZxZqJ8fomJR5xuWIEQ9pwEAkpFqhBhBW0y1
0YaREow2aDANQQtSUrfPtgva1ZXFwQU=
=Cyx5
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Add page -> folio conversions (Joanne Koong, Josef Bacik)
- Allow max size of fuse requests to be configurable with a sysctl
(Joanne Koong)
- Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun)
- Fix large kernel reads (like a module load) in virtio_fs (Hou Tao)
- Fix attribute inconsistency in case readdirplus (and plain lookup in
corner cases) is racing with inode eviction (Zhang Tianci)
- Fix a WARN_ON triggered by virtio_fs (Asahi Lina)
* tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits)
virtiofs: dax: remove ->writepages() callback
fuse: check attributes staleness on fuse_iget()
fuse: remove pages for requests and exclusively use folios
fuse: convert direct io to use folios
mm/writeback: add folio_mark_dirty_lock()
fuse: convert writebacks to use folios
fuse: convert retrieves to use folios
fuse: convert ioctls to use folios
fuse: convert writes (non-writeback) to use folios
fuse: convert reads to use folios
fuse: convert readdir to use folios
fuse: convert readlink to use folios
fuse: convert cuse to use folios
fuse: add support in virtio for requests using folios
fuse: support folios in struct fuse_args_pages and fuse_copy_pages()
fuse: convert fuse_notify_store to use folios
fuse: convert fuse_retrieve to use folios
fuse: use the folio based vmstat helpers
fuse: convert fuse_writepage_need_send to take a folio
fuse: convert fuse_do_readpage to use folios
...
- Fix the code that cleans up left-over unlinked files. Various fixes
and minor improvements in deleting files cached or held open remotely.
- Simplify the use of dlm's DLM_LKF_QUECVT flag.
- A few other minor cleanups.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmdESTYUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpdyA/9EWDxx2Y6JeVeAC+J138pSOYqHtwn
wLtMeTdwbycW6M8V5kyW3vCh+lLLS6s0dZuwn2Xv8jx5QytrD4c51Wj3bRYjuidM
Zt0L+wohOQISvL1+AViYuIns2pzQQvNZUC2aAVr9J3KGhdIFonbU6PdLOeEN0cZe
R08Nseux9oJ/geaKJ3jh/ReX2VZehp2WAaQ4I+PoQkkNflBULPkyysxjkv9sc8tW
9hN1sK7dk/U5OLKr4H6SSi1Uu6N6Wek0x2zo4NxTRqyfBiRXYtZYnXPkdftuB+6N
M7N2dAIuhnXiAhQdo7OOe9hZZVXTFhmeQK1tyTsw/FZkQJNMX+bdBn4g7NV94drz
CpTliqm+Z5dTnkSdS4cIozkQZ7zID1eibX8uF7QsnozBWm7bjbW6fi7a+z+u5ykN
hsWanoMKhH1524oNKaiSjIxT0b1oda114DJQVpdU68HjkyHf5l0GXUTcVpg0dxs3
peXhpZ+CjHbaTMXl5xqGOucD+ACPhMOGXPAX1lF2bIcfbLqgbTVn0fMMUYWeb8j1
medJtQ0itwpiCHZTl62xUOLEOCqCiS5J1/TjrwNuJ1HLJ5JP1UePNl5kjT9nDfsA
KXB31sKFfPX99rPVYJgjLXPgRLwslcniSHOg9p+bWwq7ZI1PSxPSauUgtKZpSe6A
E3YfnIxjPxRMKks=
=BP41
-----END PGP SIGNATURE-----
Merge tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Fix the code that cleans up left-over unlinked files.
Various fixes and minor improvements in deleting files cached or held
open remotely.
- Simplify the use of dlm's DLM_LKF_QUECVT flag.
- A few other minor cleanups.
* tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (21 commits)
gfs2: Prevent inode creation race
gfs2: Only defer deletes when we have an iopen glock
gfs2: Simplify DLM_LKF_QUECVT use
gfs2: gfs2_evict_inode clarification
gfs2: Make gfs2_inode_refresh static
gfs2: Use get_random_u32 in gfs2_orlov_skip
gfs2: Randomize GLF_VERIFY_DELETE work delay
gfs2: Use mod_delayed_work in gfs2_queue_try_to_evict
gfs2: Update to the evict / remote delete documentation
gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode
gfs2: Clean up delete work processing
gfs2: Minor delete_work_func cleanup
gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glock
gfs2: Rename dinode_demise to evict_behavior
gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETE
gfs2: Faster gfs2_upgrade_iopen_glock wakeups
KMSAN: uninit-value in inode_go_dump (5)
gfs2: Fix unlinked inode cleanup
gfs2: Allow immediate GLF_VERIFY_DELETE work
gfs2: Initialize gl_no_formal_ino earlier
...
Bring in an overlayfs fix for v6.13-rc1 that fixes a bug introduced by
the overlayfs changes merged for v6.13.
Signed-off-by: Christian Brauner <brauner@kernel.org>
A race condition exists between SMB request handling in
`ksmbd_conn_handler_loop()` and the freeing of `ksmbd_conn` in the
workqueue handler `handle_ksmbd_work()`. This leads to a UAF.
- KASAN: slab-use-after-free Read in handle_ksmbd_work
- KASAN: slab-use-after-free in rtlock_slowlock_locked
This race condition arises as follows:
- `ksmbd_conn_handler_loop()` waits for `conn->r_count` to reach zero:
`wait_event(conn->r_count_q, atomic_read(&conn->r_count) == 0);`
- Meanwhile, `handle_ksmbd_work()` decrements `conn->r_count` using
`atomic_dec_return(&conn->r_count)`, and if it reaches zero, calls
`ksmbd_conn_free()`, which frees `conn`.
- However, after `handle_ksmbd_work()` decrements `conn->r_count`,
it may still access `conn->r_count_q` in the following line:
`waitqueue_active(&conn->r_count_q)` or `wake_up(&conn->r_count_q)`
This results in a UAF, as `conn` has already been freed.
The discovery of this UAF can be referenced in the following PR for
syzkaller's support for SMB requests.
Link: https://github.com/google/syzkaller/pull/5524
Fixes: ee426bfb9d09 ("ksmbd: add refcnt to ksmbd_conn struct")
Cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org # v6.6.55+, v6.10.14+, v6.11.3+
Cc: syzkaller@googlegroups.com
Signed-off-by: Yunseong Kim <yskelg@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
We need to know how many pending requests are left at the end of server
shutdown. That means we need to know how long the server will wait
to process pending requests in case of a server shutdown.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add netdev-up/down event debug print to find what netdev is connected or
disconnected.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add debug prints to know what smb2 requests were received.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add debug print to know if netdevice is RDMA-capable network adapter.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
use msleep instaed of schedule_timeout_interruptible()
to guarantee the task delays as expected.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Prefer to report ENOMEM rather than incur the oom for allocations in
ksmbd. __GFP_NORETRY could not achieve that, It would fail the allocations
just too easily. __GFP_RETRY_MAYFAIL will keep retrying the allocation
until there is no more progress and fail the allocation instead go OOM
and let the caller to deal with it.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
performs some cleanups in the resource management code.
- The series "Improve the copy of task comm" from Yafang Shao addresses
possible race-induced overflows in the management of task_struct.comm[].
- The series "Remove unnecessary header includes from
{tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
small fix to the list_sort library code and to its selftest.
- The series "Enhance min heap API with non-inline functions and
optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
min_heap library code.
- The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
finishes off nilfs2's folioification.
- The series "add detect count for hung tasks" from Lance Yang adds more
userspace visibility into the hung-task detector's activity.
- Apart from that, singelton patches in many places - please see the
individual changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ0L6lQAKCRDdBJ7gKXxA
jmEIAPwMSglNPKRIOgzOvHh8MUJW1Dy8iKJ2kWCO3f6QTUIM2AEA+PazZbUd/g2m
Ii8igH0UBibIgva7MrCyJedDI1O23AA=
=8BIU
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- The series "resource: A couple of cleanups" from Andy Shevchenko
performs some cleanups in the resource management code
- The series "Improve the copy of task comm" from Yafang Shao addresses
possible race-induced overflows in the management of
task_struct.comm[]
- The series "Remove unnecessary header includes from
{tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
small fix to the list_sort library code and to its selftest
- The series "Enhance min heap API with non-inline functions and
optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
min_heap library code
- The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
finishes off nilfs2's folioification
- The series "add detect count for hung tasks" from Lance Yang adds
more userspace visibility into the hung-task detector's activity
- Apart from that, singelton patches in many places - please see the
individual changelogs for details
* tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
gdb: lx-symbols: do not error out on monolithic build
kernel/reboot: replace sprintf() with sysfs_emit()
lib: util_macros_kunit: add kunit test for util_macros.h
util_macros.h: fix/rework find_closest() macros
Improve consistency of '#error' directive messages
ocfs2: fix uninitialized value in ocfs2_file_read_iter()
hung_task: add docs for hung_task_detect_count
hung_task: add detect count for hung tasks
dma-buf: use atomic64_inc_return() in dma_buf_getfile()
fs/proc/kcore.c: fix coccinelle reported ERROR instances
resource: avoid unnecessary resource tree walking in __region_intersects()
ocfs2: remove unused errmsg function and table
ocfs2: cluster: fix a typo
lib/scatterlist: use sg_phys() helper
checkpatch: always parse orig_commit in fixes tag
nilfs2: convert metadata aops from writepage to writepages
nilfs2: convert nilfs_recovery_copy_block() to take a folio
nilfs2: convert nilfs_page_count_clean_buffers() to take a folio
nilfs2: remove nilfs_writepage
nilfs2: convert checkpoint file to be folio-based
...
SMB1 NT_TRANSACT_IOCTL/FSCTL_GET_REPARSE_POINT even in non-UNICODE mode
returns reparse buffer in UNICODE/UTF-16 format.
This is because FSCTL_GET_REPARSE_POINT is NT-based IOCTL which does not
distinguish between 8-bit non-UNICODE and 16-bit UNICODE modes and its path
buffers are always encoded in UTF-16.
This change fixes reading of native symlinks in SMB1 when UNICODE session
is not active.
Fixes: ed3e0a149b58 ("smb: client: implement ->query_reparse_point() for SMB1")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
WSL socket, fifo, char and block devices have empty reparse buffer.
Validate the length of the reparse buffer.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
$LXDEV xattr is for storing block/char device's major and minor number.
Change guard which excludes storing $LXDEV xattr to explicitly filter
everything except block and char device. Current guard is opposite, which
is currently correct but is less-safe. This change is required for adding
support for creating WSL-style symlinks as symlinks also do not use
device's major and minor numbers.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linux CIFS client currently does not implement readlink() for WSL-style
symlinks. It is only able to detect that file is of WSL-style symlink, but
is not able to read target symlink location.
Add this missing functionality and implement support for parsing content of
WSL-style symlink.
The important note is that symlink target location stored for WSL symlink
reparse point (IO_REPARSE_TAG_LX_SYMLINK) is in UTF-8 encoding instead of
UTF-16 (which is used in whole SMB protocol and also in all other symlink
styles). So for proper locale/cp support it is needed to do conversion from
UTF-8 to local_nls.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Check that path buffer has correct length (it is non-zero and in UNICODE
mode it has even number of bytes) and check that buffer does not contain
null character (UTF-16 null codepoint in UNICODE mode or null byte in
non-unicode mode) because Linux cannot process symlink with null byte.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
SMB symlink which has SYMLINK_FLAG_RELATIVE set is relative (as opposite of
the absolute) and it can be relative either to the current directory (where
is the symlink stored) or relative to the top level export path. To what it
is relative depends on the first character of the symlink target path.
If the first character is path separator then symlink is relative to the
export, otherwise to the current directory. Linux (and generally POSIX
systems) supports only symlink paths relative to the current directory
where is symlink stored.
Currently if Linux SMB client reads relative SMB symlink with first
character as path separator (slash), it let as is. Which means that Linux
interpret it as absolute symlink pointing from the root (/). But this
location is different than the top level directory of SMB export (unless
SMB export was mounted to the root) and thefore SMB symlinks relative to
the export are interpreted wrongly by Linux SMB client.
Fix this problem. As Linux does not have equivalent of the path relative to
the top of the mount point, convert such symlink target path relative to
the current directory. Do this by prepending "../" pattern N times before
the SMB target path, where N is the number of path separators found in SMB
symlink path.
So for example, if SMB share is mounted to Linux path /mnt/share/, symlink
is stored in file /mnt/share/test/folder1/symlink (so SMB symlink path is
test\folder1\symlink) and SMB symlink target points to \test\folder2\file,
then convert symlink target path to Linux path ../../test/folder2/file.
Deduplicate code for parsing SMB symlinks in native form from functions
smb2_parse_symlink_response() and parse_reparse_native_symlink() into new
function smb2_parse_native_symlink() and pass into this new function a new
full_path parameter from callers, which specify SMB full path where is
symlink stored.
This change fixes resolving of the native Windows symlinks relative to the
top level directory of the SMB share.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Since commit 8da33fd11c05 ("cifs: avoid deadlocks while updating iface")
cifs_chan_update_iface now takes the chan_lock itself, so update the
comment accordingly.
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Change return value from -ENOENT to -EOPNOTSUPP to maintain consistency
with the return value of open_cached_dir() for the same case. This
change is safe as the only calling function does not differentiate
between these return values.
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Setting dir_cache_timeout to zero should disable the caching of
directory contents. Currently, even when dir_cache_timeout is zero,
some caching related functions are still invoked, which is unintended
behavior.
Fix the issue by setting tcon->nohandlecache to true when
dir_cache_timeout is zero, ensuring that directory handle caching
is properly disabled.
Fixes: 238b351d0935 ("smb3: allow controlling length of time directory entries are cached with dir leases")
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Checks inside open_cached_dir() can be removed because if dir caching is
disabled then tcon->cfids is necessarily NULL. Therefore, all other checks
are redundant.
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
This eliminates a redundant get_current_cred() call, because
ceph_mds_check_access() has already obtained this pointer.
As a side effect, this also fixes a reference leak in
ceph_mds_auth_match(): by omitting the get_current_cred() call, no
additional cred reference is taken.
Cc: stable@vger.kernel.org
Fixes: 596afb0b8933 ("ceph: add ceph_mds_check_access() helper")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
F_SET_RW_HINT controls data placement in the file system and / or
device and should not be available to everyone who can read a given file.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241122122931.90408-2-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Before this commit, ->dir and ->entry of exfat_inode_info record the
first cluster of the parent directory and the directory entry index
starting from this cluster.
The directory entry set will be gotten during write-back-inode/rmdir/
unlink/rename. If the clusters of the parent directory are not
continuous, the FAT chain will be traversed from the first cluster of
the parent directory to find the cluster where ->entry is located.
After this commit, ->dir records the cluster where the first directory
entry in the directory entry set is located, and ->entry records the
directory entry index in the cluster, so that there is almost no need
to access the FAT when getting the directory entry set.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
For the root directory and other directories, the clusters
allocated to them can be obtained from exfat_inode_info, and
there is no need to distinguish them.
And there is no need to initialize atime/ctime/mtime/size in
exfat_readdir(), because exfat_iterate() does not use them.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
The output of argument 'p_dir' of exfat_add_entry() is not used
in either exfat_mkdir() or exfat_create(), remove the argument.
Code refinement, no functional changes.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
__exfat_resolve_path() mixes two functions. The first one is to
resolve and check if the path is valid. The second one is to output
the cluster assigned to the directory.
The second one is only needed when need to traverse the directory
entries, and calling exfat_chain_set() so early causes p_dir to be
passed as an argument multiple times, increasing the complexity of
the code.
This commit moves the call to exfat_chain_set() before traversing
directory entries.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
This helper gets the directory entry set of the file for the exfat
inode which has been created.
It's used to remove all the instances of the pattern it replaces
making the code cleaner, it's also a preparation for changing ->dir
to record the cluster where the directory entry set is located and
changing ->entry to record the index of the directory entry within
the cluster.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
In this exfat implementation, the relationship between inode and ei
is ei=EXFAT_I(inode). However, in the arguments of exfat_move_file()
and exfat_rename_file(), argument 'inode' indicates the parent
directory, but argument 'ei' indicates the target file to be renamed.
They do not have the above relationship, which is not friendly to code
readers.
So this commit renames 'inode' to 'parent_inode', making the argument
name match its role.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
To determine whether it is a directory, there is no need to read its
directory entry, just use S_ISDIR(inode->i_mode).
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Unaligned direct writes are invalid and should return an error
without making any changes, rather than extending ->valid_size
and then returning an error. Therefore, alignment checking is
required before extending ->valid_size.
Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Co-developed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
There is no check if stream size and start_clu are invalid.
If start_clu is EOF cluster and stream size is 4096, It will
cause uninit value access. because ei->hint_femp.eidx could
be 128(if cluster size is 4K) and wrong hint will allocate
next cluster. and this cluster will be same with the cluster
that is allocated by exfat_extend_valid_size(). The previous
patch will check invalid start_clu, but for clarity, initialize
hint_femp.eidx to zero.
Cc: stable@vger.kernel.org
Reported-by: syzbot+01218003be74b5e1213a@syzkaller.appspotmail.com
Tested-by: syzbot+01218003be74b5e1213a@syzkaller.appspotmail.com
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
In the case of the directory size is greater than or equal to
the cluster size, if start_clu becomes an EOF cluster(an invalid
cluster) due to file system corruption, then the directory entry
where ei->hint_femp.eidx hint is outside the directory, resulting
in an out-of-bounds access, which may cause further file system
corruption.
This commit adds a check for start_clu, if it is an invalid cluster,
the file or directory will be treated as empty.
Cc: stable@vger.kernel.org
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Co-developed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Sergey Senozhatsky improves zram's post-processing selection algorithm.
This leads to improved memory savings.
- Wei Yang has gone to town on the mapletree code, contributing several
series which clean up the implementation:
- "refine mas_mab_cp()"
- "Reduce the space to be cleared for maple_big_node"
- "maple_tree: simplify mas_push_node()"
- "Following cleanup after introduce mas_wr_store_type()"
- "refine storing null"
- The series "selftests/mm: hugetlb_fault_after_madv improvements" from
David Hildenbrand fixes this selftest for s390.
- The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
implements some rationaizations and cleanups in the page mapping code.
- The series "mm: optimize shadow entries removal" from Shakeel Butt
optimizes the file truncation code by speeding up the handling of shadow
entries.
- The series "Remove PageKsm()" from Matthew Wilcox completes the
migration of this flag over to being a folio-based flag.
- The series "Unify hugetlb into arch_get_unmapped_area functions" from
Oscar Salvador implements a bunch of consolidations and cleanups in the
hugetlb code.
- The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
takes away the wp-fault time practice of turning a huge zero page into
small pages. Instead we replace the whole thing with a THP. More
consistent cleaner and potentiall saves a large number of pagefaults.
- The series "percpu: Add a test case and fix for clang" from Andy
Shevchenko enhances and fixes the kernel's built in percpu test code.
- The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
optimizes mremap() by avoiding doing things which we didn't need to do.
- The series "Improve the tmpfs large folio read performance" from
Baolin Wang teaches tmpfs to copy data into userspace at the folio size
rather than as individual pages. A 20% speedup was observed.
- The series "mm/damon/vaddr: Fix issue in
damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting.
- The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt
removes the long-deprecated memcgv2 charge moving feature.
- The series "fix error handling in mmap_region() and refactor" from
Lorenzo Stoakes cleanup up some of the mmap() error handling and
addresses some potential performance issues.
- The series "x86/module: use large ROX pages for text allocations" from
Mike Rapoport teaches x86 to use large pages for read-only-execute
module text.
- The series "page allocation tag compression" from Suren Baghdasaryan
is followon maintenance work for the new page allocation profiling
feature.
- The series "page->index removals in mm" from Matthew Wilcox remove
most references to page->index in mm/. A slow march towards shrinking
struct page.
- The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
interface tests" from Andrew Paniakin performs maintenance work for
DAMON's self testing code.
- The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
improves zswap's batching of compression and decompression. It is a
step along the way towards using Intel IAA hardware acceleration for
this zswap operation.
- The series "kasan: migrate the last module test to kunit" from
Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests
over to the KUnit framework.
- The series "implement lightweight guard pages" from Lorenzo Stoakes
permits userapace to place fault-generating guard pages within a single
VMA, rather than requiring that multiple VMAs be created for this.
Improved efficiencies for userspace memory allocators are expected.
- The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
tracepoints to provide increased visibility into memcg stats flushing
activity.
- The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
fixes a zram buglet which potentially affected performance.
- The series "mm: add more kernel parameters to control mTHP" from
Maíra Canal enhances our ability to control/configuremultisize THP from
the kernel boot command line.
- The series "kasan: few improvements on kunit tests" from Sabyrzhan
Tasbolatov has a couple of fixups for the KASAN KUnit tests.
- The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
from Kairui Song optimizes list_lru memory utilization when lockdep is
enabled.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA
jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq
xyyenpibWgUoShU2wZ/Ha8FE5WDINwg=
=JfWR
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection
algorithm. This leads to improved memory savings.
- Wei Yang has gone to town on the mapletree code, contributing several
series which clean up the implementation:
- "refine mas_mab_cp()"
- "Reduce the space to be cleared for maple_big_node"
- "maple_tree: simplify mas_push_node()"
- "Following cleanup after introduce mas_wr_store_type()"
- "refine storing null"
- The series "selftests/mm: hugetlb_fault_after_madv improvements" from
David Hildenbrand fixes this selftest for s390.
- The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
implements some rationaizations and cleanups in the page mapping
code.
- The series "mm: optimize shadow entries removal" from Shakeel Butt
optimizes the file truncation code by speeding up the handling of
shadow entries.
- The series "Remove PageKsm()" from Matthew Wilcox completes the
migration of this flag over to being a folio-based flag.
- The series "Unify hugetlb into arch_get_unmapped_area functions" from
Oscar Salvador implements a bunch of consolidations and cleanups in
the hugetlb code.
- The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
takes away the wp-fault time practice of turning a huge zero page
into small pages. Instead we replace the whole thing with a THP. More
consistent cleaner and potentiall saves a large number of pagefaults.
- The series "percpu: Add a test case and fix for clang" from Andy
Shevchenko enhances and fixes the kernel's built in percpu test code.
- The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
optimizes mremap() by avoiding doing things which we didn't need to
do.
- The series "Improve the tmpfs large folio read performance" from
Baolin Wang teaches tmpfs to copy data into userspace at the folio
size rather than as individual pages. A 20% speedup was observed.
- The series "mm/damon/vaddr: Fix issue in
damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
splitting.
- The series "memcg-v1: fully deprecate charge moving" from Shakeel
Butt removes the long-deprecated memcgv2 charge moving feature.
- The series "fix error handling in mmap_region() and refactor" from
Lorenzo Stoakes cleanup up some of the mmap() error handling and
addresses some potential performance issues.
- The series "x86/module: use large ROX pages for text allocations"
from Mike Rapoport teaches x86 to use large pages for
read-only-execute module text.
- The series "page allocation tag compression" from Suren Baghdasaryan
is followon maintenance work for the new page allocation profiling
feature.
- The series "page->index removals in mm" from Matthew Wilcox remove
most references to page->index in mm/. A slow march towards shrinking
struct page.
- The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
interface tests" from Andrew Paniakin performs maintenance work for
DAMON's self testing code.
- The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
improves zswap's batching of compression and decompression. It is a
step along the way towards using Intel IAA hardware acceleration for
this zswap operation.
- The series "kasan: migrate the last module test to kunit" from
Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
tests over to the KUnit framework.
- The series "implement lightweight guard pages" from Lorenzo Stoakes
permits userapace to place fault-generating guard pages within a
single VMA, rather than requiring that multiple VMAs be created for
this. Improved efficiencies for userspace memory allocators are
expected.
- The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
tracepoints to provide increased visibility into memcg stats flushing
activity.
- The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
fixes a zram buglet which potentially affected performance.
- The series "mm: add more kernel parameters to control mTHP" from
Maíra Canal enhances our ability to control/configuremultisize THP
from the kernel boot command line.
- The series "kasan: few improvements on kunit tests" from Sabyrzhan
Tasbolatov has a couple of fixups for the KASAN KUnit tests.
- The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
from Kairui Song optimizes list_lru memory utilization when lockdep
is enabled.
* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
mm/kfence: add a new kunit test test_use_after_free_read_nofault()
zram: fix NULL pointer in comp_algorithm_show()
memcg/hugetlb: add hugeTLB counters to memcg
vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
zram: ZRAM_DEF_COMP should depend on ZRAM
MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
Docs/mm/damon: recommend academic papers to read and/or cite
mm: define general function pXd_init()
kmemleak: iommu/iova: fix transient kmemleak false positive
mm/list_lru: simplify the list_lru walk callback function
mm/list_lru: split the lock to per-cgroup scope
mm/list_lru: simplify reparenting and initial allocation
mm/list_lru: code clean up for reparenting
mm/list_lru: don't export list_lru_add
mm/list_lru: don't pass unnecessary key parameters
kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
...
Piergiorgio reported a bug in bugzilla as below:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330
RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Call Trace:
__issue_discard_cmd+0x1ca/0x350 [f2fs]
issue_discard_thread+0x191/0x480 [f2fs]
kthread+0xcf/0x100
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1a/0x30
w/ below testcase, it can reproduce this bug quickly:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs
The root cause is: it will update discard_max_bytes of mounted lvm
device to zero after creating snapshot on this lvm device, then,
__submit_discard_cmd() will pass parameter @nr_sects w/ zero value
to __blkdev_issue_discard(), it returns a NULL bio pointer, result
in panic.
This patch changes as below for fixing:
1. Let's drop all remained discards in f2fs_unfreeze() if snapshot
of lvm device is created.
2. Checking discard_max_bytes before submitting discard during
__submit_discard_cmd().
Cc: stable@vger.kernel.org
Fixes: 35ec7d574884 ("f2fs: split discard command in prior to block layer")
Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Quoted:
"at this time, there are still 1086911 extent nodes in this zombie
extent tree that need to be cleaned up.
crash_arm64_sprd_v8.0.3++> extent_tree.node_cnt ffffff80896cc500
node_cnt = {
counter = 1086911
},
"
As reported by Xiuhong, there will be a huge number of extent nodes
in extent tree, it may potentially cause:
- slab memory fragments
- extreme long time shrink on extent tree
- low mapping efficiency
Let's add a sysfs node to limit max read extent count for each inode,
by default, value of this threshold is 10240, it can be updated
according to user's requirement.
Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmdA/W0ACgkQiiy9cAdy
T1HN6Av/ai3dqbNO0d9IrmM7JONITa6CYqBA+DC2zyv7LtNmSOVGAeP+LyEuM0rE
jHbQBNSRVhoNKCv5ywT6GhjMNnDeO0ctZG2aXoQnGfCFdZ5dE/08r8Cc24xQW+x5
cdLbBGR22HBi4MjqqWALL2goL9TpLYo9ht31P1xmiM1pw3/pbh2lsEDLVMS/veeG
RQ1pIg5YpWcWQWAnuwI6RDxGHF2taj7tcm8gZ1mYRUlaPHmsCHeNg6lgHdDXLFKw
0HA4dx8AeH2rdMWgwP42UVIVoxG/H//xPpLym+A8yV+11EJcurjYkskhEsSUxeWq
vzsf0xFxN2MhjwI5DYfr5kIQknjS3qCTOofRsJi6s5GhxJy2hl0Wfto0AJs6+Fl4
/sukKWjWSgrkcpILLokDUZptYEBH7pUNcBVpfcQOScnUeV4qQk8oko3bTKDSfgTq
q3zrh2mPanTOpM6UgcBRfywR4r5tAKzoQOUfJpQuItLvLhgrgAm+WGCWFepztJlT
LPEGTxrG
=K4Ai
-----END PGP SIGNATURE-----
Merge tag '6.13-rc-part1-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- Fix two SMB3.1.1 POSIX Extensions problems
- Fixes for special file handling (symlinks and FIFOs)
- Improve compounding
- Four cleanup patches
- Fix use after free in signing
- Add support for handling namespaces for reconnect related upcalls
(e.g. for DNS names resolution and auth)
- Fix various directory lease problems (directory entry caching),
including some important potential use after frees
* tag '6.13-rc-part1-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: prevent use-after-free due to open_cached_dir error paths
smb: Don't leak cfid when reconnect races with open_cached_dir
smb: client: handle max length for SMB symlinks
smb: client: get rid of bounds check in SMB2_ioctl_init()
smb: client: improve compound padding in encryption
smb3: request handle caching when caching directories
cifs: Recognize SFU char/block devices created by Windows NFS server on Windows Server <<2012
CIFS: New mount option for cifs.upcall namespace resolution
smb/client: Prevent error pointer dereference
fs/smb/client: implement chmod() for SMB3 POSIX Extensions
smb: cached directories can be more than root file handle
smb: client: fix use-after-free of signing key
smb: client: Use str_yes_no() helper function
smb: client: memcpy() with surrounding object base address
cifs: Remove pre-historic unused CIFSSMBCopy
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmc90jsACgkQEVvVyTe/
1Wol0A//RhzFCG8geR7Grbptp40CUm9kVISvkr50mPBdvVk3jX9WvH9m/10qapGP
tcGHSdHt+q5qabqutKLmQRiFbwpGEaBMaFOe7JH8na8xWvmSa3p7sJC5kLByS3rm
D2F+cVx3Di7MTscz/Ma724bHdHOUO5RbDuMIcjp7uXRvaNWJ0uZg5xWlBKsNa3h8
DbNSYi5ICihLYpUxI9NglHZ6iqcS2jHsUHSAw52/GJ2Zon1LAAmKoSn6s7hZ27ZJ
f8Rv5fFuYmkRV7nYo/gjLY1gt7KXZFcfUtMT05yd7zcnqDayKEFXEiwI/Bz5fXZL
HmZpOP4RV2M9B8HzhReVR/yG8gZaaUezX+aVQp7plZSc73GhMdFFd1bUyjgJ4Lzf
C2BlBMWafc/Zc7a7r0+X5577i34nED8lGuVMEdYMtjSjstpzIP+1Wlzn2cGi4+5K
VAb+kEravjP9ck7YrmbruRYfVhDaE37BDs4XML4S8gzcZgdaTcEMyGw1ifEhvPjA
vLbRs24a5VO7/cKlks7PWS6i9uExaz7g4re0jUPwUuc+nS+Hv+y8kLSPqLS4CtNY
MxhS2IhKK5gp1Z9XGpLsak+ancTYLSV0OJ15qsAChpqoqSG5Xd9Lt4CWACnF33Ea
ny8z5QpOAHWVb97k6xaEvu/r0dl+PHdG7vfb0MNhXaajNF8SKiU=
=pgoX
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
- Fix a syzbot reported NULL pointer deref with bfs lower layers
- Fix a copy up failure of large file from lower fuse fs
- Followup cleanup of backing_file API from Miklos
- Introduction and use of revert/override_creds_light() helpers, that
were suggested by Christian as a mitigation to cache line bouncing
and false sharing of fields in overlayfs creator_cred long lived
struct cred copy.
- Store up to two backing file references (upper and lower) in an
ovl_file container instead of storing a single backing file in
file->private_data.
This is used to avoid the practice of opening a short lived backing
file for the duration of some file operations and to avoid the
specialized use of FDPUT_FPUT in such occasions, that was getting in
the way of Al's fd_file() conversions.
* tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: Filter invalid inodes with missing lookup function
ovl: convert ovl_real_fdget() callers to ovl_real_file()
ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
ovl: store upper real file in ovl_file struct
ovl: allocate a container struct ovl_file for ovl private context
ovl: do not open non-data lower file for fsync
ovl: Optimize override/revert creds
ovl: pass an explicit reference of creators creds to callers
ovl: use wrapper ovl_revert_creds()
fs/backing-file: Convert to revert/override_creds_light()
cred: Add a light version of override/revert_creds()
backing-file: clean up the API
ovl: properly handle large files in ovl_security_fileattr
This update includes:
- A patch by Thomas Weißschuh constifying a read-only struct.
- A patch by André Almeida fixing the error path of unicode_load,
which might trigger a kernel oops if it fails to find the unicode
module.
- One documentation fix by Gan Jie, updating a filename in the README.
- A patch by André Almeida adding the link of my tree to MAINTAINERS.
All but the MAINTAINERS patch have been sitting on my tree and in
linux-next since early in the 6.12 cycle.
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS3XO7QfvpFoONBhH1OwQgI3t8RJgUCZ0D4JwAKCRBOwQgI3t8R
JmjZAP988O9eB4ITF6KHKsHyY3pOhxSRXU5jpr78v7ofDDuGwAD/UBJZyF35wgJz
S2q295kCAEP8bUKxj6RJtyMyQnamQg8=
=irw2
-----END PGP SIGNATURE-----
Merge tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode updates from Gabriel Krisman Bertazi:
- constify a read-only struct (Thomas Weißschuh)
- fix the error path of unicode_load, avoiding a possible kernel oops
if it fails to find the unicode module (André Almeida)
- documentation fix, updating a filename in the README (Gan Jie)
- add the link of my tree to MAINTAINERS (André Almeida)
* tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
MAINTAINERS: Add Unicode tree
unicode: change the reference of database file
unicode: Fix utf8_load() error path
unicode: constify utf8 data table
* sysctl ctl_table constification
Constifying ctl_table structs prevents the modification of proc_handler
function pointers. All ctl_table struct arguments are const qualified in the
sysctl API in such a way that the ctl_table arrays being defined elsewhere
and passed through sysctl can be constified one-by-one. We kick the
constification off by qualifying user_table in kernel/ucount.c and expect all
the ctl_tables to be constified in the coming releases.
* Misc fixes
Adjust comments in two places to better reflect the code. Remove superfluous
dput calls. Remove Luis from sysctl maintainership. Replace comments about
holding a lock with calls to lockdep_assert_held.
* Testing
All these went through 0-day and they have all been in linux-next for at
least 1 month (since Oct-24). I also rand these through the sysctl selftest
for x86_64.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmdAXMsACgkQupfNUreW
QU/KfQv8Daq9sew98ohmS/lkdoE1dfpI72motzEn1993CbLjN2h3CZauaHjBPFnr
rpr8qPrphdWTyDbDMgx63oxcNxM07g7a9H0y/K3IwdUsx7fGINgHF5kfWeVn09ov
X8I3NuL/+xSHAZRsLQeBykbY6BD5e0uuxL6ayGzkejrgRd+80dmC3MzXqX207v1z
rlrUFXEXwqKYgxP/H+pxmvmVWKAeFsQt/E49GOkg2qSg9mVFhtKpxHwMJVqS2a8u
qAKHgcZhB5T8TQSb1eKnyCzXLDLpzqUBj9ejqJSsQm16fweawv221Ji6a1k53QYG
chreoB9R8qCZ/jGoWI3ZKGRZ/Vl37l+GF/82X/sDrMbKwVlxvaERpb1KXrnh/D1v
qNze1Eea0eYv22weGGEa3J5N2tKfgX6NcRFioDNe9VEXX6zDcAtJKTKZtbMB3gXX
CzQicH5yXApyAk3aNCq0S3s+WRQR0syGAYCmtxhaRgXRnSu9qifKZ1XhZQyhgKIG
Flt9MsU2
=bOJ0
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
"sysctl ctl_table constification:
- Constifying ctl_table structs prevents the modification of
proc_handler function pointers. All ctl_table struct arguments are
const qualified in the sysctl API in such a way that the ctl_table
arrays being defined elsewhere and passed through sysctl can be
constified one-by-one.
We kick the constification off by qualifying user_table in
kernel/ucount.c and expect all the ctl_tables to be constified in
the coming releases.
Misc fixes:
- Adjust comments in two places to better reflect the code
- Remove superfluous dput calls
- Remove Luis from sysctl maintainership
- Replace comments about holding a lock with calls to
lockdep_assert_held"
* tag 'sysctl-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: Reduce dput(child) calls in proc_sys_fill_cache()
sysctl: Reorganize kerneldoc parameter names
ucounts: constify sysctl table user_table
sysctl: update comments to new registration APIs
MAINTAINERS: remove me from sysctl
sysctl: Convert locking comments to lockdep assertions
const_structs.checkpatch: add ctl_table
sysctl: make internal ctl_tables const
sysctl: allow registration of const struct ctl_table
sysctl: move internal interfaces to const struct ctl_table
bpf: Constify ctl_table argument of filter function
- Constify range_contains() input parameters to prevent changes.
- Add support for displaying RCD capabilities in sysfs to support lspci for CXL device.
- Downgrade warning message to debug in cxl_probe_component_regs().
- Add support for adding a printf specifier '$pra' to emit 'struct range' content.
- Add sanity tests for 'struct resource'.
- Add documentation for special case.
- Add %pra for 'struct range'.
- Add %pra usage in CXL code.
- Add preparation code for DCD support
- Add range_overlaps().
- Add CDAT DSMAS table shared and read only flag in ACPICA.
- Add documentation to 'struct dev_dax_range'.
- Delay event buffer allocation in CXL PCI code until needed.
- Use guard() in cxl_dpa_set_mode().
- Refactor create region code to consolidate common code.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmc84dMACgkQYGjFFmlT
OEoGTg//cSJlQ9X7+xZDbngnzpJwcLzQkR/FXDfe3obtmgs7woDJgNNcYnKSlgyf
wal47Q0UM/1Hv8Dtfrt62Ay1fmOvDL2GSpey35NVJGCEpIsfOqqk1zTCgfgwRHTO
MZJLnOSFUIlDYlVz8ljLNHnNqPjr7dCoUh9tdBefvkw59FqbkHNcWI8hG1lh1SR4
2frtJcqVg54S6vJa2eeWmNVpxz7RZvPFrb8TJzhdrGM8PkTMNFA2oJINAf0j00Ev
8/T6HXTxXvFtNhBH0dtMO1MFh1d6Qr/zFnX/gmrnPWl1l/12HFDMBIZIzq/Whjpo
+7hQ5xK3cwkMevFgFrAhwdZMj8maR84x1dbFItoThaoeDIQ4sGfyQEMPsbkZP/Sc
67i5hQFIBZc+ORLB0W+z9Da52ZFGyVw/xsCmDRzXCw4s7N2twpydIoA7Pvu9NN1X
3JVF35NrsRZ+PyuGWEitNjo0Rj6swNpBC5Xv/T1mgFtSgvVuk1T2QtSHJcPoQyzQ
zbijsCKmvJYbdJBnPiotdrBs1BUxBsP9dBT9IxWzMy6lcEpTJrYpUheRCk2tSHFa
Kk8O8IYNiBKZaSpN9UHKaGzr43H8gNbLf4svSIiu1lZJTSSdtWqfZZYjXFBgB1Vb
l2gBCDmPJ0y7WKZSCa53UmQiOusr+l3Pi+OflZEfCy6JxbSqTTM=
=GNlu
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl updates from Dave Jiang:
- Constify range_contains() input parameters to prevent changes
- Add support for displaying RCD capabilities in sysfs to support lspci
for CXL device
- Downgrade warning message to debug in cxl_probe_component_regs()
- Add support for adding a printf specifier '%pra' to emit 'struct
range' content:
- Add sanity tests for 'struct resource'
- Add documentation for special case
- Add %pra for 'struct range'
- Add %pra usage in CXL code
- Add preparation code for DCD support:
- Add range_overlaps()
- Add CDAT DSMAS table shared and read only flag in ACPICA
- Add documentation to 'struct dev_dax_range'
- Delay event buffer allocation in CXL PCI code until needed
- Use guard() in cxl_dpa_set_mode()
- Refactor create region code to consolidate common code
* tag 'cxl-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Refactor common create region code
cxl/hdm: Use guard() in cxl_dpa_set_mode()
cxl/pci: Delay event buffer allocation
dax: Document struct dev_dax_range
ACPI/CDAT: Add CDAT/DSMAS shared and read only flag values
range: Add range_overlaps()
cxl/cdat: Use %pra for dpa range outputs
printf: Add print format (%pra) for struct range
Documentation/printf: struct resource add start == end special case
test printf: Add very basic struct resource tests
cxl: downgrade a warning message to debug level in cxl_probe_component_regs()
cxl/pci: Add sysfs attribute for CXL 1.1 device link status
cxl/core/regs: Add rcd_pcie_cap initialization
kernel/range: Const-ify range_contains parameters
If iov_iter_zero succeeds after failed copy_from_kernel_nofault,
we need to reset the ret value to zero otherwise it will be returned
as final return value of read_kcore_iter.
This fixes objdump -d dump over /proc/kcore for me.
Cc: stable@vger.kernel.org
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Fixes: 3d5854d75e31 ("fs/proc/kcore.c: allow translation of physical memory addresses")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20241121231118.3212000-1-jolsa@kernel.org
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>