After str_true_false() has been introduced in the tree,
we can add rules for finding places where str_true_false()
can be used. A simple test can find over 10 locations.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
* KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or deleted.
The former does not have particularly noticeable overhead, but Intel's
TDX will require the guest to re-accept private pages if they are
dropped from the secure EPT, which is a non starter. Actually,
the only reason why this is not already being done is a bug which
was never fully investigated and caused VM instability with assigned
GeForce GPUs, so allow userspace to opt into the new behavior.
* Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10
functionality that is on the horizon).
* Rework common MSR handling code to suppress errors on userspace accesses to
unsupported-but-advertised MSRs. This will allow removing (almost?) all of
KVM's exemptions for userspace access to MSRs that shouldn't exist based on
the vCPU model (the actual cleanup is non-trivial future work).
* Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the
64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv)
stores the entire 64-bit value at the ICR offset.
* Fix a bug where KVM would fail to exit to userspace if one was triggered by
a fastpath exit handler.
* Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when
there's already a pending wake event at the time of the exit.
* Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest
state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN
(the SHUTDOWN hits the VM altogether, not the nested guest)
* Overhaul the "unprotect and retry" logic to more precisely identify cases
where retrying is actually helpful, and to harden all retry paths against
putting the guest into an infinite retry loop.
* Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in
the shadow MMU.
* Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for
adding multi generation LRU support in KVM.
* Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled,
i.e. when the CPU has already flushed the RSB.
* Trace the per-CPU host save area as a VMCB pointer to improve readability
and cleanup the retrieval of the SEV-ES host save area.
* Remove unnecessary accounting of temporary nested VMCB related allocations.
* Set FINAL/PAGE in the page fault error code for EPT violations if and only
if the GVA is valid. If the GVA is NOT valid, there is no guest-side page
table walk and so stuffing paging related metadata is nonsensical.
* Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of
emulating posted interrupt delivery to L2.
* Add a lockdep assertion to detect unsafe accesses of vmcs12 structures.
* Harden eVMCS loading against an impossible NULL pointer deref (really truly
should be impossible).
* Minor SGX fix and a cleanup.
* Misc cleanups
Generic:
* Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
virtualization as needed.
* Enable virtualization when KVM is loaded, not right before the first VM
is created. Together with the previous change, this simplifies a
lot the logic of the callbacks, because their very existence implies
virtualization is enabled.
* Fix a bug that results in KVM prematurely exiting to userspace for coalesced
MMIO/PIO in many cases, clean up the related code, and add a testcase.
* Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
the gpa+len crosses a page boundary, which thankfully is guaranteed to not
happen in the current code base. Add WARNs in more helpers that read/write
guest memory to detect similar bugs.
Selftests:
* Fix a goof that caused some Hyper-V tests to be skipped when run on bare
metal, i.e. NOT in a VM.
* Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest.
* Explicitly include one-off assets in .gitignore. Past Sean was completely
wrong about not being able to detect missing .gitignore entries.
* Verify userspace single-stepping works when KVM happens to handle a VM-Exit
in its fastpath.
* Misc cleanups
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb201AUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOM1gf+Ij7dpCh0KwoNYlHfW2aCHAv3PqQd
cKMDSGxoCernbJEyPO/3qXNUK+p4zKedk3d92snW3mKa+cwxMdfthJ3i9d7uoNiw
7hAgcfKNHDZGqAQXhx8QcVF3wgp+diXSyirR+h1IKrGtCCmjMdNC8ftSYe6voEkw
VTVbLL+tER5H0Xo5UKaXbnXKDbQvWLXkdIqM8dtLGFGLQ2PnF/DdMP0p6HYrKf1w
B7LBu0rvqYDL8/pS82mtR3brHJXxAr9m72fOezRLEUbfUdzkTUi/b1vEe6nDCl0Q
i/PuFlARDLWuetlR0VVWKNbop/C/l4EmwCcKzFHa+gfNH3L9361Oz+NzBw==
=Q7kz
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm updates from Paolo Bonzini:
"x86:
- KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or
deleted.
This does not traditionally have particularly noticeable overhead,
but Intel's TDX will require the guest to re-accept private pages
if they are dropped from the secure EPT, which is a non starter.
Actually, the only reason why this is not already being done is a
bug which was never fully investigated and caused VM instability
with assigned GeForce GPUs, so allow userspace to opt into the new
behavior.
- Advertise AVX10.1 to userspace (effectively prep work for the
"real" AVX10 functionality that is on the horizon)
- Rework common MSR handling code to suppress errors on userspace
accesses to unsupported-but-advertised MSRs
This will allow removing (almost?) all of KVM's exemptions for
userspace access to MSRs that shouldn't exist based on the vCPU
model (the actual cleanup is non-trivial future work)
- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
splits the 64-bit value into the legacy ICR and ICR2 storage,
whereas Intel (APICv) stores the entire 64-bit value at the ICR
offset
- Fix a bug where KVM would fail to exit to userspace if one was
triggered by a fastpath exit handler
- Add fastpath handling of HLT VM-Exit to expedite re-entering the
guest when there's already a pending wake event at the time of the
exit
- Fix a WARN caused by RSM entering a nested guest from SMM with
invalid guest state, by forcing the vCPU out of guest mode prior to
signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
nested guest)
- Overhaul the "unprotect and retry" logic to more precisely identify
cases where retrying is actually helpful, and to harden all retry
paths against putting the guest into an infinite retry loop
- Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
rmaps in the shadow MMU
- Refactor pieces of the shadow MMU related to aging SPTEs in
prepartion for adding multi generation LRU support in KVM
- Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
enabled, i.e. when the CPU has already flushed the RSB
- Trace the per-CPU host save area as a VMCB pointer to improve
readability and cleanup the retrieval of the SEV-ES host save area
- Remove unnecessary accounting of temporary nested VMCB related
allocations
- Set FINAL/PAGE in the page fault error code for EPT violations if
and only if the GVA is valid. If the GVA is NOT valid, there is no
guest-side page table walk and so stuffing paging related metadata
is nonsensical
- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
instead of emulating posted interrupt delivery to L2
- Add a lockdep assertion to detect unsafe accesses of vmcs12
structures
- Harden eVMCS loading against an impossible NULL pointer deref
(really truly should be impossible)
- Minor SGX fix and a cleanup
- Misc cleanups
Generic:
- Register KVM's cpuhp and syscore callbacks when enabling
virtualization in hardware, as the sole purpose of said callbacks
is to disable and re-enable virtualization as needed
- Enable virtualization when KVM is loaded, not right before the
first VM is created
Together with the previous change, this simplifies a lot the logic
of the callbacks, because their very existence implies
virtualization is enabled
- Fix a bug that results in KVM prematurely exiting to userspace for
coalesced MMIO/PIO in many cases, clean up the related code, and
add a testcase
- Fix a bug in kvm_clear_guest() where it would trigger a buffer
overflow _if_ the gpa+len crosses a page boundary, which thankfully
is guaranteed to not happen in the current code base. Add WARNs in
more helpers that read/write guest memory to detect similar bugs
Selftests:
- Fix a goof that caused some Hyper-V tests to be skipped when run on
bare metal, i.e. NOT in a VM
- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
guest
- Explicitly include one-off assets in .gitignore. Past Sean was
completely wrong about not being able to detect missing .gitignore
entries
- Verify userspace single-stepping works when KVM happens to handle a
VM-Exit in its fastpath
- Misc cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
Documentation: KVM: fix warning in "make htmldocs"
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
selftests: kvm: s390: Add VM run test case
KVM: SVM: let alternatives handle the cases when RSB filling is required
KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
KVM: x86: Update retry protection fields when forcing retry on emulation failure
KVM: x86: Apply retry protection to "unprotect on failure" path
KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
...
- Clean up and improve vdso code: use SYM_* macros for function and
data annotations, add CFI annotations to fix GDB unwinding, optimize
the chacha20 implementation
- Add vfio-ap driver feature advertisement for use by libvirt and mdevctl
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmb39nkACgkQjYWKoQLX
FBiacggAlHrwDYIZdr7bd+0bJa8Wq5STrdo1XiohQhCrRg+4rXRaquEQszWVheRk
qi/9u+MKJFtbd4VMbf+WqEi2P4Pnn5cByv+As3B4RMR0Tp3uQK5TlDbh7wbJ+pIG
AQQoL5D2z05rvbYgc1Wvt/3lgrfs2EzUY3cAyIMmFwSp0On+psDuwuFVtWzH0jCr
fxCwX7mrF6OoBMR0QSUxAvoOujUhd4CANk3n6rNxJfi2AC/gX/ZkFUrV5a3FvJ2Z
X7fQgDc3rivsm5wFuoBvlrH5J1jhmWtsg8Tiygos7BBehpQ5NUsJMDGS8kR+avbF
iYKg1oMy5/ZXekBjqkEMO6Vp3nCrdg==
=Idf7
-----END PGP SIGNATURE-----
Merge tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Clean up and improve vdso code: use SYM_* macros for function and
data annotations, add CFI annotations to fix GDB unwinding, optimize
the chacha20 implementation
- Add vfio-ap driver feature advertisement for use by libvirt and
mdevctl
* tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: Driver feature advertisement
s390/vdso: Use one large alternative instead of an alternative branch
s390/vdso: Use SYM_DATA_START_LOCAL()/SYM_DATA_END() for data objects
tools: Add additional SYM_*() stubs to linkage.h
s390/vdso: Use macros for annotation of asm functions
s390/vdso: Add CFI annotations to __arch_chacha20_blocks_nostack()
s390/vdso: Fix comment within __arch_chacha20_blocks_nostack()
s390/vdso: Get rid of permutation constants
There are a few fixes / cleanups from Vincent, Chunhui, and Petr, but the
most important part of this pull request is the Rust community stepping
up to help maintain both C / Rust code for future Rust module support. We
grow the set of modules maintainers by 3 now, and with this hope to scale to
help address what's needed to properly support future Rust module support.
A lot of exciting stuff coming in future kernel releases.
This has been on linux-next for ~ 3 weeks now with no issues.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmb3InQSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinA/IP/RP3O3Cwtyjd51lMNzEmJR0WE0J7/C3z
v4L3teqoiH4vWF0vDd8jVE1SL9RZ0TnrSUUF/Kbf7YolXELPO+WSvPepGqlzeUTd
KH+PZX+AmaGXhwAGmB53AMhcP8HmGci+IZZgyZUnYxZawcFYU24WYO84JAKltNsy
/wqepYXObc0HiNXk+VS3h8Z+1y9nhJ55xluvTf5guQbrtjl1xWXSdVdF1/V5wnjp
qShNSNn1bktFO0lK7IW/UmM0kEoFHHyUslwNcP/rJLIb99lDV3M+Vd3i41dBkuYw
iSCD+a/0fOmUj909Q4VfZQkK4vKEi04XIz1EHb2uYOGKcr75gnWmCRyUL1TJSFO/
oXNd2SlvwMYXxMczsaLppAPERRgSMWnsBEZWZ7nk2uBpuFay43LfEdZcPwknGNkz
7Ns+3PHr6W3phUo1izrgxBk6xTyEDR6etxThSGvq/dhG3VuivV6hRyxFZX9NaTSD
a/uFhIj2f8FuV9TLYUzPO/NwwLklPFe9dCvtWEHgSvtyaeX1pSvyjz8fLbXDGyu/
qVXMp2fegLJ2bq9A0ABtd7nuVNCAN24pl+Nwws+GMRmCg9b1Sfego16WoLUDbbHX
mjVAFTtKgqEg0ePnbjqGm7I7siY/9x8I39aA9WbNoXKNFu3hwMDHLAavATmj+1dV
UlrMxvfv20WQ
=4P89
-----END PGP SIGNATURE-----
Merge tag 'modules-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
"There are a few fixes / cleanups from Vincent, Chunhui, and Petr, but
the most important part of this pull request is the Rust community
stepping up to help maintain both C / Rust code for future Rust module
support. We grow the set of modules maintainers by three now, and with
this hope to scale to help address what's needed to properly support
future Rust module support.
A lot of exciting stuff coming in future kernel releases.
This has been on linux-next for ~ 3 weeks now with no issues"
* tag 'modules-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: Refine kmemleak scanned areas
module: abort module loading when sysfs setup suffer errors
MAINTAINERS: scale modules with more reviewers
module: Clean up the description of MODULE_SIG_<type>
module: Split modules_install compression and in-kernel decompression
- crash fix in fbcon_putcs
- avoid a possible string memory overflow in sisfb
- minor code optimizations in omapfb and fbcon
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZvfmvgAKCRD3ErUQojoP
XxcKAP444xoWxF5zDLAl4xczeC1lnMazgY5W7ELFMNDxQ2PPlwD/Tmn/V1EzLRoZ
MOiVtMpM5K7kIRUKdcjKVCltI/mpgQ8=
=6eO2
-----END PGP SIGNATURE-----
Merge tag 'fbdev-for-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
- crash fix in fbcon_putcs
- avoid a possible string memory overflow in sisfb
- minor code optimizations in omapfb and fbcon
* tag 'fbdev-for-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: sisfb: Fix strbuf array overflow
fbcon: break earlier in search_fb_in_map and search_for_mapped_con
fbdev: omapfb: Call of_node_put(ep) only once in omapdss_of_find_source_for_first_ep()
fbcon: Fix a NULL pointer dereference issue in fbcon_putcs
i915:
- Fix BMG support to UHBR13.5
- Two PSR fixes
- Fix colorimetry detection for DP
xe
- Fix macro for checking minimum GuC version
- Fix CCS offset calculation for some BMG SKUs
- Fix locking on memory usage reporting via fdinfo and BO destroy
- Fix GPU page fault handler on a closed VM
- Fix overflow in oa batch buffer
amdgpu:
- MES 12 fix
- KFD fence sync fix
- SR-IOV fixes
- VCN 4.0.6 fix
- SDMA 7.x fix
- Bump driver version to note cleared VRAM support
- SWSMU fix
amdgpu:
- CU occupancy logic fix
- SDMA queue fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmb3QQoACgkQDHTzWXnE
hr7+yQ//dBbBOdxwWcQA6N4p5KOQdhMecU1LaGkv0nbV6ft4mxhHq2XSczr7DMJM
C56cgTplw4Lajfo0Q/gwgoLcgK4EzRc3kReb8gC9diyZ+zgolZbS5uSnJsS3IsST
xriP2sb+lsPH6UAqtUZABMA6aOE7VYmJnRZlo0tOulRRYSnX++gTPQIi2PprVzBh
jFlFmLABCqvZ5md0ux8NITzRpE2sODuawKTpdXoTMTVsrXF+YBtRaJD170eC4mj1
3JDmsY90TpvHWri4BHQ98VqJBpzLiIU8COQHaZab2cfV+yH+KfUo2puH1RS4swW5
gbrOAbK/OXzoX+6aT1rYuDihcrX5+88MZovhRW7Ik0dEm5Ysl8PRlRkftuMa2mg9
tUJjjUfmGDf9eiYCUt7/BuDguN2lc+r/TOM4F+2kmB0dxDkYn3u1W95DxseoiLHt
Sq/M2sWm9p/TjDC9XW+vy9dfuoucEyQfdiPqKP27BheckCGF1SskLFW+oZCq3iF9
0RJsvpwQBSxsLR0/oJok9cxmSAhpZoUiV0zKuqCcP+OTIFI4urKujom/XrJIjayU
fg0vaXzPd9crzSZX1rqF8/UDx8uV4uf4IHD9MNrCYIXpiVJHWzx0afU1AE5576F5
sT335W/nG6BHsrV/PIRR62v3QU0yLkjQv6VbWqJwMZumuQ2x/iI=
=r1M/
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular fixes for the week to end the merge window, i915 and xe have a
few each, amdgpu makes up most of it with a bunch of SR-IOV related
fixes amongst others.
i915:
- Fix BMG support to UHBR13.5
- Two PSR fixes
- Fix colorimetry detection for DP
xe:
- Fix macro for checking minimum GuC version
- Fix CCS offset calculation for some BMG SKUs
- Fix locking on memory usage reporting via fdinfo and BO destroy
- Fix GPU page fault handler on a closed VM
- Fix overflow in oa batch buffer
amdgpu:
- MES 12 fix
- KFD fence sync fix
- SR-IOV fixes
- VCN 4.0.6 fix
- SDMA 7.x fix
- Bump driver version to note cleared VRAM support
- SWSMU fix
- CU occupancy logic fix
- SDMA queue fix"
* tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel: (79 commits)
drm/amd/pm: update workload mask after the setting
drm/amdgpu: bump driver version for cleared VRAM
drm/amdgpu: fix vbios fetching for SR-IOV
drm/amdgpu: fix PTE copy corruption for sdma 7
drm/amdkfd: Add SDMA queue quantum support for GFX12
drm/amdgpu/vcn: enable AV1 on both instances
drm/amdkfd: Fix CU occupancy for GFX 9.4.3
drm/amdkfd: Update logic for CU occupancy calculations
drm/amdgpu: skip coredump after job timeout in SRIOV
drm/amdgpu: sync to KFD fences before clearing PTEs
drm/amdgpu/mes12: set enable_level_process_quantum_check
drm/i915/dp: Fix colorimetry detection
drm/amdgpu/mes12: reduce timeout
drm/amdgpu/mes11: reduce timeout
drm/amdgpu: use GEM references instead of TTMs v2
drm/amd/display: Allow backlight to go below `AMDGPU_DM_DEFAULT_MIN_BACKLIGHT`
drm/amd/display: Fix kdoc entry for 'tps' in 'dc_process_dmub_dpia_set_tps_notification'
drm/amdgpu: update golden regs for gfx12
drm/amdgpu: clean up vbios fetching code
drm/amd/display: handle nulled pipe context in DCE110's set_drr()
...
cleanups.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmb3JroTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzizDiB/0elHQQaFxXMjuJRY1IzohozAHi0cHK
gwgE4nEbECE8vRYK/QvyvZ3S+ep+N+r6jOIiIDyqhjtlY3//oSyyxL7RjMJlVFBq
Ie37w8r4q1aL1mn9QDQ4iQxcRYyU+JxcUcPR1UUUvLiKgWaRixmq27zby/WQSrkA
ke2ScBRDtEAYVtdxvxmUJK/DrPr3skwJAGY52KesjwgVhXSL8KG9X1zMUbWdJYDV
THbQzLZsu4NVh7LlAsS/mh+z0EIZsXxQYU5IY3dIVEYcuLK93lXRGZb+7whtmUef
wsDtYIe/w30QVxFdrN28qAQp8daUJhp+3t0EZSyecRcq5OPey6ICx1P4
=+bdB
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"Three CephFS fixes from Xiubo and Luis and a bunch of assorted
cleanups"
* tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client:
ceph: remove the incorrect Fw reference check when dirtying pages
ceph: Remove empty definition in header file
ceph: Fix typo in the comment
ceph: fix a memory leak on cap_auths in MDS client
ceph: flush all caps releases when syncing the whole filesystem
ceph: rename ceph_flush_cap_releases() to ceph_flush_session_cap_releases()
libceph: use min() to simplify code in ceph_dns_resolve_name()
ceph: Convert to use jiffies macro
ceph: Remove unused declarations
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmb2y3gACgkQiiy9cAdy
T1EATQwAwQTCQSd916VK4HPzphM/lJLaxNCNNV0AY8QNXsUxhBxQny2FJ+jilFZU
y4G5/zSi8+PX0YtyrPJqtofbFX+eeD6eKRCFT/1YEEkwEYp53mjsCIHWidSPGh6X
S2du6tAebSCQSqlHv5zlTpL24UVhi6amse7aJyXs8v7JZO9ZjtEE0D+a1xqSV4kt
0+6/W3RM49HTEAql7TavduNR3UcesYg2KS48qNVvGhHhY3wcGe92mZ0Sr4NUStfg
IjtpfsxxBJWKiXDJhGBN8M/O6jqBtE++O/CyDknYGOs6M7QtPJ1xtXpESlq+OgWV
JEqNorZI4qvl/5PbY/1+6wJDY3ogv2DhwyRaOdhtVc5CgF1JGLKW4lVBGBIrQz2B
dyHbiGAXEA+Rm7/8UkyFZRmvbmLXDqRM7AEyLrXoeS5Vw51RxS6CT0oDesVGVsdX
+koQ1OQ55AiR1TXhasDj6XvmFAyYKuEPh/qhBz1jEBX8unyhIUUVrG3CnMULl3rY
FWbdmtDB
=4ons
-----END PGP SIGNATURE-----
Merge tag 'v6.12-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- fix querying dentry for char/block special files
- small cleanup patches
* tag 'v6.12-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Correct typos in multiple comments across various files
ksmbd: fix open failure from block and char device file
ksmbd: remove unsafe_memcpy use in session setup
ksmbd: Replace one-element arrays with flexible-array members
ksmbd: fix warning: comparison of distinct pointer types lacks a cast
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmb3FfgACgkQiiy9cAdy
T1HLMQv/ZXkiNPvMmCHJE3rWdSBZQdGLDV1qOhxuW9y4CvenIhukwDmwOq7wjWOn
3dQHDaqFkVTVurosozFOJK9Aw94iz2Ad9dlryMcNN+Gb4vY3d9l3AvsbqmgbZSsg
DmdqOg1SA9NgDaHl6RFsFQQY9O5BsjkEeHBX71gZZUYMw1d6CTpFUT+wTD43L0LQ
g8r0Ksil7edw9f5WGvu8YzB4rclR45QiTVG1OMgXmr43cvoJz3GrIPXeHNm7j5D7
hzx6ELNviY77DPKnxSd55UGPngVms6c1qqWCOJMefsJRY5bhh3lEc+TQyX11HGft
Kta1TQI2gI1xgueqpR2Dh/bUuprWcc6vCbNjxezXpFOiSMt6qNfGzQDE9gZAALAj
568lRpwcMPgS9laqK4Sh9v5+Vw6E8T+FUAJKvtLRidv5iBoqI0+50mhHTBlBgiy6
XdyiAdUGSzejcu/OiOGc9C4PvVRgf0qw9+hqqZeZAsRydKgw3QwPtJ6yW+RYu86p
xQ5CE6zs
=mpUs
-----END PGP SIGNATURE-----
Merge tag '6.12rc-more-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull xmb client fixes from Steve French:
- Noisy log message cleanup
- Important netfs fix for cifs crash in generic/074
- Three minor improvements to use of hashing (multichannel and mount
improvements)
- Fix decryption crash for large read with small esize
* tag '6.12rc-more-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: make SHA-512 TFM ephemeral
smb: client: make HMAC-MD5 TFM ephemeral
smb: client: stop flooding dmesg in smb2_calc_signature()
smb: client: allocate crypto only for primary server
smb: client: fix UAF in async decryption
netfs: Fix write oops in generic/346 (9p) and generic/074 (cifs)
if an inode backpointer points to a dirent that doesn't point back,
that's an error we should warn about.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If the reader acquires the read lock and then the writer enters the slow
path, while the reader proceeds to the unlock path, the following scenario
can occur without the change:
writer: pcpu_read_count(lock) return 1 (so __do_six_trylock will return 0)
reader: this_cpu_dec(*lock->readers)
reader: smp_mb()
reader: state = atomic_read(&lock->state) (there is no waiting flag set)
writer: six_set_bitmask()
then the writer will sleep forever.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a filesystem flag to indicate whether we did a clean recovery -
using c->sb.clean after we've got rw is incorrect, since c->sb is
updated whenever we write the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We had a bug where disk accounting keys didn't always have their version
field set in journal replay; change the BUG_ON() to a WARN(), and
exclude this case since it's now checked for elsewhere (in the bkey
validate function).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This was added to avoid double-counting accounting keys in journal
replay. But applied incorrectly (easily done since it applies to the
transaction commit, not a particular update), it leads to skipping
in-mem accounting for real accounting updates, and failure to give them
a version number - which leads to journal replay becoming very confused
the next time around.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, check_inode() would delete unlinked inodes if they weren't
on the deleted list - this code dating from before there was a deleted
list.
But, if we crash during a logged op (truncate or finsert/fcollapse) of
an unlinked file, logged op resume will get confused if the inode has
already been deleted - instead, just add it to the deleted list if it
needs to be there; delete_dead_inodes runs after logged op resume.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BCH_SB_ERRS() has a field for the actual enum val so that we can reorder
to reorganize, but the way BCH_SB_ERR_MAX was defined didn't allow for
this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
__bch2_fsck_err() warns if the current task has a btree_trans object and
it wasn't passed in, because if it has to prompt for user input it has
to be able to unlock it.
But plumbing the btree_trans through bkey_validate(), as well as
transaction restarts, is problematic - so instead make bkey fsck errors
FSCK_AUTOFIX, which doesn't need to warn.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In order to check for accounting keys with version=0, we need to run
validation after they've been assigned version numbers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
accounting read was checking if accounting replicas entries were marked
in the superblock prior to applying accounting from the journal,
which meant that a recently removed device could spuriously trigger a
"not marked in superblocked" error (when journal entries zero out the
offending counter).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Minor refactoring - replace multiple bool arguments with an enum; prep
work for fixing a bug in accounting read.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Dealing with outside state within a btree transaction is always tricky.
check_extents() and check_dirents() have to accumulate counters for
i_sectors and i_nlink (for subdirectories). There were two bugs:
- transaction commit may return a restart; therefore we have to commit
before accumulating to those counters
- get_inode_all_snapshots() may return a transaction restart, before
updating w->last_pos; then, on the restart,
check_i_sectors()/check_subdir_count() would see inodes that were not
for w->last_pos
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Returning a positive integer instead of an error code causes error paths
to become very confused.
Closes: syzbot+c0360e8367d6d8d04a66@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The pointer clean points the memory allocated by kmemdup, when the
return value of bch2_sb_clean_validate_late is not zero. The memory
pointed by clean is leaked. So we should free it in this case.
Fixes: a37ad1a3ab ("bcachefs: sb-clean.c")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In downgrade_table_extra, the return value is needed. When it
return failed, we should exit immediately.
Fixes: 7773df19c3 ("bcachefs: metadata version bucket_stripe_sectors")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
check_topology doesn't need the srcu lock and doesn't use normal btree
transactions - we can just drop the srcu lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
fsck_err() jumps to the fsck_err label when bailing out; need to make
sure bp_iter was initialized...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a kasan splat in propagate_key_to_snapshot_leaves() -
varint_decode_fast() does reads (that it never uses) up to 7 bytes past
the end of the integer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Most or all errors will be autofix in the future, we're currently just
doing the ones that we know are well tested.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The values of the variables xres and yres are placed in strbuf.
These variables are obtained from strbuf1.
The strbuf1 array contains digit characters
and a space if the array contains non-digit characters.
Then, when executing sprintf(strbuf, "%ux%ux8", xres, yres);
more than 16 bytes will be written to strbuf.
It is suggested to increase the size of the strbuf array to 24.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Andrey Shumilin <shum.sdl@nppct.ru>
Signed-off-by: Helge Deller <deller@gmx.de>
Fix idle states enumeration in the intel_idle driver on platforms
supporting multiple flavors of the C6 idle state (Artem Bityutskiy).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmb3EYUSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxGMYP/in7Rn12lx5fafFwpX2lWeRPeql3dMHL
gRuj0vR5Ao5gXTCGVAfC707x92nXdXx8PK9TyWxYbX+ZmY/sjOZPp33Wd9R/sVeN
rs6gsAis3symLBlSdBrDpB19lR/4frug0JglruC+G8Pf5OMdAiNcl5eEK0R/rCEv
pKg8OYbEp460aoO0OD/BIBBY4K1jdFN7IOVbDI6TH3B1mupe8kg17sYjLXyqbddE
eC3BKQAdmJMc5ibVLAWrwlyRgSgIhiwrYf9pehX4UHdmWcRX6EG2QdGK16+42KR3
HzEFo6qRFs4LAAOFomnUOFBMZiQXzXbLeYVt1qNYnuJauk0V5BaypWwIjBgqopJ4
3ysGc3A6Jksj0mvtPPqqKTn70GKv5xNmlVF2csO+a4kh5ycz7tdM5RgJ/wAOT0St
Wx1yhxTzbVCRh5cGK2QkgB1M9LCtxIS6aNSpyCbE/s0u0D25ms70Zm229XElypP9
R7umyUw9Biv1JspQld6+KXJh/LAnkFw+5oc4Y51P+fO2sEJ8DRIsaZ92UTb/F3Zg
GT7mf8oryKZD0V6+/4LiIto6tzbOwde9TSpvF7R3u2uvgrSkCQuHdYSJp7qSzQ+K
d1o8/4ARuaUXxF16fN5NXRjxrZA7cwLq83zIoSRSMU/wdpKb8GT+8ZNIxS+XuwOk
MfXxFMytjm0C
=jiG2
-----END PGP SIGNATURE-----
Merge tag 'pm-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix idle states enumeration in the intel_idle driver on platforms
supporting multiple flavors of the C6 idle state (Artem Bityutskiy)"
* tag 'pm-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: fix ACPI _CST matching for newer Xeon platforms
The enable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
cpus_read_lock. Currently, the locks are grabbed together which is prone to
locking order problems.
For example, currently, there is a possible deadlock involving
scx_fork_rwsem and cpus_read_lock. cpus_read_lock has to nest inside
scx_fork_rwsem due to locking order existing in other subsystems. However,
there exists a dependency in the other direction during hotplug if hotplug
needs to fork a new task, which happens in some cases. This leads to the
following deadlock:
scx_ops_enable() hotplug
percpu_down_write(&cpu_hotplug_lock)
percpu_down_write(&scx_fork_rwsem)
block on cpu_hotplug_lock
kthread_create() waits for kthreadd
kthreadd blocks on scx_fork_rwsem
Note that this doesn't trigger lockdep because the hotplug side dependency
bounces through kthreadd.
With the preceding scx_cgroup_enabled change, this can be solved by
decoupling cpus_read_lock, which is needed for static_key manipulations,
from the other two locks.
- Move the first block of static_key manipulations outside of scx_fork_rwsem
and scx_cgroup_rwsem. This is now safe with the preceding
scx_cgroup_enabled change.
- Drop scx_cgroup_rwsem and scx_fork_rwsem between the two task iteration
blocks so that __scx_ops_enabled static_key enabling is outside the two
rwsems.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
The disable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
cpus_read_lock. Currently, the locks are grabbed together which is prone to
locking order problems. With the preceding scx_cgroup_enabled change, we can
decouple them:
- As cgroup disabling no longer requires modifying a static_key which
requires cpus_read_lock(), no need to grab cpus_read_lock() before
grabbing scx_cgroup_rwsem.
- cgroup can now be independently disabled before tasks are moved back to
the fair class.
Relocate scx_cgroup_exit() invocation before scx_fork_rwsem is grabbed, drop
now unnecessary cpus_read_lock() and move static_key operations out of
scx_fork_rwsem. This decouples all three locks in the disable path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
If the BPF scheduler does not implement ops.cgroup_init(), scx_tg_online()
didn't set SCX_TG_INITED which meant that ops.cgroup_exit(), even if
implemented, won't be called from scx_tg_offline(). This is because
SCX_HAS_OP(cgroupt_init) is used to test both whether SCX cgroup operations
are enabled and ops.cgroup_init() exists.
Fix it by introducing a separate bool scx_cgroup_enabled to gate cgroup
operations and use SCX_HAS_OP(cgroup_init) only to test whether
ops.cgroup_init() exists. Make all cgroup operations consistently use
scx_cgroup_enabled to test whether cgroup operations are enabled.
scx_cgroup_enabled is added instead of using scx_enabled() to ease planned
locking updates.
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_ops_init_task() and the follow-up scx_ops_enable_task() in the fork path
were gated by scx_enabled() test and thus __scx_ops_enabled had to be turned
on before the first scx_ops_init_task() loop in scx_ops_enable(). However,
if an external entity causes sched_class switch before the loop is complete,
tasks which are not initialized could be switched to SCX.
The following can be reproduced by running a program which keeps toggling a
process between SCHED_OTHER and SCHED_EXT using sched_setscheduler(2).
sched_ext: Invalid task state transition 0 -> 3 for fish[1623]
WARNING: CPU: 1 PID: 1650 at kernel/sched/ext.c:3392 scx_ops_enable_task+0x1a1/0x200
...
Sched_ext: simple (enabling)
RIP: 0010:scx_ops_enable_task+0x1a1/0x200
...
switching_to_scx+0x13/0xa0
__sched_setscheduler+0x850/0xa50
do_sched_setscheduler+0x104/0x1c0
__x64_sys_sched_setscheduler+0x18/0x30
do_syscall_64+0x7b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fix it by gating scx_ops_init_task() separately using
scx_ops_init_task_enabled. __scx_ops_enabled is now set after all tasks are
finished with scx_ops_init_task().
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_ops_enable() has two task iteration loops. The first one calls
scx_ops_init_task() on every task and the latter switches the eligible ones
into SCX. The first loop left the tasks in SCX_TASK_INIT state and then the
second loop switched it into READY before switching the task into SCX.
The distinction between INIT and READY is only meaningful in the fork path
where it's used to tell whether the task finished forking so that we can
tell ops.exit_task() accordingly. Leaving task in INIT state between the two
loops is incosistent with the fork path and incorrect. The following can be
triggered by running a program which keeps toggling a task between
SCHED_OTHER and SCHED_SCX while enabling a task:
sched_ext: Invalid task state transition 1 -> 3 for fish[1526]
WARNING: CPU: 2 PID: 1615 at kernel/sched/ext.c:3393 scx_ops_enable_task+0x1a1/0x200
...
Sched_ext: qmap (enabling+all)
RIP: 0010:scx_ops_enable_task+0x1a1/0x200
...
switching_to_scx+0x13/0xa0
__sched_setscheduler+0x850/0xa50
do_sched_setscheduler+0x104/0x1c0
__x64_sys_sched_setscheduler+0x18/0x30
do_syscall_64+0x7b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fix it by transitioning to READY in the first loop right after
scx_ops_init_task() succeeds.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>