19 hotfixes, 8 of which are cc:stable.

Mainly MM singleton fixes.  And a couple of ocfs2 regression fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZnCEQAAKCRDdBJ7gKXxA
 jmgSAQDk3BYs1n67cnwx/Zi04yMYDyfYTCYg2udPfT2a+GpmbwD+N5dJd/vCztXH
 5eLpP11xd/yr2+I9FefyZeUuA80KtgQ=
 =2agY
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"

* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kcov: don't lose track of remote references during softirqs
  mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
  mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
  mm: fix possible OOB in numa_rebuild_large_mapping()
  mm/migrate: fix kernel BUG at mm/compaction.c:2761!
  selftests: mm: make map_fixed_noreplace test names stable
  mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
  mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
  gcov: add support for GCC 14
  zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
  mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
  lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
  MAINTAINERS: remove Lorenzo as vmalloc reviewer
  Revert "mm: init_mlocked_on_free_v3"
  mm/page_table_check: fix crash on ZONE_DEVICE
  gcc: disable '-Warray-bounds' for gcc-9
  ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
  ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
This commit is contained in:
Linus Torvalds 2024-06-17 12:30:07 -07:00
commit e6b324fbf2
29 changed files with 345 additions and 222 deletions

View File

@ -2192,12 +2192,6 @@
Format: 0 | 1
Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
init_mlocked_on_free= [MM] Fill freed userspace memory with zeroes if
it was mlock'ed and not explicitly munlock'ed
afterwards.
Format: 0 | 1
Default set by CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON
init_pkru= [X86] Specify the default memory protection keys rights
register contents for all processes. 0x55555554 by
default (disallow access to all but pkey 0). Can

View File

@ -32,6 +32,7 @@ Security-related interfaces
seccomp_filter
landlock
lsm
mfd_noexec
spec_ctrl
tee

View File

@ -0,0 +1,86 @@
.. SPDX-License-Identifier: GPL-2.0
==================================
Introduction of non-executable mfd
==================================
:Author:
Daniel Verkamp <dverkamp@chromium.org>
Jeff Xu <jeffxu@chromium.org>
:Contributor:
Aleksa Sarai <cyphar@cyphar.com>
Since Linux introduced the memfd feature, memfds have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.
However, in a secure-by-default system, such as ChromeOS, (where all
executables should come from the rootfs, which is protected by verified
boot), this executable nature of memfd opens a door for NoExec bypass
and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
process created a memfd to share the content with an external process,
however the memfd is overwritten and used for executing arbitrary code
and root escalation. [2] lists more VRP of this kind.
On the other hand, executable memfd has its legit use: runc uses memfds
seal and executable feature to copy the contents of the binary then
execute them. For such a system, we need a solution to differentiate runc's
use of executable memfds and an attacker's [3].
To address those above:
- Let memfd_create() set X bit at creation time.
- Let memfd be sealed for modifying X bit when NX is set.
- Add a new pid namespace sysctl: vm.memfd_noexec to help applications in
migrating and enforcing non-executable MFD.
User API
========
``int memfd_create(const char *name, unsigned int flags)``
``MFD_NOEXEC_SEAL``
When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created
with NX. F_SEAL_EXEC is set and the memfd can't be modified to
add X later. MFD_ALLOW_SEALING is also implied.
This is the most common case for the application to use memfd.
``MFD_EXEC``
When MFD_EXEC bit is set in the ``flags``, memfd is created with X.
Note:
``MFD_NOEXEC_SEAL`` implies ``MFD_ALLOW_SEALING``. In case that
an app doesn't want sealing, it can add F_SEAL_SEAL after creation.
Sysctl:
========
``pid namespaced sysctl vm.memfd_noexec``
The new pid namespaced sysctl vm.memfd_noexec has 3 values:
- 0: MEMFD_NOEXEC_SCOPE_EXEC
memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
MFD_EXEC was set.
- 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
MFD_NOEXEC_SEAL was set.
- 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
memfd_create() without MFD_NOEXEC_SEAL will be rejected.
The sysctl allows finer control of memfd_create for old software that
doesn't set the executable bit; for example, a container with
vm.memfd_noexec=1 means the old software will create non-executable memfd
by default while new software can create executable memfd by setting
MFD_EXEC.
The value of vm.memfd_noexec is passed to child namespace at creation
time. In addition, the setting is hierarchical, i.e. during memfd_create,
we will search from current ns to root ns and use the most restrictive
setting.
[1] https://crbug.com/1305267
[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
[3] https://lwn.net/Articles/781013/

View File

@ -23974,7 +23974,6 @@ VMALLOC
M: Andrew Morton <akpm@linux-foundation.org>
R: Uladzislau Rezki <urezki@gmail.com>
R: Christoph Hellwig <hch@infradead.org>
R: Lorenzo Stoakes <lstoakes@gmail.com>
L: linux-mm@kvack.org
S: Maintained
W: http://www.linux-mm.org

View File

@ -1046,10 +1046,21 @@ config ARCH_MMAP_RND_BITS_MAX
config ARCH_MMAP_RND_BITS_DEFAULT
int
config FORCE_MAX_MMAP_RND_BITS
bool "Force maximum number of bits to use for ASLR of mmap base address"
default y if !64BIT
help
ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS represent the number
of bits to use for ASLR and if no custom value is assigned (EXPERT)
then the architecture's lower bound (minimum) value is assumed.
This toggle changes that default assumption to assume the arch upper
bound (maximum) value instead.
config ARCH_MMAP_RND_BITS
int "Number of bits to use for ASLR of mmap base address" if EXPERT
range ARCH_MMAP_RND_BITS_MIN ARCH_MMAP_RND_BITS_MAX
default ARCH_MMAP_RND_BITS_DEFAULT if ARCH_MMAP_RND_BITS_DEFAULT
default ARCH_MMAP_RND_BITS_MAX if FORCE_MAX_MMAP_RND_BITS
default ARCH_MMAP_RND_BITS_MIN
depends on HAVE_ARCH_MMAP_RND_BITS
help
@ -1084,6 +1095,7 @@ config ARCH_MMAP_RND_COMPAT_BITS
int "Number of bits to use for ASLR of mmap base address for compatible applications" if EXPERT
range ARCH_MMAP_RND_COMPAT_BITS_MIN ARCH_MMAP_RND_COMPAT_BITS_MAX
default ARCH_MMAP_RND_COMPAT_BITS_DEFAULT if ARCH_MMAP_RND_COMPAT_BITS_DEFAULT
default ARCH_MMAP_RND_COMPAT_BITS_MAX if FORCE_MAX_MMAP_RND_BITS
default ARCH_MMAP_RND_COMPAT_BITS_MIN
depends on HAVE_ARCH_MMAP_RND_COMPAT_BITS
help

View File

@ -479,12 +479,6 @@ int ocfs2_allocate_extend_trans(handle_t *handle, int thresh)
return status;
}
struct ocfs2_triggers {
struct jbd2_buffer_trigger_type ot_triggers;
int ot_offset;
};
static inline struct ocfs2_triggers *to_ocfs2_trigger(struct jbd2_buffer_trigger_type *triggers)
{
return container_of(triggers, struct ocfs2_triggers, ot_triggers);
@ -548,85 +542,76 @@ static void ocfs2_db_frozen_trigger(struct jbd2_buffer_trigger_type *triggers,
static void ocfs2_abort_trigger(struct jbd2_buffer_trigger_type *triggers,
struct buffer_head *bh)
{
struct ocfs2_triggers *ot = to_ocfs2_trigger(triggers);
mlog(ML_ERROR,
"ocfs2_abort_trigger called by JBD2. bh = 0x%lx, "
"bh->b_blocknr = %llu\n",
(unsigned long)bh,
(unsigned long long)bh->b_blocknr);
ocfs2_error(bh->b_assoc_map->host->i_sb,
ocfs2_error(ot->sb,
"JBD2 has aborted our journal, ocfs2 cannot continue\n");
}
static struct ocfs2_triggers di_triggers = {
.ot_triggers = {
.t_frozen = ocfs2_frozen_trigger,
.t_abort = ocfs2_abort_trigger,
},
.ot_offset = offsetof(struct ocfs2_dinode, i_check),
};
static void ocfs2_setup_csum_triggers(struct super_block *sb,
enum ocfs2_journal_trigger_type type,
struct ocfs2_triggers *ot)
{
BUG_ON(type >= OCFS2_JOURNAL_TRIGGER_COUNT);
static struct ocfs2_triggers eb_triggers = {
.ot_triggers = {
.t_frozen = ocfs2_frozen_trigger,
.t_abort = ocfs2_abort_trigger,
},
.ot_offset = offsetof(struct ocfs2_extent_block, h_check),
};
switch (type) {
case OCFS2_JTR_DI:
ot->ot_triggers.t_frozen = ocfs2_frozen_trigger;
ot->ot_offset = offsetof(struct ocfs2_dinode, i_check);
break;
case OCFS2_JTR_EB:
ot->ot_triggers.t_frozen = ocfs2_frozen_trigger;
ot->ot_offset = offsetof(struct ocfs2_extent_block, h_check);
break;
case OCFS2_JTR_RB:
ot->ot_triggers.t_frozen = ocfs2_frozen_trigger;
ot->ot_offset = offsetof(struct ocfs2_refcount_block, rf_check);
break;
case OCFS2_JTR_GD:
ot->ot_triggers.t_frozen = ocfs2_frozen_trigger;
ot->ot_offset = offsetof(struct ocfs2_group_desc, bg_check);
break;
case OCFS2_JTR_DB:
ot->ot_triggers.t_frozen = ocfs2_db_frozen_trigger;
break;
case OCFS2_JTR_XB:
ot->ot_triggers.t_frozen = ocfs2_frozen_trigger;
ot->ot_offset = offsetof(struct ocfs2_xattr_block, xb_check);
break;
case OCFS2_JTR_DQ:
ot->ot_triggers.t_frozen = ocfs2_dq_frozen_trigger;
break;
case OCFS2_JTR_DR:
ot->ot_triggers.t_frozen = ocfs2_frozen_trigger;
ot->ot_offset = offsetof(struct ocfs2_dx_root_block, dr_check);
break;
case OCFS2_JTR_DL:
ot->ot_triggers.t_frozen = ocfs2_frozen_trigger;
ot->ot_offset = offsetof(struct ocfs2_dx_leaf, dl_check);
break;
case OCFS2_JTR_NONE:
/* To make compiler happy... */
return;
}
static struct ocfs2_triggers rb_triggers = {
.ot_triggers = {
.t_frozen = ocfs2_frozen_trigger,
.t_abort = ocfs2_abort_trigger,
},
.ot_offset = offsetof(struct ocfs2_refcount_block, rf_check),
};
ot->ot_triggers.t_abort = ocfs2_abort_trigger;
ot->sb = sb;
}
static struct ocfs2_triggers gd_triggers = {
.ot_triggers = {
.t_frozen = ocfs2_frozen_trigger,
.t_abort = ocfs2_abort_trigger,
},
.ot_offset = offsetof(struct ocfs2_group_desc, bg_check),
};
void ocfs2_initialize_journal_triggers(struct super_block *sb,
struct ocfs2_triggers triggers[])
{
enum ocfs2_journal_trigger_type type;
static struct ocfs2_triggers db_triggers = {
.ot_triggers = {
.t_frozen = ocfs2_db_frozen_trigger,
.t_abort = ocfs2_abort_trigger,
},
};
static struct ocfs2_triggers xb_triggers = {
.ot_triggers = {
.t_frozen = ocfs2_frozen_trigger,
.t_abort = ocfs2_abort_trigger,
},
.ot_offset = offsetof(struct ocfs2_xattr_block, xb_check),
};
static struct ocfs2_triggers dq_triggers = {
.ot_triggers = {
.t_frozen = ocfs2_dq_frozen_trigger,
.t_abort = ocfs2_abort_trigger,
},
};
static struct ocfs2_triggers dr_triggers = {
.ot_triggers = {
.t_frozen = ocfs2_frozen_trigger,
.t_abort = ocfs2_abort_trigger,
},
.ot_offset = offsetof(struct ocfs2_dx_root_block, dr_check),
};
static struct ocfs2_triggers dl_triggers = {
.ot_triggers = {
.t_frozen = ocfs2_frozen_trigger,
.t_abort = ocfs2_abort_trigger,
},
.ot_offset = offsetof(struct ocfs2_dx_leaf, dl_check),
};
for (type = OCFS2_JTR_DI; type < OCFS2_JOURNAL_TRIGGER_COUNT; type++)
ocfs2_setup_csum_triggers(sb, type, &triggers[type]);
}
static int __ocfs2_journal_access(handle_t *handle,
struct ocfs2_caching_info *ci,
@ -708,56 +693,91 @@ static int __ocfs2_journal_access(handle_t *handle,
int ocfs2_journal_access_di(handle_t *handle, struct ocfs2_caching_info *ci,
struct buffer_head *bh, int type)
{
return __ocfs2_journal_access(handle, ci, bh, &di_triggers, type);
struct ocfs2_super *osb = OCFS2_SB(ocfs2_metadata_cache_get_super(ci));
return __ocfs2_journal_access(handle, ci, bh,
&osb->s_journal_triggers[OCFS2_JTR_DI],
type);
}
int ocfs2_journal_access_eb(handle_t *handle, struct ocfs2_caching_info *ci,
struct buffer_head *bh, int type)
{
return __ocfs2_journal_access(handle, ci, bh, &eb_triggers, type);
struct ocfs2_super *osb = OCFS2_SB(ocfs2_metadata_cache_get_super(ci));
return __ocfs2_journal_access(handle, ci, bh,
&osb->s_journal_triggers[OCFS2_JTR_EB],
type);
}
int ocfs2_journal_access_rb(handle_t *handle, struct ocfs2_caching_info *ci,
struct buffer_head *bh, int type)
{
return __ocfs2_journal_access(handle, ci, bh, &rb_triggers,
struct ocfs2_super *osb = OCFS2_SB(ocfs2_metadata_cache_get_super(ci));
return __ocfs2_journal_access(handle, ci, bh,
&osb->s_journal_triggers[OCFS2_JTR_RB],
type);
}
int ocfs2_journal_access_gd(handle_t *handle, struct ocfs2_caching_info *ci,
struct buffer_head *bh, int type)
{
return __ocfs2_journal_access(handle, ci, bh, &gd_triggers, type);
struct ocfs2_super *osb = OCFS2_SB(ocfs2_metadata_cache_get_super(ci));
return __ocfs2_journal_access(handle, ci, bh,
&osb->s_journal_triggers[OCFS2_JTR_GD],
type);
}
int ocfs2_journal_access_db(handle_t *handle, struct ocfs2_caching_info *ci,
struct buffer_head *bh, int type)
{
return __ocfs2_journal_access(handle, ci, bh, &db_triggers, type);
struct ocfs2_super *osb = OCFS2_SB(ocfs2_metadata_cache_get_super(ci));
return __ocfs2_journal_access(handle, ci, bh,
&osb->s_journal_triggers[OCFS2_JTR_DB],
type);
}
int ocfs2_journal_access_xb(handle_t *handle, struct ocfs2_caching_info *ci,
struct buffer_head *bh, int type)
{
return __ocfs2_journal_access(handle, ci, bh, &xb_triggers, type);
struct ocfs2_super *osb = OCFS2_SB(ocfs2_metadata_cache_get_super(ci));
return __ocfs2_journal_access(handle, ci, bh,
&osb->s_journal_triggers[OCFS2_JTR_XB],
type);
}
int ocfs2_journal_access_dq(handle_t *handle, struct ocfs2_caching_info *ci,
struct buffer_head *bh, int type)
{
return __ocfs2_journal_access(handle, ci, bh, &dq_triggers, type);
struct ocfs2_super *osb = OCFS2_SB(ocfs2_metadata_cache_get_super(ci));
return __ocfs2_journal_access(handle, ci, bh,
&osb->s_journal_triggers[OCFS2_JTR_DQ],
type);
}
int ocfs2_journal_access_dr(handle_t *handle, struct ocfs2_caching_info *ci,
struct buffer_head *bh, int type)
{
return __ocfs2_journal_access(handle, ci, bh, &dr_triggers, type);
struct ocfs2_super *osb = OCFS2_SB(ocfs2_metadata_cache_get_super(ci));
return __ocfs2_journal_access(handle, ci, bh,
&osb->s_journal_triggers[OCFS2_JTR_DR],
type);
}
int ocfs2_journal_access_dl(handle_t *handle, struct ocfs2_caching_info *ci,
struct buffer_head *bh, int type)
{
return __ocfs2_journal_access(handle, ci, bh, &dl_triggers, type);
struct ocfs2_super *osb = OCFS2_SB(ocfs2_metadata_cache_get_super(ci));
return __ocfs2_journal_access(handle, ci, bh,
&osb->s_journal_triggers[OCFS2_JTR_DL],
type);
}
int ocfs2_journal_access(handle_t *handle, struct ocfs2_caching_info *ci,
@ -778,13 +798,15 @@ void ocfs2_journal_dirty(handle_t *handle, struct buffer_head *bh)
if (!is_handle_aborted(handle)) {
journal_t *journal = handle->h_transaction->t_journal;
mlog(ML_ERROR, "jbd2_journal_dirty_metadata failed. "
"Aborting transaction and journal.\n");
mlog(ML_ERROR, "jbd2_journal_dirty_metadata failed: "
"handle type %u started at line %u, credits %u/%u "
"errcode %d. Aborting transaction and journal.\n",
handle->h_type, handle->h_line_no,
handle->h_requested_credits,
jbd2_handle_buffer_credits(handle), status);
handle->h_err = status;
jbd2_journal_abort_handle(handle);
jbd2_journal_abort(journal, status);
ocfs2_abort(bh->b_assoc_map->host->i_sb,
"Journal already aborted.\n");
}
}
}

View File

@ -284,6 +284,30 @@ enum ocfs2_mount_options
#define OCFS2_OSB_ERROR_FS 0x0004
#define OCFS2_DEFAULT_ATIME_QUANTUM 60
struct ocfs2_triggers {
struct jbd2_buffer_trigger_type ot_triggers;
int ot_offset;
struct super_block *sb;
};
enum ocfs2_journal_trigger_type {
OCFS2_JTR_DI,
OCFS2_JTR_EB,
OCFS2_JTR_RB,
OCFS2_JTR_GD,
OCFS2_JTR_DB,
OCFS2_JTR_XB,
OCFS2_JTR_DQ,
OCFS2_JTR_DR,
OCFS2_JTR_DL,
OCFS2_JTR_NONE /* This must be the last entry */
};
#define OCFS2_JOURNAL_TRIGGER_COUNT OCFS2_JTR_NONE
void ocfs2_initialize_journal_triggers(struct super_block *sb,
struct ocfs2_triggers triggers[]);
struct ocfs2_journal;
struct ocfs2_slot_info;
struct ocfs2_recovery_map;
@ -351,6 +375,9 @@ struct ocfs2_super
struct ocfs2_journal *journal;
unsigned long osb_commit_interval;
/* Journal triggers for checksum */
struct ocfs2_triggers s_journal_triggers[OCFS2_JOURNAL_TRIGGER_COUNT];
struct delayed_work la_enable_wq;
/*

View File

@ -1075,9 +1075,11 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
debugfs_create_file("fs_state", S_IFREG|S_IRUSR, osb->osb_debug_root,
osb, &ocfs2_osb_debug_fops);
if (ocfs2_meta_ecc(osb))
if (ocfs2_meta_ecc(osb)) {
ocfs2_initialize_journal_triggers(sb, osb->s_journal_triggers);
ocfs2_blockcheck_stats_debugfs_install( &osb->osb_ecc_stats,
osb->osb_debug_root);
}
status = ocfs2_mount_volume(sb);
if (status < 0)

View File

@ -21,6 +21,8 @@ enum kcov_mode {
KCOV_MODE_TRACE_PC = 2,
/* Collecting comparison operands mode. */
KCOV_MODE_TRACE_CMP = 3,
/* The process owns a KCOV remote reference. */
KCOV_MODE_REMOTE = 4,
};
#define KCOV_IN_CTXSW (1 << 30)

View File

@ -3779,13 +3779,6 @@ static inline bool want_init_on_free(void)
&init_on_free);
}
DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON, init_mlocked_on_free);
static inline bool want_init_mlocked_on_free(void)
{
return static_branch_maybe(CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON,
&init_mlocked_on_free);
}
extern bool _debug_pagealloc_enabled_early;
DECLARE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);

View File

@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
*/
static inline bool mapping_large_folio_support(struct address_space *mapping)
{
/* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
"Anonymous mapping always supports large folio");
return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
}

View File

@ -37,6 +37,9 @@ static inline union codetag_ref *get_page_tag_ref(struct page *page)
static inline void put_page_tag_ref(union codetag_ref *ref)
{
if (WARN_ON(!ref))
return;
page_ext_put(page_ext_from_codetag_ref(ref));
}
@ -102,10 +105,12 @@ static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
union codetag_ref *ref = get_page_tag_ref(page);
alloc_tag_sub_check(ref);
if (ref && ref->ct)
if (ref) {
if (ref->ct)
tag = ct_to_alloc_tag(ref->ct);
put_page_tag_ref(ref);
}
}
return tag;
}

View File

@ -883,7 +883,7 @@ config GCC10_NO_ARRAY_BOUNDS
config CC_NO_ARRAY_BOUNDS
bool
default y if CC_IS_GCC && GCC_VERSION >= 100000 && GCC10_NO_ARRAY_BOUNDS
default y if CC_IS_GCC && GCC_VERSION >= 90000 && GCC10_NO_ARRAY_BOUNDS
# Currently, disable -Wstringop-overflow for GCC globally.
config GCC_NO_STRINGOP_OVERFLOW

View File

@ -18,7 +18,9 @@
#include <linux/mm.h>
#include "gcov.h"
#if (__GNUC__ >= 10)
#if (__GNUC__ >= 14)
#define GCOV_COUNTERS 9
#elif (__GNUC__ >= 10)
#define GCOV_COUNTERS 8
#elif (__GNUC__ >= 7)
#define GCOV_COUNTERS 9

View File

@ -632,6 +632,7 @@ static int kcov_ioctl_locked(struct kcov *kcov, unsigned int cmd,
return -EINVAL;
kcov->mode = mode;
t->kcov = kcov;
t->kcov_mode = KCOV_MODE_REMOTE;
kcov->t = t;
kcov->remote = true;
kcov->remote_size = remote_arg->area_size;

View File

@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
*/
do {
clear_thread_flag(TIF_SIGPENDING);
clear_thread_flag(TIF_NOTIFY_SIGNAL);
rc = kernel_wait4(-1, NULL, __WALL, NULL);
} while (rc != -ECHILD);

View File

@ -227,6 +227,7 @@ struct page_ext_operations page_alloc_tagging_ops = {
};
EXPORT_SYMBOL(page_alloc_tagging_ops);
#ifdef CONFIG_SYSCTL
static struct ctl_table memory_allocation_profiling_sysctls[] = {
{
.procname = "mem_profiling",
@ -241,6 +242,17 @@ static struct ctl_table memory_allocation_profiling_sysctls[] = {
{ }
};
static void __init sysctl_init(void)
{
if (!mem_profiling_support)
memory_allocation_profiling_sysctls[0].mode = 0444;
register_sysctl_init("vm", memory_allocation_profiling_sysctls);
}
#else /* CONFIG_SYSCTL */
static inline void sysctl_init(void) {}
#endif /* CONFIG_SYSCTL */
static int __init alloc_tag_init(void)
{
const struct codetag_type_desc desc = {
@ -253,9 +265,7 @@ static int __init alloc_tag_init(void)
if (IS_ERR(alloc_tag_cttype))
return PTR_ERR(alloc_tag_cttype);
if (!mem_profiling_support)
memory_allocation_profiling_sysctls[0].mode = 0444;
register_sysctl_init("vm", memory_allocation_profiling_sysctls);
sysctl_init();
procfs_init();
return 0;

View File

@ -40,22 +40,7 @@
* Please refer Documentation/mm/arch_pgtable_helpers.rst for the semantics
* expectations that are being validated here. All future changes in here
* or the documentation need to be in sync.
*
* On s390 platform, the lower 4 bits are used to identify given page table
* entry type. But these bits might affect the ability to clear entries with
* pxx_clear() because of how dynamic page table folding works on s390. So
* while loading up the entries do not change the lower 4 bits. It does not
* have affect any other platform. Also avoid the 62nd bit on ppc64 that is
* used to mark a pte entry.
*/
#define S390_SKIP_MASK GENMASK(3, 0)
#if __BITS_PER_LONG == 64
#define PPC64_SKIP_MASK GENMASK(62, 62)
#else
#define PPC64_SKIP_MASK 0x0
#endif
#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
#define RANDOM_NZVALUE GENMASK(7, 0)
struct pgtable_debug_args {
@ -511,8 +496,7 @@ static void __init pud_clear_tests(struct pgtable_debug_args *args)
return;
pr_debug("Validating PUD clear\n");
pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
WRITE_ONCE(*args->pudp, pud);
WARN_ON(pud_none(pud));
pud_clear(args->pudp);
pud = READ_ONCE(*args->pudp);
WARN_ON(!pud_none(pud));
@ -548,8 +532,7 @@ static void __init p4d_clear_tests(struct pgtable_debug_args *args)
return;
pr_debug("Validating P4D clear\n");
p4d = __p4d(p4d_val(p4d) | RANDOM_ORVALUE);
WRITE_ONCE(*args->p4dp, p4d);
WARN_ON(p4d_none(p4d));
p4d_clear(args->p4dp);
p4d = READ_ONCE(*args->p4dp);
WARN_ON(!p4d_none(p4d));
@ -582,8 +565,7 @@ static void __init pgd_clear_tests(struct pgtable_debug_args *args)
return;
pr_debug("Validating PGD clear\n");
pgd = __pgd(pgd_val(pgd) | RANDOM_ORVALUE);
WRITE_ONCE(*args->pgdp, pgd);
WARN_ON(pgd_none(pgd));
pgd_clear(args->pgdp);
pgd = READ_ONCE(*args->pgdp);
WARN_ON(!pgd_none(pgd));
@ -634,10 +616,8 @@ static void __init pte_clear_tests(struct pgtable_debug_args *args)
if (WARN_ON(!args->ptep))
return;
#ifndef CONFIG_RISCV
pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
#endif
set_pte_at(args->mm, args->vaddr, args->ptep, pte);
WARN_ON(pte_none(pte));
flush_dcache_page(page);
barrier();
ptep_clear(args->mm, args->vaddr, args->ptep);
@ -650,8 +630,7 @@ static void __init pmd_clear_tests(struct pgtable_debug_args *args)
pmd_t pmd = READ_ONCE(*args->pmdp);
pr_debug("Validating PMD clear\n");
pmd = __pmd(pmd_val(pmd) | RANDOM_ORVALUE);
WRITE_ONCE(*args->pmdp, pmd);
WARN_ON(pmd_none(pmd));
pmd_clear(args->pmdp);
pmd = READ_ONCE(*args->pmdp);
WARN_ON(!pmd_none(pmd));

View File

@ -3009,30 +3009,36 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
if (new_order >= folio_order(folio))
return -EINVAL;
/* Cannot split anonymous THP to order-1 */
if (new_order == 1 && folio_test_anon(folio)) {
if (folio_test_anon(folio)) {
/* order-1 is not supported for anonymous THP. */
if (new_order == 1) {
VM_WARN_ONCE(1, "Cannot split to order-1 folio");
return -EINVAL;
}
if (new_order) {
/* Only swapping a whole PMD-mapped folio is supported */
if (folio_test_swapcache(folio))
return -EINVAL;
} else if (new_order) {
/* Split shmem folio to non-zero order not supported */
if (shmem_mapping(folio->mapping)) {
VM_WARN_ONCE(1,
"Cannot split shmem folio to non-0 order");
return -EINVAL;
}
/* No split if the file system does not support large folio */
if (!mapping_large_folio_support(folio->mapping)) {
/*
* No split if the file system does not support large folio.
* Note that we might still have THPs in such mappings due to
* CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
* does not actually support large folios properly.
*/
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
!mapping_large_folio_support(folio->mapping)) {
VM_WARN_ONCE(1,
"Cannot split file folio to non-0 order");
return -EINVAL;
}
}
/* Only swapping a whole PMD-mapped folio is supported */
if (folio_test_swapcache(folio) && new_order)
return -EINVAL;
is_hzp = is_huge_zero_folio(folio);
if (is_hzp) {

View File

@ -588,7 +588,6 @@ extern void __putback_isolated_page(struct page *page, unsigned int order,
extern void memblock_free_pages(struct page *page, unsigned long pfn,
unsigned int order);
extern void __free_pages_core(struct page *page, unsigned int order);
extern void kernel_init_pages(struct page *page, int numpages);
/*
* This will have no effect, other than possibly generating a warning, if the

View File

@ -7745,8 +7745,7 @@ void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
* @new: Replacement folio.
*
* Charge @new as a replacement folio for @old. @old will
* be uncharged upon free. This is only used by the page cache
* (in replace_page_cache_folio()).
* be uncharged upon free.
*
* Both folios must be locked, @new->mapping must be set up.
*/

View File

@ -1507,12 +1507,6 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
if (unlikely(folio_mapcount(folio) < 0))
print_bad_pte(vma, addr, ptent, page);
}
if (want_init_mlocked_on_free() && folio_test_mlocked(folio) &&
!delay_rmap && folio_test_anon(folio)) {
kernel_init_pages(page, folio_nr_pages(folio));
}
if (unlikely(__tlb_remove_folio_pages(tlb, page, nr, delay_rmap))) {
*force_flush = true;
*force_break = true;
@ -5106,10 +5100,16 @@ static void numa_rebuild_large_mapping(struct vm_fault *vmf, struct vm_area_stru
bool ignore_writable, bool pte_write_upgrade)
{
int nr = pte_pfn(fault_pte) - folio_pfn(folio);
unsigned long start = max(vmf->address - nr * PAGE_SIZE, vma->vm_start);
unsigned long end = min(vmf->address + (folio_nr_pages(folio) - nr) * PAGE_SIZE, vma->vm_end);
pte_t *start_ptep = vmf->pte - (vmf->address - start) / PAGE_SIZE;
unsigned long addr;
unsigned long start, end, addr = vmf->address;
unsigned long addr_start = addr - (nr << PAGE_SHIFT);
unsigned long pt_start = ALIGN_DOWN(addr, PMD_SIZE);
pte_t *start_ptep;
/* Stay within the VMA and within the page table. */
start = max3(addr_start, pt_start, vma->vm_start);
end = min3(addr_start + folio_size(folio), pt_start + PMD_SIZE,
vma->vm_end);
start_ptep = vmf->pte - ((addr - start) >> PAGE_SHIFT);
/* Restore all PTEs' mapping of the large folio */
for (addr = start; addr != end; start_ptep++, addr += PAGE_SIZE) {

View File

@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from,
/*
* The rare folio on the deferred split list should
* be split now. It should not count as a failure.
* be split now. It should not count as a failure:
* but increment nr_failed because, without doing so,
* migrate_pages() may report success with (split but
* unmigrated) pages still on its fromlist; whereas it
* always reports success when its fromlist is empty.
*
* Only check it without removing it from the list.
* Since the folio can be on deferred_split_scan()
* local list and removing it can cause the local list
@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from,
if (nr_pages > 2 &&
!list_empty(&folio->_deferred_list)) {
if (try_split_folio(folio, split_folios) == 0) {
nr_failed++;
stats->nr_thp_split += is_thp;
stats->nr_split++;
continue;

View File

@ -2523,9 +2523,6 @@ EXPORT_SYMBOL(init_on_alloc);
DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
EXPORT_SYMBOL(init_on_free);
DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON, init_mlocked_on_free);
EXPORT_SYMBOL(init_mlocked_on_free);
static bool _init_on_alloc_enabled_early __read_mostly
= IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON);
static int __init early_init_on_alloc(char *buf)
@ -2543,14 +2540,6 @@ static int __init early_init_on_free(char *buf)
}
early_param("init_on_free", early_init_on_free);
static bool _init_mlocked_on_free_enabled_early __read_mostly
= IS_ENABLED(CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON);
static int __init early_init_mlocked_on_free(char *buf)
{
return kstrtobool(buf, &_init_mlocked_on_free_enabled_early);
}
early_param("init_mlocked_on_free", early_init_mlocked_on_free);
DEFINE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled);
/*
@ -2578,21 +2567,12 @@ static void __init mem_debugging_and_hardening_init(void)
}
#endif
if ((_init_on_alloc_enabled_early || _init_on_free_enabled_early ||
_init_mlocked_on_free_enabled_early) &&
if ((_init_on_alloc_enabled_early || _init_on_free_enabled_early) &&
page_poisoning_requested) {
pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
"will take precedence over init_on_alloc, init_on_free "
"and init_mlocked_on_free\n");
"will take precedence over init_on_alloc and init_on_free\n");
_init_on_alloc_enabled_early = false;
_init_on_free_enabled_early = false;
_init_mlocked_on_free_enabled_early = false;
}
if (_init_mlocked_on_free_enabled_early && _init_on_free_enabled_early) {
pr_info("mem auto-init: init_on_free is on, "
"will take precedence over init_mlocked_on_free\n");
_init_mlocked_on_free_enabled_early = false;
}
if (_init_on_alloc_enabled_early) {
@ -2609,17 +2589,9 @@ static void __init mem_debugging_and_hardening_init(void)
static_branch_disable(&init_on_free);
}
if (_init_mlocked_on_free_enabled_early) {
want_check_pages = true;
static_branch_enable(&init_mlocked_on_free);
} else {
static_branch_disable(&init_mlocked_on_free);
}
if (IS_ENABLED(CONFIG_KMSAN) && (_init_on_alloc_enabled_early ||
_init_on_free_enabled_early || _init_mlocked_on_free_enabled_early))
pr_info("mem auto-init: please make sure init_on_alloc, init_on_free and "
"init_mlocked_on_free are disabled when running KMSAN\n");
if (IS_ENABLED(CONFIG_KMSAN) &&
(_init_on_alloc_enabled_early || _init_on_free_enabled_early))
pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n");
#ifdef CONFIG_DEBUG_PAGEALLOC
if (debug_pagealloc_enabled()) {
@ -2658,10 +2630,9 @@ static void __init report_meminit(void)
else
stack = "off";
pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s, mlocked free:%s\n",
pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
want_init_on_free() ? "on" : "off",
want_init_mlocked_on_free() ? "on" : "off");
want_init_on_free() ? "on" : "off");
if (want_init_on_free())
pr_info("mem auto-init: clearing system memory may take some time...\n");
}

View File

@ -1016,7 +1016,7 @@ static inline bool should_skip_kasan_poison(struct page *page)
return page_kasan_tag(page) == KASAN_TAG_KERNEL;
}
void kernel_init_pages(struct page *page, int numpages)
static void kernel_init_pages(struct page *page, int numpages)
{
int i;

View File

@ -73,6 +73,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
if (!page_ext)
return;
BUG_ON(PageSlab(page));
anon = PageAnon(page);
@ -110,6 +113,9 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
if (!page_ext)
return;
BUG_ON(PageSlab(page));
anon = PageAnon(page);
@ -140,7 +146,10 @@ void __page_table_check_zero(struct page *page, unsigned int order)
BUG_ON(PageSlab(page));
page_ext = page_ext_get(page);
BUG_ON(!page_ext);
if (!page_ext)
return;
for (i = 0; i < (1ul << order); i++) {
struct page_table_check *ptc = get_page_table_check(page_ext);

View File

@ -1786,7 +1786,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
xa_lock_irq(&swap_mapping->i_pages);
error = shmem_replace_entry(swap_mapping, swap_index, old, new);
if (!error) {
mem_cgroup_migrate(old, new);
mem_cgroup_replace_folio(old, new);
__lruvec_stat_mod_folio(new, NR_FILE_PAGES, 1);
__lruvec_stat_mod_folio(new, NR_SHMEM, 1);
__lruvec_stat_mod_folio(old, NR_FILE_PAGES, -1);

View File

@ -255,21 +255,6 @@ config INIT_ON_FREE_DEFAULT_ON
touching "cold" memory areas. Most cases see 3-5% impact. Some
synthetic workloads have measured as high as 8%.
config INIT_MLOCKED_ON_FREE_DEFAULT_ON
bool "Enable mlocked memory zeroing on free"
depends on !KMSAN
help
This config has the effect of setting "init_mlocked_on_free=1"
on the kernel command line. If it is enabled, all mlocked process
memory is zeroed when freed. This restriction to mlocked memory
improves performance over "init_on_free" but can still be used to
protect confidential data like key material from content exposures
to other processes, as well as live forensics and cold boot attacks.
Any non-mlocked memory is not cleared before it is reassigned. This
configuration can be overwritten by setting "init_mlocked_on_free=0"
on the command line. The "init_on_free" boot option takes
precedence over "init_mlocked_on_free".
config CC_HAS_ZERO_CALL_USED_REGS
def_bool $(cc-option,-fzero-call-used-regs=used-gpr)
# https://github.com/ClangBuiltLinux/linux/issues/1766

View File

@ -67,7 +67,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error: munmap failed!?\n");
}
ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_test_result_pass("mmap() 5*PAGE_SIZE at base\n");
addr = base_addr + page_size;
size = 3 * page_size;
@ -76,7 +77,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error: first mmap() failed unexpectedly\n");
}
ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_test_result_pass("mmap() 3*PAGE_SIZE at base+PAGE_SIZE\n");
/*
* Exact same mapping again:
@ -93,7 +95,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:1: mmap() succeeded when it shouldn't have\n");
}
ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_test_result_pass("mmap() 5*PAGE_SIZE at base\n");
/*
* Second mapping contained within first:
@ -111,7 +114,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:2: mmap() succeeded when it shouldn't have\n");
}
ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_test_result_pass("mmap() 2*PAGE_SIZE at base+PAGE_SIZE\n");
/*
* Overlap end of existing mapping:
@ -128,7 +132,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:3: mmap() succeeded when it shouldn't have\n");
}
ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_test_result_pass("mmap() 2*PAGE_SIZE at base+(3*PAGE_SIZE)\n");
/*
* Overlap start of existing mapping:
@ -145,7 +150,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:4: mmap() succeeded when it shouldn't have\n");
}
ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_test_result_pass("mmap() 2*PAGE_SIZE bytes at base\n");
/*
* Adjacent to start of existing mapping:
@ -162,7 +168,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:5: mmap() failed when it shouldn't have\n");
}
ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_test_result_pass("mmap() PAGE_SIZE at base\n");
/*
* Adjacent to end of existing mapping:
@ -179,7 +186,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:6: mmap() failed when it shouldn't have\n");
}
ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
ksft_test_result_pass("mmap() PAGE_SIZE at base+(4*PAGE_SIZE)\n");
addr = base_addr;
size = 5 * page_size;