slab changes for 6.9

-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmXwH0wACgkQu+CwddJF
 iJq3HAf6A/0m0pSr0QDcwjM8D7TVYQJ+Z/jPC6Mj+HfTcF8Otrgk8c0M6EsHGIGF
 GQNnYJRKmBla3mpVFvDtsVZuiakEtRLCpoP5n23s8p8gY9ibJcl6bpn9NaMVMKrq
 kBnhQ9VdLAgKVcTH8wz6jJqdWiZ7W4jGH5NWO+nr+r0H7vay7jfB0+tur1NO8J09
 HE5I76XE6ArRvaKYxvsZmOx1pihSmsJ7CerXN6Y8U5qcuxNXdUO/9rf+uv5llDIV
 gl54UAU79koZ9k88t5AiSKO2IZVhBgC/j66ds9MRRAFCf/ldxUtJIlsHTOnumfmy
 FApqwtR0MYNPeMPZpzogQbv58oOcNw==
 =XDxn
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - Freelist loading optimization (Chengming Zhou)

   When the per-cpu slab is depleted and a new one loaded from the cpu
   partial list, optimize the loading to avoid an irq enable/disable
   cycle. This results in a 3.5% performance improvement on the "perf
   bench sched messaging" test.

 - Kernel boot parameters cleanup after SLAB removal (Xiongwei Song)

   Due to two different main slab implementations we've had boot
   parameters prefixed with either slab_ or slub_, with some later
   becoming aliases as both implementations gained the same
   functionality (e.g. slab_nomerge vs slub_nomerge). In order to
   eventually get rid of the implementation-specific names, the
   canonical and documented parameters are now all prefixed slab_ and
   the slub_ variants become deprecated but still working aliases.

 - SLAB_ kmem_cache creation flags cleanup (Vlastimil Babka)

   The flags had hardcoded #define values which became tedious and
   error-prone when adding new ones. Assign the values via an enum that
   takes care of providing unique bit numbers. Also deprecate
   SLAB_MEM_SPREAD which was only used by SLAB, so it's a no-op since
   SLAB removal. Assign it an explicit zero value. The removals of the
   flag usage are handled independently in the respective subsystems,
   with a final removal of any leftover usage planned for the next
   release.

 - Misc cleanups and fixes (Chengming Zhou, Xiaolei Wang, Zheng Yejian)

   Includes removal of unused code or function parameters and a fix of a
   memleak.

* tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slab: remove PARTIAL_NODE slab_state
  mm, slab: remove memcg_from_slab_obj()
  mm, slab: remove the corner case of inc_slabs_node()
  mm/slab: Fix a kmemleak in kmem_cache_destroy()
  mm, slab, kasan: replace kasan_never_merge() with SLAB_NO_MERGE
  mm, slab: use an enum to define SLAB_ cache creation flags
  mm, slab: deprecate SLAB_MEM_SPREAD flag
  mm, slab: fix the comment of cpu partial list
  mm, slab: remove unused object_size parameter in kmem_cache_flags()
  mm/slub: remove parameter 'flags' in create_kmalloc_caches()
  mm/slub: remove unused parameter in next_freelist_entry()
  mm/slub: remove full list manipulation for non-debug slab
  mm/slub: directly load freelist from cpu partial slab in the likely case
  mm/slub: make the description of slab_min_objects helpful in doc
  mm/slub: replace slub_$params with slab_$params in slub.rst
  mm/slub: unify all sl[au]b parameters with "slab_$param"
  Documentation: kernel-parameters: remove noaliencache
Linus Torvalds 2024-03-12 20:14:54 -07:00
commit 0ea680eda6
11 changed files with 212 additions and 217 deletions


@ -3771,10 +3771,6 @@
no5lvl [X86-64,RISCV,EARLY] Disable 5-level paging mode. Forces
kernel to use 4-level paging instead.
noaliencache [MM, NUMA, SLAB] Disables the allocation of alien
caches in the slab allocator. Saves per-node memory,
but will impact performance.
noalign [KNL,ARM]
noaltinstr [S390,EARLY] Disables alternative instructions
@ -5930,11 +5926,42 @@
simeth= [IA-64]
simscsi=
slram= [HW,MTD]
slab_debug[=options[,slabs][;[options[,slabs]]...] [MM]
Enabling slab_debug allows one to determine the
culprit if slab objects become corrupted. Enabling
slab_debug can create guard zones around objects and
may poison objects when not in use. Also tracks the
last alloc / free. For more information see
Documentation/mm/slub.rst.
(slub_debug legacy name also accepted for now)
slab_max_order= [MM]
Determines the maximum allowed order for slabs.
A high setting may cause OOMs due to memory
fragmentation. For more information see
Documentation/mm/slub.rst.
(slub_max_order legacy name also accepted for now)
slab_merge [MM]
Enable merging of slabs with similar size when the
kernel is built without CONFIG_SLAB_MERGE_DEFAULT.
(slub_merge legacy name also accepted for now)
slab_min_objects= [MM]
The minimum number of objects per slab. SLUB will
increase the slab order up to slab_max_order to
generate a sufficiently large slab able to contain
the number of objects indicated. The higher the number
of objects the smaller the overhead of tracking slabs
and the less frequently locks need to be acquired.
For more information see Documentation/mm/slub.rst.
(slub_min_objects legacy name also accepted for now)
slab_min_order= [MM]
Determines the minimum page order for slabs. Must be
lower or equal to slab_max_order. For more information see
Documentation/mm/slub.rst.
(slub_min_order legacy name also accepted for now)
slab_nomerge [MM]
Disable merging of slabs with similar size. May be
@ -5948,47 +5975,9 @@
unchanged). Debug options disable merging on their
own.
For more information see Documentation/mm/slub.rst.
(slub_nomerge legacy name also accepted for now)
slab_max_order= [MM, SLAB]
Determines the maximum allowed order for slabs.
A high setting may cause OOMs due to memory
fragmentation. Defaults to 1 for systems with
more than 32MB of RAM, 0 otherwise.
slub_debug[=options[,slabs][;[options[,slabs]]...] [MM, SLUB]
Enabling slub_debug allows one to determine the
culprit if slab objects become corrupted. Enabling
slub_debug can create guard zones around objects and
may poison objects when not in use. Also tracks the
last alloc / free. For more information see
Documentation/mm/slub.rst.
slub_max_order= [MM, SLUB]
Determines the maximum allowed order for slabs.
A high setting may cause OOMs due to memory
fragmentation. For more information see
Documentation/mm/slub.rst.
slub_min_objects= [MM, SLUB]
The minimum number of objects per slab. SLUB will
increase the slab order up to slub_max_order to
generate a sufficiently large slab able to contain
the number of objects indicated. The higher the number
of objects the smaller the overhead of tracking slabs
and the less frequently locks need to be acquired.
For more information see Documentation/mm/slub.rst.
slub_min_order= [MM, SLUB]
Determines the minimum page order for slabs. Must be
lower than slub_max_order.
For more information see Documentation/mm/slub.rst.
slub_merge [MM, SLUB]
Same with slab_merge.
slub_nomerge [MM, SLUB]
Same with slab_nomerge. This is supported for legacy.
See slab_nomerge for more information.
slram= [HW,MTD]
smart2= [HW]
Format: <io1>[,<io2>[,...,<io8>]]
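
Note on the hunk above: every slab_ parameter keeps its slub_ spelling as a working legacy name. A minimal userspace sketch of that idea follows, with both spellings mapped to one handler. The table, the handler names and demo_parse_param() are hypothetical and exist only for illustration; the kernel itself registers its handlers through its own setup macros, not through a table like this.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* State touched by the demo handlers (hypothetical, for illustration only). */
static int demo_min_order;
static int demo_nomerge;

/* Handler for "<name>=<number>" style parameters. */
static int set_min_order(const char *val)
{
	if (!val)
		return 0;
	demo_min_order = atoi(val);
	return 1;
}

/* Handler for a bare flag parameter; any value is ignored. */
static int set_nomerge(const char *val)
{
	(void)val;
	demo_nomerge = 1;
	return 1;
}

struct demo_param {
	const char *name;
	int (*handler)(const char *val);
};

/* Canonical names and legacy aliases share the same handler. */
static const struct demo_param demo_params[] = {
	{ "slab_min_order", set_min_order },
	{ "slub_min_order", set_min_order },	/* legacy alias */
	{ "slab_nomerge",   set_nomerge },
	{ "slub_nomerge",   set_nomerge },	/* legacy alias */
};

/* Dispatch one "name" or "name=value" argument; returns 1 if handled. */
static int demo_parse_param(const char *arg)
{
	const char *eq = strchr(arg, '=');
	size_t len = eq ? (size_t)(eq - arg) : strlen(arg);
	size_t i;

	for (i = 0; i < sizeof(demo_params) / sizeof(demo_params[0]); i++) {
		const struct demo_param *p = &demo_params[i];

		if (strlen(p->name) == len && strncmp(p->name, arg, len) == 0)
			return p->handler(eq ? eq + 1 : NULL);
	}
	return 0;
}
```

Because the alias is just a second table entry, dropping a deprecated name later would mean deleting one line without touching the handler.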


@ -9,7 +9,7 @@ SLUB can enable debugging only for selected slabs in order to avoid
an impact on overall system performance which may make a bug more
difficult to find.
In order to switch debugging on one can add an option ``slub_debug``
In order to switch debugging on one can add an option ``slab_debug``
to the kernel command line. That will enable full debugging for
all slabs.
@ -26,16 +26,16 @@ be enabled on the command line. F.e. no tracking information will be
available without debugging on and validation can only partially
be performed if debugging was not switched on.
Some more sophisticated uses of slub_debug:
Some more sophisticated uses of slab_debug:
-------------------------------------------
Parameters may be given to ``slub_debug``. If none is specified then full
Parameters may be given to ``slab_debug``. If none is specified then full
debugging is enabled. Format:
slub_debug=<Debug-Options>
slab_debug=<Debug-Options>
Enable options for all slabs
slub_debug=<Debug-Options>,<slab name1>,<slab name2>,...
slab_debug=<Debug-Options>,<slab name1>,<slab name2>,...
Enable options only for select slabs (no spaces
after a comma)
@ -60,23 +60,23 @@ Possible debug options are::
F.e. in order to boot just with sanity checks and red zoning one would specify::
slub_debug=FZ
slab_debug=FZ
Trying to find an issue in the dentry cache? Try::
slub_debug=,dentry
slab_debug=,dentry
to only enable debugging on the dentry cache. You may use an asterisk at the
end of the slab name, in order to cover all slabs with the same prefix. For
example, here's how you can poison the dentry cache as well as all kmalloc
slabs::
slub_debug=P,kmalloc-*,dentry
slab_debug=P,kmalloc-*,dentry
Red zoning and tracking may realign the slab. We can just apply sanity checks
to the dentry cache with::
slub_debug=F,dentry
slab_debug=F,dentry
Debugging options may require the minimum possible slab order to increase as
a result of storing the metadata (for example, caches with PAGE_SIZE object
@ -84,20 +84,20 @@ sizes). This has a higher liklihood of resulting in slab allocation errors
in low memory situations or if there's high fragmentation of memory. To
switch off debugging for such caches by default, use::
slub_debug=O
slab_debug=O
You can apply different options to different list of slab names, using blocks
of options. This will enable red zoning for dentry and user tracking for
kmalloc. All other slabs will not get any debugging enabled::
slub_debug=Z,dentry;U,kmalloc-*
slab_debug=Z,dentry;U,kmalloc-*
You can also enable options (e.g. sanity checks and poisoning) for all caches
except some that are deemed too performance critical and don't need to be
debugged by specifying global debug options followed by a list of slab names
with "-" as options::
slub_debug=FZ;-,zs_handle,zspage
slab_debug=FZ;-,zs_handle,zspage
The state of each debug option for a slab can be found in the respective files
under::
@ -105,7 +105,7 @@ under::
/sys/kernel/slab/<slab name>/
If the file contains 1, the option is enabled, 0 means disabled. The debug
options from the ``slub_debug`` parameter translate to the following files::
options from the ``slab_debug`` parameter translate to the following files::
F sanity_checks
Z red_zone
@ -129,7 +129,7 @@ in order to reduce overhead and increase cache hotness of objects.
Slab validation
===============
SLUB can validate all object if the kernel was booted with slub_debug. In
SLUB can validate all object if the kernel was booted with slab_debug. In
order to do so you must have the ``slabinfo`` tool. Then you can do
::
@ -150,29 +150,29 @@ list_lock once in a while to deal with partial slabs. That overhead is
governed by the order of the allocation for each slab. The allocations
can be influenced by kernel parameters:
.. slub_min_objects=x (default 4)
.. slub_min_order=x (default 0)
.. slub_max_order=x (default 3 (PAGE_ALLOC_COSTLY_ORDER))
.. slab_min_objects=x (default: automatically scaled by number of cpus)
.. slab_min_order=x (default 0)
.. slab_max_order=x (default 3 (PAGE_ALLOC_COSTLY_ORDER))
``slub_min_objects``
``slab_min_objects``
allows to specify how many objects must at least fit into one
slab in order for the allocation order to be acceptable. In
general slub will be able to perform this number of
allocations on a slab without consulting centralized resources
(list_lock) where contention may occur.
``slub_min_order``
``slab_min_order``
specifies a minimum order of slabs. A similar effect like
``slub_min_objects``.
``slab_min_objects``.
``slub_max_order``
specified the order at which ``slub_min_objects`` should no
``slab_max_order``
specified the order at which ``slab_min_objects`` should no
longer be checked. This is useful to avoid SLUB trying to
generate super large order pages to fit ``slub_min_objects``
generate super large order pages to fit ``slab_min_objects``
of a slab cache with large object sizes into one high order
page. Setting command line parameter
``debug_guardpage_minorder=N`` (N > 0), forces setting
``slub_max_order`` to 0, what cause minimum possible order of
``slab_max_order`` to 0, what cause minimum possible order of
slabs allocation.
SLUB Debug output
@ -219,7 +219,7 @@ Here is a sample of slub debug output::
FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc
If SLUB encounters a corrupted object (full detection requires the kernel
to be booted with slub_debug) then the following output will be dumped
to be booted with slab_debug) then the following output will be dumped
into the syslog:
1. Description of the problem encountered
@ -239,7 +239,7 @@ into the syslog:
pid=<pid of the process>
(Object allocation / free information is only available if SLAB_STORE_USER is
set for the slab. slub_debug sets that option)
set for the slab. slab_debug sets that option)
2. The object contents if an object was involved.
@ -262,7 +262,7 @@ into the syslog:
the object boundary.
(Redzone information is only available if SLAB_RED_ZONE is set.
slub_debug sets that option)
slab_debug sets that option)
Padding <address> : <bytes>
Unused data to fill up the space in order to get the next object
@ -296,7 +296,7 @@ Emergency operations
Minimal debugging (sanity checks alone) can be enabled by booting with::
slub_debug=F
slab_debug=F
This will be generally be enough to enable the resiliency features of slub
which will keep the system running even if a bad kernel component will
@ -311,13 +311,13 @@ and enabling debugging only for that cache
I.e.::
slub_debug=F,dentry
slab_debug=F,dentry
If the corruption occurs by writing after the end of the object then it
may be advisable to enable a Redzone to avoid corrupting the beginning
of other objects::
slub_debug=FZ,dentry
slab_debug=FZ,dentry
Extended slabinfo mode and plotting
===================================
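
Note on the documentation hunks above: they describe blocks of options separated by ';', each optionally followed by a comma-separated list of slab names, where a trailing asterisk matches a name prefix. The following is a simplified userspace sketch of how such a string could be resolved for one cache name. It is not the kernel's parser: demo_debug_options() and demo_name_matches() are hypothetical names, option letters are returned verbatim (including "-"), and an empty option list is not expanded to the default flags.

```c
#include <stddef.h>
#include <string.h>

/* Match a cache name against one pattern; a trailing '*' matches any suffix. */
static int demo_name_matches(const char *pat, size_t patlen, const char *name)
{
	if (patlen > 0 && pat[patlen - 1] == '*')
		return strncmp(pat, name, patlen - 1) == 0;
	return strlen(name) == patlen && strncmp(pat, name, patlen) == 0;
}

/* Copy at most outlen - 1 option letters and terminate the string. */
static void demo_copy_opts(char *out, size_t outlen, const char *opts, size_t n)
{
	if (n > outlen - 1)
		n = outlen - 1;
	memcpy(out, opts, n);
	out[n] = '\0';
}

/*
 * Find the option letters that apply to cache `name` in `spec`.
 * A block that lists the cache wins; otherwise the first block without
 * a slab list (a global block) applies. Returns 1 if some block applied,
 * 0 if none did. `outlen` must be at least 1.
 */
static int demo_debug_options(const char *spec, const char *name,
			      char *out, size_t outlen)
{
	const char *global = NULL;
	size_t global_len = 0;
	const char *block = spec;

	out[0] = '\0';
	while (*block) {
		size_t blen = strcspn(block, ";");	/* whole block */
		size_t optlen = strcspn(block, ",;");	/* option letters */
		const char *end = block + blen;

		if (optlen == blen) {
			/* No slab list: remember the first global block. */
			if (!global) {
				global = block;
				global_len = optlen;
			}
		} else {
			const char *p = block + optlen + 1;

			while (p < end) {
				size_t nlen = strcspn(p, ",;");

				if (demo_name_matches(p, nlen, name)) {
					demo_copy_opts(out, outlen, block, optlen);
					return 1;
				}
				p += nlen;
				if (p < end)
					p++;	/* skip the ',' */
			}
		}
		block = end;
		if (*block == ';')
			block++;
	}
	if (global) {
		demo_copy_opts(out, outlen, global, global_len);
		return 1;
	}
	return 0;
}
```

With the documented example string "Z,dentry;U,kmalloc-*", this sketch yields "Z" for dentry, "U" for any kmalloc-prefixed cache, and nothing for every other cache, which matches the behavior the text describes for that example.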


@ -48,7 +48,7 @@ static void lkdtm_VMALLOC_LINEAR_OVERFLOW(void)
* correctly.
*
* This should get caught by either memory tagging, KASan, or by using
* CONFIG_SLUB_DEBUG=y and slub_debug=ZF (or CONFIG_SLUB_DEBUG_ON=y).
* CONFIG_SLUB_DEBUG=y and slab_debug=ZF (or CONFIG_SLUB_DEBUG_ON=y).
*/
static void lkdtm_SLAB_LINEAR_OVERFLOW(void)
{


@ -429,7 +429,6 @@ struct kasan_cache {
};
size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object);
slab_flags_t kasan_never_merge(void);
void kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
slab_flags_t *flags);
@ -446,11 +445,6 @@ static inline size_t kasan_metadata_size(struct kmem_cache *cache,
{
return 0;
}
/* And thus nothing prevents cache merging. */
static inline slab_flags_t kasan_never_merge(void)
{
return 0;
}
/* And no cache-related metadata initialization is required. */
static inline void kasan_cache_create(struct kmem_cache *cache,
unsigned int *size,


@ -21,29 +21,69 @@
#include <linux/cleanup.h>
#include <linux/hash.h>
enum _slab_flag_bits {
_SLAB_CONSISTENCY_CHECKS,
_SLAB_RED_ZONE,
_SLAB_POISON,
_SLAB_KMALLOC,
_SLAB_HWCACHE_ALIGN,
_SLAB_CACHE_DMA,
_SLAB_CACHE_DMA32,
_SLAB_STORE_USER,
_SLAB_PANIC,
_SLAB_TYPESAFE_BY_RCU,
_SLAB_TRACE,
#ifdef CONFIG_DEBUG_OBJECTS
_SLAB_DEBUG_OBJECTS,
#endif
_SLAB_NOLEAKTRACE,
_SLAB_NO_MERGE,
#ifdef CONFIG_FAILSLAB
_SLAB_FAILSLAB,
#endif
#ifdef CONFIG_MEMCG_KMEM
_SLAB_ACCOUNT,
#endif
#ifdef CONFIG_KASAN_GENERIC
_SLAB_KASAN,
#endif
_SLAB_NO_USER_FLAGS,
#ifdef CONFIG_KFENCE
_SLAB_SKIP_KFENCE,
#endif
#ifndef CONFIG_SLUB_TINY
_SLAB_RECLAIM_ACCOUNT,
#endif
_SLAB_OBJECT_POISON,
_SLAB_CMPXCHG_DOUBLE,
_SLAB_FLAGS_LAST_BIT
};
#define __SLAB_FLAG_BIT(nr) ((slab_flags_t __force)(1U << (nr)))
#define __SLAB_FLAG_UNUSED ((slab_flags_t __force)(0U))
/*
* Flags to pass to kmem_cache_create().
* The ones marked DEBUG need CONFIG_SLUB_DEBUG enabled, otherwise are no-op
*/
/* DEBUG: Perform (expensive) checks on alloc/free */
#define SLAB_CONSISTENCY_CHECKS ((slab_flags_t __force)0x00000100U)
#define SLAB_CONSISTENCY_CHECKS __SLAB_FLAG_BIT(_SLAB_CONSISTENCY_CHECKS)
/* DEBUG: Red zone objs in a cache */
#define SLAB_RED_ZONE ((slab_flags_t __force)0x00000400U)
#define SLAB_RED_ZONE __SLAB_FLAG_BIT(_SLAB_RED_ZONE)
/* DEBUG: Poison objects */
#define SLAB_POISON ((slab_flags_t __force)0x00000800U)
#define SLAB_POISON __SLAB_FLAG_BIT(_SLAB_POISON)
/* Indicate a kmalloc slab */
#define SLAB_KMALLOC ((slab_flags_t __force)0x00001000U)
#define SLAB_KMALLOC __SLAB_FLAG_BIT(_SLAB_KMALLOC)
/* Align objs on cache lines */
#define SLAB_HWCACHE_ALIGN ((slab_flags_t __force)0x00002000U)
#define SLAB_HWCACHE_ALIGN __SLAB_FLAG_BIT(_SLAB_HWCACHE_ALIGN)
/* Use GFP_DMA memory */
#define SLAB_CACHE_DMA ((slab_flags_t __force)0x00004000U)
#define SLAB_CACHE_DMA __SLAB_FLAG_BIT(_SLAB_CACHE_DMA)
/* Use GFP_DMA32 memory */
#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x00008000U)
#define SLAB_CACHE_DMA32 __SLAB_FLAG_BIT(_SLAB_CACHE_DMA32)
/* DEBUG: Store the last owner for bug hunting */
#define SLAB_STORE_USER ((slab_flags_t __force)0x00010000U)
#define SLAB_STORE_USER __SLAB_FLAG_BIT(_SLAB_STORE_USER)
/* Panic if kmem_cache_create() fails */
#define SLAB_PANIC ((slab_flags_t __force)0x00040000U)
#define SLAB_PANIC __SLAB_FLAG_BIT(_SLAB_PANIC)
/*
* SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS!
*
@ -95,21 +135,19 @@
* Note that SLAB_TYPESAFE_BY_RCU was originally named SLAB_DESTROY_BY_RCU.
*/
/* Defer freeing slabs to RCU */
#define SLAB_TYPESAFE_BY_RCU ((slab_flags_t __force)0x00080000U)
/* Spread some memory over cpuset */
#define SLAB_MEM_SPREAD ((slab_flags_t __force)0x00100000U)
#define SLAB_TYPESAFE_BY_RCU __SLAB_FLAG_BIT(_SLAB_TYPESAFE_BY_RCU)
/* Trace allocations and frees */
#define SLAB_TRACE ((slab_flags_t __force)0x00200000U)
#define SLAB_TRACE __SLAB_FLAG_BIT(_SLAB_TRACE)
/* Flag to prevent checks on free */
#ifdef CONFIG_DEBUG_OBJECTS
# define SLAB_DEBUG_OBJECTS ((slab_flags_t __force)0x00400000U)
# define SLAB_DEBUG_OBJECTS __SLAB_FLAG_BIT(_SLAB_DEBUG_OBJECTS)
#else
# define SLAB_DEBUG_OBJECTS 0
# define SLAB_DEBUG_OBJECTS __SLAB_FLAG_UNUSED
#endif
/* Avoid kmemleak tracing */
#define SLAB_NOLEAKTRACE ((slab_flags_t __force)0x00800000U)
#define SLAB_NOLEAKTRACE __SLAB_FLAG_BIT(_SLAB_NOLEAKTRACE)
/*
* Prevent merging with compatible kmem caches. This flag should be used
@ -121,25 +159,25 @@
* - performance critical caches, should be very rare and consulted with slab
* maintainers, and not used together with CONFIG_SLUB_TINY
*/
#define SLAB_NO_MERGE ((slab_flags_t __force)0x01000000U)
#define SLAB_NO_MERGE __SLAB_FLAG_BIT(_SLAB_NO_MERGE)
/* Fault injection mark */
#ifdef CONFIG_FAILSLAB
# define SLAB_FAILSLAB ((slab_flags_t __force)0x02000000U)
# define SLAB_FAILSLAB __SLAB_FLAG_BIT(_SLAB_FAILSLAB)
#else
# define SLAB_FAILSLAB 0
# define SLAB_FAILSLAB __SLAB_FLAG_UNUSED
#endif
/* Account to memcg */
#ifdef CONFIG_MEMCG_KMEM
# define SLAB_ACCOUNT ((slab_flags_t __force)0x04000000U)
# define SLAB_ACCOUNT __SLAB_FLAG_BIT(_SLAB_ACCOUNT)
#else
# define SLAB_ACCOUNT 0
# define SLAB_ACCOUNT __SLAB_FLAG_UNUSED
#endif
#ifdef CONFIG_KASAN_GENERIC
#define SLAB_KASAN ((slab_flags_t __force)0x08000000U)
#define SLAB_KASAN __SLAB_FLAG_BIT(_SLAB_KASAN)
#else
#define SLAB_KASAN 0
#define SLAB_KASAN __SLAB_FLAG_UNUSED
#endif
/*
@ -147,23 +185,26 @@
* Intended for caches created for self-tests so they have only flags
* specified in the code and other flags are ignored.
*/
#define SLAB_NO_USER_FLAGS ((slab_flags_t __force)0x10000000U)
#define SLAB_NO_USER_FLAGS __SLAB_FLAG_BIT(_SLAB_NO_USER_FLAGS)
#ifdef CONFIG_KFENCE
#define SLAB_SKIP_KFENCE ((slab_flags_t __force)0x20000000U)
#define SLAB_SKIP_KFENCE __SLAB_FLAG_BIT(_SLAB_SKIP_KFENCE)
#else
#define SLAB_SKIP_KFENCE 0
#define SLAB_SKIP_KFENCE __SLAB_FLAG_UNUSED
#endif
/* The following flags affect the page allocator grouping pages by mobility */
/* Objects are reclaimable */
#ifndef CONFIG_SLUB_TINY
#define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0x00020000U)
#define SLAB_RECLAIM_ACCOUNT __SLAB_FLAG_BIT(_SLAB_RECLAIM_ACCOUNT)
#else
#define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0)
#define SLAB_RECLAIM_ACCOUNT __SLAB_FLAG_UNUSED
#endif
#define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */
/* Obsolete unused flag, to be removed */
#define SLAB_MEM_SPREAD __SLAB_FLAG_UNUSED
/*
* ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
*
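
Note on the hunk above: the hardcoded hex values are replaced by bit numbers taken from an enum, so each flag gets a unique bit without manual bookkeeping, and flags that are compiled out take no bit at all. A standalone sketch of the same pattern is shown here. All DEMO_ names, including the DEMO_CONFIG_FAILSLAB switch, are hypothetical stand-ins, and plain unsigned int is used where the kernel uses slab_flags_t.

```c
#include <stddef.h>

/* Hypothetical config switch for this sketch; remove it to drop the flag. */
#define DEMO_CONFIG_FAILSLAB 1

/* Bit numbers: the enum hands out consecutive values automatically. */
enum demo_flag_bits {
	DEMO_RED_ZONE_BIT,
	DEMO_POISON_BIT,
#ifdef DEMO_CONFIG_FAILSLAB
	DEMO_FAILSLAB_BIT,	/* only consumes a bit when configured in */
#endif
	DEMO_NO_MERGE_BIT,
	DEMO_FLAGS_LAST_BIT	/* number of bits in use */
};

#define DEMO_FLAG_BIT(nr)	(1U << (nr))
#define DEMO_FLAG_UNUSED	(0U)

#define DEMO_RED_ZONE	DEMO_FLAG_BIT(DEMO_RED_ZONE_BIT)
#define DEMO_POISON	DEMO_FLAG_BIT(DEMO_POISON_BIT)
#ifdef DEMO_CONFIG_FAILSLAB
# define DEMO_FAILSLAB	DEMO_FLAG_BIT(DEMO_FAILSLAB_BIT)
#else
# define DEMO_FAILSLAB	DEMO_FLAG_UNUSED
#endif
#define DEMO_NO_MERGE	DEMO_FLAG_BIT(DEMO_NO_MERGE_BIT)

/* All flag bits must fit in a 32-bit word. */
_Static_assert(DEMO_FLAGS_LAST_BIT <= 32, "too many flag bits");

/* Returns 1 when no two flags share a bit (unused flags are zero). */
static int demo_flags_are_unique(void)
{
	const unsigned int flags[] = {
		DEMO_RED_ZONE, DEMO_POISON, DEMO_FAILSLAB, DEMO_NO_MERGE
	};
	unsigned int seen = 0;
	size_t i;

	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (flags[i] & seen)
			return 0;
		seen |= flags[i];
	}
	return 1;
}
```

Adding a flag then means adding one enum entry and one #define. The price is that numeric flag values are no longer stable across configurations, which should only matter to code that relies on the raw numbers.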


@ -64,11 +64,11 @@ config SLUB_DEBUG_ON
help
Boot with debugging on by default. SLUB boots by default with
the runtime debug capabilities switched off. Enabling this is
equivalent to specifying the "slub_debug" parameter on boot.
equivalent to specifying the "slab_debug" parameter on boot.
There is no support for more fine grained debug control like
possible with slub_debug=xxx. SLUB debugging may be switched
possible with slab_debug=xxx. SLUB debugging may be switched
off in a kernel built with CONFIG_SLUB_DEBUG_ON by specifying
"slub_debug=-".
"slab_debug=-".
config PAGE_OWNER
bool "Track page owner"


@ -334,14 +334,6 @@ DEFINE_ASAN_SET_SHADOW(f3);
DEFINE_ASAN_SET_SHADOW(f5);
DEFINE_ASAN_SET_SHADOW(f8);
/* Only allow cache merging when no per-object metadata is present. */
slab_flags_t kasan_never_merge(void)
{
if (!kasan_requires_meta())
return 0;
return SLAB_KASAN;
}
/*
* Adaptive redzone policy taken from the userspace AddressSanitizer runtime.
* For larger allocations larger redzones are used.
@ -370,15 +362,13 @@ void kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
return;
/*
* SLAB_KASAN is used to mark caches that are sanitized by KASAN
* and that thus have per-object metadata.
* Currently this flag is used in two places:
* 1. In slab_ksize() to account for per-object metadata when
* calculating the size of the accessible memory within the object.
* 2. In slab_common.c via kasan_never_merge() to prevent merging of
* caches with per-object metadata.
* SLAB_KASAN is used to mark caches that are sanitized by KASAN and
* that thus have per-object metadata. Currently, this flag is used in
* slab_ksize() to account for per-object metadata when calculating the
* size of the accessible memory within the object. Additionally, we use
* SLAB_NO_MERGE to prevent merging of caches with per-object metadata.
*/
*flags |= SLAB_KASAN;
*flags |= SLAB_KASAN | SLAB_NO_MERGE;
ok_size = *size;


@ -363,7 +363,6 @@ static inline int objs_per_slab(const struct kmem_cache *cache,
enum slab_state {
DOWN, /* No slab functionality yet */
PARTIAL, /* SLUB: kmem_cache_node available */
PARTIAL_NODE, /* SLAB: kmalloc size for node struct available */
UP, /* Slab caches usable but not all extras yet */
FULL /* Everything is working */
};
@ -387,7 +386,7 @@ extern const struct kmalloc_info_struct {
/* Kmalloc array related functions */
void setup_kmalloc_cache_index_table(void);
void create_kmalloc_caches(slab_flags_t);
void create_kmalloc_caches(void);
extern u8 kmalloc_size_index[24];
@ -422,8 +421,6 @@ gfp_t kmalloc_fix_flags(gfp_t flags);
int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags);
void __init kmem_cache_init(void);
void __init new_kmalloc_cache(int idx, enum kmalloc_cache_type type,
slab_flags_t flags);
extern void create_boot_cache(struct kmem_cache *, const char *name,
unsigned int size, slab_flags_t flags,
unsigned int useroffset, unsigned int usersize);
@ -435,8 +432,7 @@ struct kmem_cache *
__kmem_cache_alias(const char *name, unsigned int size, unsigned int align,
slab_flags_t flags, void (*ctor)(void *));
slab_flags_t kmem_cache_flags(unsigned int object_size,
slab_flags_t flags, const char *name);
slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name);
static inline bool is_kmalloc_cache(struct kmem_cache *s)
{
@ -469,7 +465,6 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
SLAB_STORE_USER | \
SLAB_TRACE | \
SLAB_CONSISTENCY_CHECKS | \
SLAB_MEM_SPREAD | \
SLAB_NOLEAKTRACE | \
SLAB_RECLAIM_ACCOUNT | \
SLAB_TEMPORARY | \
@ -528,7 +523,7 @@ static inline bool __slub_debug_enabled(void)
#endif
/*
* Returns true if any of the specified slub_debug flags is enabled for the
* Returns true if any of the specified slab_debug flags is enabled for the
* cache. Use only for flags parsed by setup_slub_debug() as it also enables
* the static key.
*/


@ -50,7 +50,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
*/
#define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
SLAB_FAILSLAB | SLAB_NO_MERGE | kasan_never_merge())
SLAB_FAILSLAB | SLAB_NO_MERGE)
#define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
@ -172,7 +172,7 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
size = ALIGN(size, sizeof(void *));
align = calculate_alignment(flags, align, size);
size = ALIGN(size, align);
flags = kmem_cache_flags(size, flags, name);
flags = kmem_cache_flags(flags, name);
if (flags & SLAB_NEVER_MERGE)
return NULL;
@ -282,7 +282,7 @@ kmem_cache_create_usercopy(const char *name,
#ifdef CONFIG_SLUB_DEBUG
/*
* If no slub_debug was enabled globally, the static key is not yet
* If no slab_debug was enabled globally, the static key is not yet
* enabled by setup_slub_debug(). Enable it if the cache is being
* created with any of the debugging flags passed explicitly.
* It's also possible that this is the first cache created with
@ -404,8 +404,12 @@ EXPORT_SYMBOL(kmem_cache_create);
*/
static void kmem_cache_release(struct kmem_cache *s)
{
if (slab_state >= FULL) {
sysfs_slab_unlink(s);
sysfs_slab_release(s);
} else {
slab_kmem_cache_release(s);
}
}
#else
static void kmem_cache_release(struct kmem_cache *s)
@ -766,7 +770,7 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
}
/*
* kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time.
* kmalloc_info[] is to make slab_debug=,kmalloc-xx option work at boot time.
* kmalloc_index() supports up to 2^21=2MB, so the final entry of the table is
* kmalloc-2M.
*/
@ -853,9 +857,10 @@ static unsigned int __kmalloc_minalign(void)
return max(minalign, arch_slab_minalign());
}
void __init
new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
static void __init
new_kmalloc_cache(int idx, enum kmalloc_cache_type type)
{
slab_flags_t flags = 0;
unsigned int minalign = __kmalloc_minalign();
unsigned int aligned_size = kmalloc_info[idx].size;
int aligned_idx = idx;
@ -902,7 +907,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
* may already have been created because they were needed to
* enable allocations for slab creation.
*/
void __init create_kmalloc_caches(slab_flags_t flags)
void __init create_kmalloc_caches(void)
{
int i;
enum kmalloc_cache_type type;
@ -913,7 +918,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)
for (type = KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) {
for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
if (!kmalloc_caches[type][i])
new_kmalloc_cache(i, type, flags);
new_kmalloc_cache(i, type);
/*
* Caches that are not of the two-to-the-power-of size.
@ -922,10 +927,10 @@ void __init create_kmalloc_caches(slab_flags_t flags)
*/
if (KMALLOC_MIN_SIZE <= 32 && i == 6 &&
!kmalloc_caches[type][1])
new_kmalloc_cache(1, type, flags);
new_kmalloc_cache(1, type);
if (KMALLOC_MIN_SIZE <= 64 && i == 7 &&
!kmalloc_caches[type][2])
new_kmalloc_cache(2, type, flags);
new_kmalloc_cache(2, type);
}
}
#ifdef CONFIG_RANDOM_KMALLOC_CACHES

108
mm/slub.c

@ -295,7 +295,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
/*
* Debugging flags that require metadata to be stored in the slab. These get
* disabled when slub_debug=O is used and a cache's min order increases with
* disabled when slab_debug=O is used and a cache's min order increases with
* metadata.
*/
#define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
@ -306,13 +306,13 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
/* Internal SLUB flags */
/* Poison object */
#define __OBJECT_POISON ((slab_flags_t __force)0x80000000U)
#define __OBJECT_POISON __SLAB_FLAG_BIT(_SLAB_OBJECT_POISON)
/* Use cmpxchg_double */
#ifdef system_has_freelist_aba
#define __CMPXCHG_DOUBLE ((slab_flags_t __force)0x40000000U)
#define __CMPXCHG_DOUBLE __SLAB_FLAG_BIT(_SLAB_CMPXCHG_DOUBLE)
#else
#define __CMPXCHG_DOUBLE ((slab_flags_t __force)0U)
#define __CMPXCHG_DOUBLE __SLAB_FLAG_UNUSED
#endif
/*
@ -391,7 +391,7 @@ struct kmem_cache_cpu {
};
struct slab *slab; /* The slab from which we are allocating */
#ifdef CONFIG_SLUB_CPU_PARTIAL
struct slab *partial; /* Partially allocated frozen slabs */
struct slab *partial; /* Partially allocated slabs */
#endif
local_lock_t lock; /* Protects the fields above */
#ifdef CONFIG_SLUB_STATS
@ -1498,16 +1498,8 @@ static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
{
struct kmem_cache_node *n = get_node(s, node);
/*
* May be called early in order to allocate a slab for the
* kmem_cache_node structure. Solve the chicken-egg
* dilemma by deferring the increment of the count during
* bootstrap (see early_kmem_cache_node_alloc).
*/
if (likely(n)) {
atomic_long_inc(&n->nr_slabs);
atomic_long_add(objects, &n->total_objects);
}
}
static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
{
@ -1616,7 +1608,7 @@ static inline int free_consistency_checks(struct kmem_cache *s,
}
/*
* Parse a block of slub_debug options. Blocks are delimited by ';'
* Parse a block of slab_debug options. Blocks are delimited by ';'
*
* @str: start of block
* @flags: returns parsed flags, or DEBUG_DEFAULT_FLAGS if none specified
@ -1677,7 +1669,7 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init)
break;
default:
if (init)
pr_err("slub_debug option '%c' unknown. skipped\n", *str);
pr_err("slab_debug option '%c' unknown. skipped\n", *str);
}
}
check_slabs:
@ -1736,7 +1728,7 @@ static int __init setup_slub_debug(char *str)
/*
* For backwards compatibility, a single list of flags with list of
* slabs means debugging is only changed for those slabs, so the global
* slub_debug should be unchanged (0 or DEBUG_DEFAULT_FLAGS, depending
* slab_debug should be unchanged (0 or DEBUG_DEFAULT_FLAGS, depending
* on CONFIG_SLUB_DEBUG_ON). We can extended that to multiple lists as
* long as there is no option specifying flags without a slab list.
*/
@ -1760,21 +1752,20 @@ out:
return 1;
}
__setup("slub_debug", setup_slub_debug);
__setup("slab_debug", setup_slub_debug);
__setup_param("slub_debug", slub_debug, setup_slub_debug, 0);
/*
* kmem_cache_flags - apply debugging options to the cache
* @object_size: the size of an object without meta data
* @flags: flags to set
* @name: name of the cache
*
* Debug option(s) are applied to @flags. In addition to the debug
* option(s), if a slab name (or multiple) is specified i.e.
* slub_debug=<Debug-Options>,<slab name1>,<slab name2> ...
* slab_debug=<Debug-Options>,<slab name1>,<slab name2> ...
* then only the select slabs will receive the debug option(s).
*/
slab_flags_t kmem_cache_flags(unsigned int object_size,
slab_flags_t flags, const char *name)
slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name)
{
char *iter;
size_t len;
@ -1850,8 +1841,7 @@ static inline void add_full(struct kmem_cache *s, struct kmem_cache_node *n,
struct slab *slab) {}
static inline void remove_full(struct kmem_cache *s, struct kmem_cache_node *n,
struct slab *slab) {}
slab_flags_t kmem_cache_flags(unsigned int object_size,
slab_flags_t flags, const char *name)
slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name)
{
return flags;
}
@ -2038,11 +2028,6 @@ void memcg_slab_alloc_error_hook(struct kmem_cache *s, int objects,
obj_cgroup_uncharge(objcg, objects * obj_full_size(s));
}
#else /* CONFIG_MEMCG_KMEM */
static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
{
return NULL;
}
static inline void memcg_free_slab_cgroups(struct slab *slab)
{
}
@ -2243,7 +2228,7 @@ static void __init init_freelist_randomization(void)
}
/* Get the next entry on the pre-computed freelist randomized */
static void *next_freelist_entry(struct kmem_cache *s, struct slab *slab,
static void *next_freelist_entry(struct kmem_cache *s,
unsigned long *pos, void *start,
unsigned long page_limit,
unsigned long freelist_count)
@@ -2282,13 +2267,12 @@ static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
start = fixup_red_left(s, slab_address(slab));
/* First entry is used as the base of the freelist */
cur = next_freelist_entry(s, slab, &pos, start, page_limit,
freelist_count);
cur = next_freelist_entry(s, &pos, start, page_limit, freelist_count);
cur = setup_object(s, cur);
slab->freelist = cur;
for (idx = 1; idx < slab->objects; idx++) {
next = next_freelist_entry(s, slab, &pos, start, page_limit,
next = next_freelist_entry(s, &pos, start, page_limit,
freelist_count);
next = setup_object(s, next);
set_freepointer(s, cur, next);
@@ -3263,7 +3247,7 @@ slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
oo_order(s->min));
if (oo_order(s->min) > get_order(s->object_size))
pr_warn(" %s debugging increased min order, use slub_debug=O to disable.\n",
pr_warn(" %s debugging increased min order, use slab_debug=O to disable.\n",
s->name);
for_each_kmem_cache_node(s, node, n) {
@@ -3326,7 +3310,6 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
counters = slab->counters;
new.counters = counters;
VM_BUG_ON(!new.frozen);
new.inuse = slab->objects;
new.frozen = freelist != NULL;
@@ -3498,18 +3481,20 @@ new_slab:
slab = slub_percpu_partial(c);
slub_set_percpu_partial(c, slab);
local_unlock_irqrestore(&s->cpu_slab->lock, flags);
stat(s, CPU_PARTIAL_ALLOC);
if (unlikely(!node_match(slab, node) ||
!pfmemalloc_match(slab, gfpflags))) {
slab->next = NULL;
__put_partials(s, slab);
continue;
if (likely(node_match(slab, node) &&
pfmemalloc_match(slab, gfpflags))) {
c->slab = slab;
freelist = get_freelist(s, slab);
VM_BUG_ON(!freelist);
stat(s, CPU_PARTIAL_ALLOC);
goto load_freelist;
}
freelist = freeze_slab(s, slab);
goto retry_load_slab;
local_unlock_irqrestore(&s->cpu_slab->lock, flags);
slab->next = NULL;
__put_partials(s, slab);
}
#endif
@@ -3792,11 +3777,11 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
zero_size = orig_size;
/*
* When slub_debug is enabled, avoid memory initialization integrated
* When slab_debug is enabled, avoid memory initialization integrated
* into KASAN and instead zero out the memory via the memset below with
* the proper size. Otherwise, KASAN might overwrite SLUB redzones and
* cause false-positive reports. This does not lead to a performance
* penalty on production builds, as slub_debug is not intended to be
* penalty on production builds, as slab_debug is not intended to be
* enabled there.
*/
if (__slub_debug_enabled())
@@ -4187,7 +4172,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
* then add it.
*/
if (!kmem_cache_has_cpu_partial(s) && unlikely(!prior)) {
remove_full(s, n, slab);
add_partial(n, slab, DEACTIVATE_TO_TAIL);
stat(s, FREE_ADD_PARTIAL);
}
@@ -4201,9 +4185,6 @@ slab_empty:
*/
remove_partial(n, slab);
stat(s, FREE_REMOVE_PARTIAL);
} else {
/* Slab must be on the full list */
remove_full(s, n, slab);
}
spin_unlock_irqrestore(&n->list_lock, flags);
@@ -4702,8 +4683,8 @@ static unsigned int slub_min_objects;
* activity on the partial lists which requires taking the list_lock. This is
* less a concern for large slabs though which are rarely used.
*
* slub_max_order specifies the order where we begin to stop considering the
* number of objects in a slab as critical. If we reach slub_max_order then
* slab_max_order specifies the order where we begin to stop considering the
* number of objects in a slab as critical. If we reach slab_max_order then
* we try to keep the page order as low as possible. So we accept more waste
* of space in favor of a small page order.
*
@@ -4770,14 +4751,14 @@ static inline int calculate_order(unsigned int size)
* and backing off gradually.
*
* We start with accepting at most 1/16 waste and try to find the
* smallest order from min_objects-derived/slub_min_order up to
* slub_max_order that will satisfy the constraint. Note that increasing
* smallest order from min_objects-derived/slab_min_order up to
* slab_max_order that will satisfy the constraint. Note that increasing
* the order can only result in same or less fractional waste, not more.
*
* If that fails, we increase the acceptable fraction of waste and try
* again. The last iteration with fraction of 1/2 would effectively
* accept any waste and give us the order determined by min_objects, as
* long as at least single object fits within slub_max_order.
* long as at least single object fits within slab_max_order.
*/
for (unsigned int fraction = 16; fraction > 1; fraction /= 2) {
order = calc_slab_order(size, min_order, slub_max_order,
@@ -4787,7 +4768,7 @@ static inline int calculate_order(unsigned int size)
}
/*
* Doh this slab cannot be placed using slub_max_order.
* Doh this slab cannot be placed using slab_max_order.
*/
order = get_order(size);
if (order <= MAX_PAGE_ORDER)
@@ -4857,7 +4838,6 @@ static void early_kmem_cache_node_alloc(int node)
slab = new_slab(kmem_cache_node, GFP_NOWAIT, node);
BUG_ON(!slab);
inc_slabs_node(kmem_cache_node, slab_nid(slab), slab->objects);
if (slab_nid(slab) != node) {
pr_err("SLUB: Unable to allocate memory from node %d\n", node);
pr_err("SLUB: Allocating a useless per node structure in order to be able to continue\n");
@@ -5104,7 +5084,7 @@ static int calculate_sizes(struct kmem_cache *s)
static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
{
s->flags = kmem_cache_flags(s->size, flags, s->name);
s->flags = kmem_cache_flags(flags, s->name);
#ifdef CONFIG_SLAB_FREELIST_HARDENED
s->random = get_random_long();
#endif
@@ -5313,7 +5293,9 @@ static int __init setup_slub_min_order(char *str)
return 1;
}
__setup("slub_min_order=", setup_slub_min_order);
__setup("slab_min_order=", setup_slub_min_order);
__setup_param("slub_min_order=", slub_min_order, setup_slub_min_order, 0);
static int __init setup_slub_max_order(char *str)
{
@@ -5326,7 +5308,8 @@ static int __init setup_slub_max_order(char *str)
return 1;
}
__setup("slub_max_order=", setup_slub_max_order);
__setup("slab_max_order=", setup_slub_max_order);
__setup_param("slub_max_order=", slub_max_order, setup_slub_max_order, 0);
static int __init setup_slub_min_objects(char *str)
{
@@ -5335,7 +5318,8 @@ static int __init setup_slub_min_objects(char *str)
return 1;
}
__setup("slub_min_objects=", setup_slub_min_objects);
__setup("slab_min_objects=", setup_slub_min_objects);
__setup_param("slub_min_objects=", slub_min_objects, setup_slub_min_objects, 0);
#ifdef CONFIG_HARDENED_USERCOPY
/*
@@ -5663,7 +5647,7 @@ void __init kmem_cache_init(void)
/* Now we can use the kmem_cache to allocate kmalloc slabs */
setup_kmalloc_cache_index_table();
create_kmalloc_caches(0);
create_kmalloc_caches();
/* Setup random freelists for each cache */
init_freelist_randomization();
@@ -6792,13 +6776,11 @@ out_del_kobj:
void sysfs_slab_unlink(struct kmem_cache *s)
{
if (slab_state >= FULL)
kobject_del(&s->kobj);
}
void sysfs_slab_release(struct kmem_cache *s)
{
if (slab_state >= FULL)
kobject_put(&s->kobj);
}

@@ -18,7 +18,6 @@ bool slab_is_available(void);
enum slab_state {
DOWN,
PARTIAL,
PARTIAL_NODE,
UP,
FULL
};