A set of changes for debugobjects:

- Prevent destroying the kmem_cache on early failure.

Destroying a kmem_cache requires work queues to be set up, but in the
early failure case they are not yet initialized. So leak the cache
rather than trigger a BUG.

- Reduce parallel pool fill attempts.

Refilling the object pool requires taking the global pool lock, which
causes a massive performance issue when a large number of CPUs attempt
to refill concurrently. It turns out that it's sufficient to let one
CPU handle the refill from the to-free list and, if there are not
enough objects on it, allocate new objects from the kmem cache.

This also splits the free list handling from the actual allocation path
as that yields better results on RT where allocation is restricted to
preemptible code paths. The refill from free list has no such
restrictions.
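
The single-refiller idea can be sketched in userspace C11. This is an illustration only, not the kernel code: struct pool, pool_init() and pool_try_refill() are hypothetical names, an atomic counter stands in for the object list, and an atomic_flag stands in for whatever gate the kernel actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared pool; not the kernel structure. */
struct pool {
	atomic_int  count;      /* objects currently available */
	int         min_count;  /* refill threshold */
	atomic_flag refilling;  /* set while one caller performs the refill */
};

static void pool_init(struct pool *p, int count, int min_count)
{
	atomic_init(&p->count, count);
	p->min_count = min_count;
	atomic_flag_clear(&p->refilling);
}

/*
 * Returns true if this caller performed the refill, false if the pool was
 * full enough or another caller already held the refill gate.
 */
static bool pool_try_refill(struct pool *p, int batch)
{
	assert(batch > 0);

	if (atomic_load(&p->count) >= p->min_count)
		return false;

	/* Only the first caller to set the flag proceeds; the rest back off. */
	if (atomic_flag_test_and_set_explicit(&p->refilling, memory_order_acquire))
		return false;

	/* Stands in for moving objects from the to-free list or allocating. */
	while (atomic_load(&p->count) < p->min_count)
		atomic_fetch_add(&p->count, batch);

	atomic_flag_clear_explicit(&p->refilling, memory_order_release);
	return true;
}
```

Callers that lose the race return immediately instead of queueing on a lock, which is the property the change above is after.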

- Consolidate the global and the per CPU pools to use the same data
structure, so all helper functions can be shared.

- Simplify the object allocation/free logic.

The allocation/free logic is an incomprehensible maze, which tries to
utilize the to-free list and the global pool in the best way. This can
all be simplified into a straightforward, comprehensible code flow.

- Convert the allocation/free mechanism to batch mode.

Transferring objects from the global pool to the per CPU pools or vice
versa is done by walking the hlist and moving object by object. That
not only increases the pool lock held time, it also dirties up to 17
cache lines.

This can be avoided by storing the pointer to the first object in a
batch of 16 objects in the objects themselves and propagating it
through the batch when an object is enqueued into a pool or onto a
temporary hlist head on allocation.

This allows moving batches of objects with at most four cache lines
dirtied and significantly reduces the pool lock hold time and therefore
contention.
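
A minimal userspace sketch of the batching idea, assuming a simplified singly linked stack instead of the kernel's hlist. struct obj, struct pool, pool_push() and pool_move_batch() are hypothetical names. Each object inherits a pointer to the last node of its batch when it is pushed, so a whole batch can be spliced by touching only the two pool heads, the first node and the last node.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 16

struct obj {
	struct obj *next;
	struct obj *batch_last;  /* last list node of the batch this object is in */
};

struct pool {
	struct obj *head;
	unsigned int cnt;
};

/* Push one object; batch_last is inherited unless a new batch starts. */
static void pool_push(struct pool *p, struct obj *o)
{
	assert(o != NULL);

	if (p->cnt % BATCH_SIZE == 0)
		o->batch_last = o;  /* first object pushed ends up last in its batch */
	else
		o->batch_last = p->head->batch_last;

	o->next = p->head;
	p->head = o;
	p->cnt++;
}

/*
 * Move one complete batch from src to dst without walking the list.
 * Both pools must hold whole batches only, so batch boundaries stay aligned.
 * Returns 1 on success, 0 if no batch could be moved.
 */
static int pool_move_batch(struct pool *dst, struct pool *src)
{
	struct obj *first, *last;

	if (src->cnt < BATCH_SIZE || src->cnt % BATCH_SIZE || dst->cnt % BATCH_SIZE)
		return 0;

	first = src->head;
	last = first->batch_last;

	src->head = last->next;  /* detach the batch from src */
	last->next = dst->head;  /* splice it in front of dst */
	dst->head = first;

	src->cnt -= BATCH_SIZE;
	dst->cnt += BATCH_SIZE;
	return 1;
}
```

The 14 objects in the middle of the batch are never read or written during the move, which is where the reduction in dirtied cache lines comes from in this simplified model.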

- Improve object reuse.

The current implementation frees unused objects too aggressively,
which is counterproductive on bursty workloads like a kernel compile.

Address this by:

* increasing the per CPU pool size

* refilling the per CPU pool from the to be freed pool when the per
CPU pool emptied a batch

* keeping track of object usage with an exponentially weighted
moving average, which prevents the work queue callback from freeing
objects prematurely.

Combined, this significantly reduces the allocation/free rate for a full
kernel compile:

           kmem_cache_alloc()   kmem_cache_free()
Baseline:        380k                 330k
Improved:        170k                 117k
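
The usage tracking can be illustrated with a small integer moving average. This is a sketch under assumptions: the weight of 1/4 for new samples and the names ewma_update() and freeable() are made up for illustration and are not taken from the kernel code.

```c
#include <assert.h>

#define EWMA_WEIGHT 4  /* assumed: a new sample contributes 1/4 */

/* Fold one usage sample into the running average using integer math. */
static unsigned long ewma_update(unsigned long avg, unsigned long sample)
{
	return (avg * (EWMA_WEIGHT - 1) + sample) / EWMA_WEIGHT;
}

/*
 * Number of pooled objects beyond the smoothed usage. Only this surplus
 * would be handed back to the allocator; the rest is kept for reuse.
 */
static unsigned long freeable(unsigned long pool_total, unsigned long avg_usage)
{
	assert(EWMA_WEIGHT > 1);
	return pool_total > avg_usage ? pool_total - avg_usage : 0;
}
```

Because the average decays gradually, a short idle period after a burst leaves the average high, so objects that the next burst will need are not freed in between.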

- A few cleanups and a more cache line friendly layout of debug
information on top.

Merge tag 'core-debugobjects-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull debugobjects updates from Thomas Gleixner:

* tag 'core-debugobjects-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
debugobjects: Track object usage to avoid premature freeing of objects
debugobjects: Refill per CPU pool more agressively
debugobjects: Double the per CPU slots
debugobjects: Move pool statistics into global_pool struct
debugobjects: Implement batch processing
debugobjects: Prepare kmem_cache allocations for batching
debugobjects: Prepare for batching
debugobjects: Use static key for boot pool selection
debugobjects: Rework free_object_work()
debugobjects: Rework object freeing
debugobjects: Rework object allocation
debugobjects: Move min/max count into pool struct
debugobjects: Rename and tidy up per CPU pools
debugobjects: Use separate list head for boot pool
debugobjects: Move pools into a datastructure
debugobjects: Reduce parallel pool fill attempts
debugobjects: Make debug_objects_enabled bool
debugobjects: Provide and use free_object_list()
debugobjects: Remove pointless debug printk
debugobjects: Reuse put_objects() on OOM
...

Linus Torvalds

2024-11-19 15:20:04 -08:00

parent a5c93bfec0 ff8d523cc4

commit fb1dd1403c

2 changed files with 495 additions and 384 deletions

									
include/linux/debugobjects.h (12 lines changed)

@@ -23,13 +23,17 @@ struct debug_obj_descr;
  * @state:	tracked object state
  * @astate:	current active state
  * @object:	pointer to the real object
+ * @batch_last:	pointer to the last hlist node in a batch
  * @descr:	pointer to an object type specific debug description structure
  */
 struct debug_obj {
-	struct hlist_node	node;
-	enum debug_obj_state	state;
-	unsigned int		astate;
-	void			*object;
+	struct hlist_node		node;
+	enum debug_obj_state		state;
+	unsigned int			astate;
+	union {
+		void			*object;
+		struct hlist_node	*batch_last;
+	};
 	const struct debug_obj_descr *descr;
 };
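
One likely reason for the anonymous union is that the new pointer then shares storage with @object and the structure does not grow. The check below is a userspace sketch: struct debug_obj_old and struct debug_obj_new mirror the two versions of the structure shown above, while the enum values are placeholders and not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the kernel's struct hlist_node. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* Placeholder values; the real enum lives in include/linux/debugobjects.h. */
enum debug_obj_state { STATE_PLACEHOLDER_A, STATE_PLACEHOLDER_B };

struct debug_obj_descr;  /* opaque for this sketch */

struct debug_obj_old {
	struct hlist_node	node;
	enum debug_obj_state	state;
	unsigned int		astate;
	void			*object;
	const struct debug_obj_descr *descr;
};

struct debug_obj_new {
	struct hlist_node		node;
	enum debug_obj_state		state;
	unsigned int			astate;
	union {
		void			*object;      /* valid while the object is tracked */
		struct hlist_node	*batch_last;  /* valid while the object sits in a pool */
	};
	const struct debug_obj_descr *descr;
};

/* The union adds a second view of the same slot, not a new slot. */
_Static_assert(sizeof(struct debug_obj_new) == sizeof(struct debug_obj_old),
	       "union must not grow the structure");
_Static_assert(offsetof(struct debug_obj_new, object) ==
	       offsetof(struct debug_obj_new, batch_last),
	       "both members share one slot");
```

The comments on the union members describe an assumed usage split: a pooled object has no real object to point at, so the slot is free to carry batch information.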

lib/debugobjects.c (867 lines changed)

File diff suppressed because it is too large
