linux-stable/mm/kfence/kfence.h
Marco Elver 08f6b10630 kfence: limit currently covered allocations when pool nearly full
One of KFENCE's main design principles is that with increasing uptime,
allocation coverage increases sufficiently to detect previously
undetected bugs.

We have observed that frequent long-lived allocations of the same source
(e.g.  pagecache) tend to permanently fill up the KFENCE pool with
increasing system uptime, thus breaking the above requirement.  The
workaround thus far has been to increase the sample interval and/or
the KFENCE pool size, but neither is a reliable solution.

To ensure diverse coverage of allocations, limit currently covered
allocations of the same source once pool utilization reaches 75%
(configurable via `kfence.skip_covered_thresh`) or above.  The effect is
retaining reasonable allocation coverage when the pool is close to full.

A side-effect is that this also limits frequent long-lived allocations
of the same source filling up the pool permanently.
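
As a rough illustration of the threshold rule described above, the
following user-space sketch decides whether a sampled allocation would
be skipped.  The helper names `pool_nearly_full()` and `should_skip()`
and their parameters are hypothetical and are not taken from the kernel
sources; only the 75% default comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true once at least thresh_percent of the pool's
 * objects are currently allocated.
 */
static bool pool_nearly_full(size_t allocated, size_t num_objects,
			     unsigned int thresh_percent)
{
	assert(num_objects > 0 && thresh_percent <= 100);
	return allocated * 100 >= num_objects * thresh_percent;
}

/*
 * Hypothetical helper: skip an allocation only if the pool is nearly
 * full AND its source is already covered by a live KFENCE object.
 */
static bool should_skip(size_t allocated, size_t num_objects,
			unsigned int thresh_percent, bool source_covered)
{
	return source_covered &&
	       pool_nearly_full(allocated, num_objects, thresh_percent);
}
```

Under these assumptions, an uncovered source is never skipped, so new
allocation sites can still enter the pool even when it is almost full.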

Uniqueness of an allocation for coverage purposes is based on its
(partial) allocation stack trace (the source).  A Counting Bloom filter
is used to check if an allocation is covered; if the allocation is
currently covered, the allocation is skipped by KFENCE.
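
A minimal user-space sketch of such a Counting Bloom filter is shown
below, assuming the counters are incremented when an allocation is
taken and decremented when it is freed.  The table size, the number of
hash functions, and the names `covered_contains()`, `covered_add()` and
`next_hash()` are illustrative assumptions; `next_hash()` is only a
stand-in for the kernel's hash_32(), and the real code additionally has
to deal with concurrency, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes, chosen for illustration only. */
#define COVERED_ORDER	8			/* log2 of the number of counters */
#define COVERED_SIZE	(1u << COVERED_ORDER)
#define COVERED_MASK	(COVERED_SIZE - 1)
#define COVERED_HNUM	2			/* number of hash functions */

static uint32_t covered[COVERED_SIZE];

/* Multiplicative hash keeping the top COVERED_ORDER bits. */
static uint32_t next_hash(uint32_t v)
{
	return (v * 0x61C88647u) >> (32 - COVERED_ORDER);
}

/* True if every counter selected by stack_hash is non-zero. */
static bool covered_contains(uint32_t stack_hash)
{
	for (int i = 0; i < COVERED_HNUM; i++) {
		if (covered[stack_hash & COVERED_MASK] == 0)
			return false;
		stack_hash = next_hash(stack_hash);
	}
	return true;
}

/* Add delta (+1 on alloc, -1 on free) to every selected counter. */
static void covered_add(uint32_t stack_hash, int delta)
{
	for (int i = 0; i < COVERED_HNUM; i++) {
		uint32_t *slot = &covered[stack_hash & COVERED_MASK];

		assert(delta > 0 || *slot > 0);	/* never underflow a counter */
		*slot += (uint32_t)delta;
		stack_hash = next_hash(stack_hash);
	}
}
```

Using counters rather than single bits is what allows an entry to be
removed again on free; as with any Bloom filter, a lookup may report a
false positive but never a false negative.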

Testing was done using:

	(a) a synthetic workload that performs frequent long-lived
	    allocations (default config values; sample_interval=1;
	    num_objects=63), and

	(b) normal desktop workloads on an otherwise idle machine where
	    the problem was first reported after a few days of uptime
	    (default config values).

In both test cases the sampled allocation rate no longer drops to zero
at any point.  In the case of (b) we observe (after 2 days uptime) 15%
unique allocations in the pool, 77% pool utilization, with 20% "skipped
allocations (covered)".

[elver@google.com: simplify and just use hash_32(), use more random stack_hash_seed]
  Link: https://lkml.kernel.org/r/YU3MRGaCaJiYht5g@elver.google.com
[elver@google.com: fix 32 bit]

Link: https://lkml.kernel.org/r/20210923104803.2620285-4-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Taras Madan <tarasmadan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:43 -07:00

/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Kernel Electric-Fence (KFENCE). For more info please see
 * Documentation/dev-tools/kfence.rst.
 *
 * Copyright (C) 2020, Google LLC.
 */

#ifndef MM_KFENCE_KFENCE_H
#define MM_KFENCE_KFENCE_H

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

#include "../slab.h" /* for struct kmem_cache */

/*
 * Get the canary byte pattern for @addr. Use a pattern that varies based on the
 * lower 3 bits of the address, to detect memory corruptions with higher
 * probability, where similar constants are used.
 */
#define KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))

/* Maximum stack depth for reports. */
#define KFENCE_STACK_DEPTH 64

/* KFENCE object states. */
enum kfence_object_state {
	KFENCE_OBJECT_UNUSED,		/* Object is unused. */
	KFENCE_OBJECT_ALLOCATED,	/* Object is currently allocated. */
	KFENCE_OBJECT_FREED,		/* Object was allocated, and then freed. */
};

/* Alloc/free tracking information. */
struct kfence_track {
	pid_t pid;
	int cpu;
	u64 ts_nsec;
	int num_stack_entries;
	unsigned long stack_entries[KFENCE_STACK_DEPTH];
};

/* KFENCE metadata per guarded allocation. */
struct kfence_metadata {
	struct list_head list;		/* Freelist node; access under kfence_freelist_lock. */
	struct rcu_head rcu_head;	/* For delayed freeing. */

	/*
	 * Lock protecting below data; to ensure consistency of the below data,
	 * since the following may execute concurrently: __kfence_alloc(),
	 * __kfence_free(), kfence_handle_page_fault(). However, note that we
	 * cannot grab the same metadata off the freelist twice, and multiple
	 * __kfence_alloc() cannot run concurrently on the same metadata.
	 */
	raw_spinlock_t lock;

	/* The current state of the object; see above. */
	enum kfence_object_state state;

	/*
	 * Allocated object address; cannot be calculated from size, because of
	 * alignment requirements.
	 *
	 * Invariant: ALIGN_DOWN(addr, PAGE_SIZE) is constant.
	 */
	unsigned long addr;

	/*
	 * The size of the original allocation.
	 */
	size_t size;

	/*
	 * The kmem_cache cache of the last allocation; NULL if never allocated
	 * or the cache has already been destroyed.
	 */
	struct kmem_cache *cache;

	/*
	 * In case of an invalid access, the page that was unprotected; we
	 * optimistically only store one address.
	 */
	unsigned long unprotected_page;

	/* Allocation and free stack information. */
	struct kfence_track alloc_track;
	struct kfence_track free_track;
	/* For updating alloc_covered on frees. */
	u32 alloc_stack_hash;
};

extern struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];

/* KFENCE error types for report generation. */
enum kfence_error_type {
	KFENCE_ERROR_OOB,		/* Detected an out-of-bounds access. */
	KFENCE_ERROR_UAF,		/* Detected a use-after-free access. */
	KFENCE_ERROR_CORRUPTION,	/* Detected a memory corruption on free. */
	KFENCE_ERROR_INVALID,		/* Invalid access of unknown type. */
	KFENCE_ERROR_INVALID_FREE,	/* Invalid free. */
};

void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *regs,
			 const struct kfence_metadata *meta, enum kfence_error_type type);

void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta);

#endif /* MM_KFENCE_KFENCE_H */
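
To illustrate how an address-dependent canary such as
KFENCE_CANARY_PATTERN() can be used, the user-space sketch below fills
a region with canary bytes and later scans it for corruption.  The
macro is re-declared with standard C types, and the helpers
`fill_canary()` and `check_canary()` are hypothetical names introduced
for this example; they are not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same expression as KFENCE_CANARY_PATTERN(), using standard C types. */
#define CANARY_PATTERN(addr) \
	((uint8_t)0xaa ^ (uint8_t)((uintptr_t)(addr) & 0x7))

/* Hypothetical helper: write the canary for each byte in [start, start + len). */
static void fill_canary(uint8_t *start, size_t len)
{
	assert(start != NULL);
	for (size_t i = 0; i < len; i++)
		start[i] = CANARY_PATTERN(&start[i]);
}

/*
 * Hypothetical helper: return the first byte that no longer matches its
 * canary, or NULL if the whole region is intact.
 */
static uint8_t *check_canary(uint8_t *start, size_t len)
{
	assert(start != NULL);
	for (size_t i = 0; i < len; i++) {
		if (start[i] != CANARY_PATTERN(&start[i]))
			return &start[i];
	}
	return NULL;
}
```

Because the expected byte depends on the low 3 bits of its own address,
a stray write of a single repeated constant cannot match the canary at
all eight positions of an aligned 8-byte run.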