mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
synced 2025-01-01 10:42:11 +00:00
cgroup: Changes for v6.12
- cpuset isolation improvements.

- cpuset cgroup1 support is split into its own file behind the new config
  option CONFIG_CPUSETS_V1. This makes it the second controller which makes
  cgroup1 support optional after memcg.

- Handling of unavailable v1 controllers improved during cgroup1 mount
  operations.

- union_find applied to cpuset. It makes code simpler and more efficient.

- Reduce spurious events in pids.events.

- Cleanups and other misc changes.

- Contains a merge of cgroup/for-6.11-fixes to receive cpuset fixes that
  further changes build upon.
-----BEGIN PGP SIGNATURE-----

iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZuNU3Q4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGdMsAP9yqPxu//LiJ3lPWhKcVVKtdwrA3AYDLE81VSJO
5VZJhAD+Ic+Ly/jZjDtjjQpZ1U3JsBpBRcVBqzeH0gD7eXaJgwk=
=h/+c
-----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - cpuset isolation improvements

 - cpuset cgroup1 support is split into its own file behind the new
   config option CONFIG_CPUSETS_V1. This makes it the second controller
   which makes cgroup1 support optional after memcg

 - Handling of unavailable v1 controllers improved during cgroup1 mount
   operations

 - union_find applied to cpuset. It makes code simpler and more
   efficient

 - Reduce spurious events in pids.events

 - Cleanups and other misc changes

 - Contains a merge of cgroup/for-6.11-fixes to receive cpuset fixes
   that further changes build upon

* tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (34 commits)
  cgroup: Do not report unavailable v1 controllers in /proc/cgroups
  cgroup: Disallow mounting v1 hierarchies without controller implementation
  cgroup/cpuset: Expose cpuset filesystem with cpuset v1 only
  cgroup/cpuset: Move cpu.h include to cpuset-internal.h
  cgroup/cpuset: add sefltest for cpuset v1
  cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1
  cgroup/cpuset: rename functions shared between v1 and v2
  cgroup/cpuset: move v1 interfaces to cpuset-v1.c
  cgroup/cpuset: move validate_change_legacy to cpuset-v1.c
  cgroup/cpuset: move legacy hotplug update to cpuset-v1.c
  cgroup/cpuset: add callback_lock helper
  cgroup/cpuset: move memory_spread to cpuset-v1.c
  cgroup/cpuset: move relax_domain_level to cpuset-v1.c
  cgroup/cpuset: move memory_pressure to cpuset-v1.c
  cgroup/cpuset: move common code to cpuset-internal.h
  cgroup/cpuset: introduce cpuset-v1.c
  selftest/cgroup: Make test_cpuset_prs.sh deal with pre-isolated CPUs
  cgroup/cpuset: Account for boot time isolated CPUs
  cgroup/cpuset: remove use_parent_ecpus of cpuset
  cgroup/cpuset: remove fetch_xcpus
  ...
commit
78567e2bc7
@@ -533,10 +533,12 @@ cgroup namespace on namespace creation.
 Because the resource control interface files in a given directory
 control the distribution of the parent's resources, the delegatee
 shouldn't be allowed to write to them. For the first method, this is
-achieved by not granting access to these files. For the second, the
-kernel rejects writes to all files other than "cgroup.procs" and
-"cgroup.subtree_control" on a namespace root from inside the
-namespace.
+achieved by not granting access to these files. For the second, files
+outside the namespace should be hidden from the delegatee by the means
+of at least mount namespacing, and the kernel rejects writes to all
+files on a namespace root from inside the cgroup namespace, except for
+those files listed in "/sys/kernel/cgroup/delegate" (including
+"cgroup.procs", "cgroup.threads", "cgroup.subtree_control", etc.).
 
 The end results are equivalent for both delegation types. Once
 delegated, the user can build sub-hierarchy under the directory,
@ -981,6 +983,14 @@ All cgroup core files are prefixed with "cgroup."
|
||||
A dying cgroup can consume system resources not exceeding
|
||||
limits, which were active at the moment of cgroup deletion.
|
||||
|
||||
nr_subsys_<cgroup_subsys>
|
||||
Total number of live cgroup subsystems (e.g. memory
|
||||
cgroup) at and beneath the current cgroup.
|
||||
|
||||
nr_dying_subsys_<cgroup_subsys>
|
||||
Total number of dying cgroup subsystems (e.g. memory
|
||||
cgroup) at and beneath the current cgroup.
|
||||
|
||||
cgroup.freeze
|
||||
A read-write single value file which exists on non-root cgroups.
|
||||
Allowed values are "0" and "1". The default is "0".
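The new per-subsystem keys are reported in cgroup.stat as plain "key value" lines, so they can be picked out with ordinary string handling. The following is a minimal userspace sketch, assuming the file contents have already been read into a NUL-terminated buffer; cgroup_stat_lookup() is a hypothetical helper name and is not part of any kernel or libc interface.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up "key value" in the text of a cgroup.stat file.
 * Returns 0 and stores the value on success, -1 if the key is absent.
 * Hypothetical helper, for illustration only.
 */
static int cgroup_stat_lookup(const char *text, const char *key, long *value)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* A match needs the whole key followed by a single space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			*value = strtol(line + klen + 1, NULL, 10);
			return 0;
		}
		/* Advance to the start of the next line, if there is one. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller would typically read "cgroup.stat" from the cgroup directory of interest and then query keys such as "nr_subsys_memory" or "nr_dying_subsys_memory". Which subsystem names appear depends on the controllers bound to cgroup v2 on that system.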
|
||||
@@ -2940,8 +2950,8 @@ Deprecated v1 Core Features
 
 - "cgroup.clone_children" is removed.
 
-- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
-  at the root instead.
+- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
+  "cgroup.stat" files at the root instead.
 
 
 Issues with v1 and Rationales for v2

@ -49,6 +49,7 @@ Library functionality that is used throughout the kernel.
|
||||
wrappers/atomic_t
|
||||
wrappers/atomic_bitops
|
||||
floating-point
|
||||
union_find
|
||||
|
||||
Low level entry and exit
|
||||
========================
|
||||
|
106
Documentation/core-api/union_find.rst
Normal file
@ -0,0 +1,106 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
====================
|
||||
Union-Find in Linux
|
||||
====================
|
||||
|
||||
|
||||
:Date: June 21, 2024
|
||||
:Author: Xavier <xavier_qy@163.com>
|
||||
|
||||
What is union-find, and what is it used for?
|
||||
------------------------------------------------
|
||||
|
||||
Union-find is a data structure used to handle the merging and querying
|
||||
of disjoint sets. The primary operations supported by union-find are:
|
||||
|
||||
Initialization: Resetting each element as an individual set, with
|
||||
each set's initial parent node pointing to itself.
|
||||
|
||||
Find: Determine which set a particular element belongs to, usually by
|
||||
returning a “representative element” of that set. This operation
|
||||
is used to check if two elements are in the same set.
|
||||
|
||||
Union: Merge two sets into one.
|
||||
|
||||
As a data structure used to maintain sets (groups), union-find is commonly
|
||||
utilized to solve problems related to offline queries, dynamic connectivity,
|
||||
and graph theory. It is also a key component in Kruskal's algorithm for
|
||||
computing the minimum spanning tree, which is crucial in scenarios like
|
||||
network routing. Consequently, union-find is widely referenced. Additionally,
|
||||
union-find has applications in symbolic computation, register allocation,
|
||||
and more.
|
||||
|
||||
Space Complexity: O(n), where n is the number of nodes.
|
||||
|
||||
Time Complexity: Path compression speeds up the find operation, and union by
rank keeps the trees shallow when sets are merged. With both optimizations,
the amortized time complexity of each find and union operation is O(α(n)),
where α(n) is the inverse Ackermann function. For practical purposes this can
be treated as constant time.
|
||||
|
||||
This document covers use of the Linux union-find implementation. For more
|
||||
information on the nature and implementation of union-find, see:
|
||||
|
||||
Wikipedia entry on union-find
|
||||
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
|
||||
|
||||
Linux implementation of union-find
|
||||
-----------------------------------
|
||||
|
||||
Linux's union-find implementation resides in the file "lib/union_find.c".
|
||||
To use it, "#include <linux/union_find.h>".
|
||||
|
||||
The union-find data structure is defined as follows::
|
||||
|
||||
struct uf_node {
|
||||
struct uf_node *parent;
|
||||
unsigned int rank;
|
||||
};
|
||||
|
||||
In this structure, parent points to the parent node of the current node; the
root of a set points to itself. The rank field is an upper bound on the height
of the tree rooted at the node (path compression can make the actual height
smaller). During a union operation, the tree with the smaller rank is attached
under the tree with the larger rank to maintain balance.
|
||||
|
||||
Initializing union-find
|
||||
-----------------------
|
||||
|
||||
A node can be initialized either statically, with the UF_INIT_NODE() macro, or
at run time, with uf_node_init(). In both cases the parent pointer is set to
point to the node itself and the rank is set to 0.
|
||||
Example::
|
||||
|
||||
struct uf_node my_node = UF_INIT_NODE(my_node);
|
||||
|
||||
or
|
||||
|
||||
uf_node_init(&my_node);
|
||||
|
||||
Find the Root Node of union-find
|
||||
--------------------------------
|
||||
|
||||
This operation is mainly used to determine whether two nodes belong to the same
|
||||
set in the union-find. If they have the same root, they are in the same set.
|
||||
During the find operation, path compression is performed to improve the
|
||||
efficiency of subsequent find operations.
|
||||
Example::
|
||||
|
||||
int connected;
|
||||
struct uf_node *root1 = uf_find(&node_1);
|
||||
struct uf_node *root2 = uf_find(&node_2);
|
||||
if (root1 == root2)
|
||||
connected = 1;
|
||||
else
|
||||
connected = 0;
|
||||
|
||||
Union Two Sets in union-find
|
||||
----------------------------
|
||||
|
||||
To union two sets in the union-find, you first find their respective root nodes
and then attach the root with the smaller rank under the root with the larger
rank.
|
||||
Example::
|
||||
|
||||
uf_union(&node_1, &node_2);
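The three operations described in this document can be tried outside the kernel with a small userspace sketch. The code below is an illustration under stated assumptions, not a copy of lib/union_find.c: it reuses the documented names (struct uf_node, uf_node_init, uf_find, uf_union), implements find with path halving (one common form of path compression) and union by rank, and adds uf_same_set() as a hypothetical convenience helper.

```c
#include <stdbool.h>

/* Same layout as the structure documented above. */
struct uf_node {
	struct uf_node *parent;
	unsigned int rank;
};

/* Make the node a singleton set: it is its own root, with rank 0. */
static void uf_node_init(struct uf_node *node)
{
	node->parent = node;
	node->rank = 0;
}

/* Find the root; path halving re-points each visited node at its grandparent. */
static struct uf_node *uf_find(struct uf_node *node)
{
	while (node->parent != node) {
		node->parent = node->parent->parent;
		node = node->parent;
	}
	return node;
}

/* Merge the sets containing a and b, attaching the lower-rank root. */
static void uf_union(struct uf_node *a, struct uf_node *b)
{
	struct uf_node *ra = uf_find(a);
	struct uf_node *rb = uf_find(b);

	if (ra == rb)
		return;		/* already in the same set */

	if (ra->rank < rb->rank) {
		ra->parent = rb;
	} else if (ra->rank > rb->rank) {
		rb->parent = ra;
	} else {
		/* Equal ranks: pick one root and bump its rank. */
		rb->parent = ra;
		ra->rank++;
	}
}

/* Hypothetical helper, not part of the documented interface. */
static bool uf_same_set(struct uf_node *a, struct uf_node *b)
{
	return uf_find(a) == uf_find(b);
}
```

In kernel code the node is normally embedded in a larger object, as struct cpuset does with its node member, and the containing object is recovered from the root with container_of().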
|
@ -49,6 +49,7 @@
|
||||
generic-radix-tree
|
||||
packing
|
||||
this_cpu_ops
|
||||
union_find
|
||||
|
||||
=======
|
||||
|
||||
|
92
Documentation/translations/zh_CN/core-api/union_find.rst
Normal file
@ -0,0 +1,92 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: ../disclaimer-zh_CN.rst
|
||||
|
||||
:Original: Documentation/core-api/union_find.rst
|
||||
|
||||
=============================
|
||||
Linux中的并查集(Union-Find)
|
||||
=============================
|
||||
|
||||
|
||||
:日期: 2024年6月21日
|
||||
:作者: Xavier <xavier_qy@163.com>
|
||||
|
||||
何为并查集,它有什么用?
|
||||
------------------------
|
||||
|
||||
并查集是一种数据结构,用于处理一些不交集的合并及查询问题。并查集支持的主要操作:
|
||||
初始化:将每个元素初始化为单独的集合,每个集合的初始父节点指向自身。
|
||||
|
||||
查询:查询某个元素属于哪个集合,通常是返回集合中的一个“代表元素”。这个操作是为
|
||||
了判断两个元素是否在同一个集合之中。
|
||||
|
||||
合并:将两个集合合并为一个。
|
||||
|
||||
并查集作为一种用于维护集合(组)的数据结构,它通常用于解决一些离线查询、动态连通性和
|
||||
图论等相关问题,同时也是用于计算最小生成树的克鲁斯克尔算法中的关键,由于最小生成树在
|
||||
网络路由等场景下十分重要,并查集也得到了广泛的引用。此外,并查集在符号计算,寄存器分
|
||||
配等方面也有应用。
|
||||
|
||||
空间复杂度: O(n),n为节点数。
|
||||
|
||||
时间复杂度:使用路径压缩可以减少查找操作的时间复杂度,使用按秩合并可以减少合并操作的
|
||||
时间复杂度,使得并查集每个查询和合并操作的平均时间复杂度仅为O(α(n)),其中α(n)是反阿
|
||||
克曼函数,可以粗略地认为并查集的操作有常数的时间复杂度。
|
||||
|
||||
本文档涵盖了对Linux并查集实现的使用方法。更多关于并查集的性质和实现的信息,参见:
|
||||
|
||||
维基百科并查集词条
|
||||
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
|
||||
|
||||
并查集的Linux实现
|
||||
------------------
|
||||
|
||||
Linux的并查集实现在文件“lib/union_find.c”中。要使用它,需要
|
||||
“#include <linux/union_find.h>”。
|
||||
|
||||
并查集的数据结构定义如下::
|
||||
|
||||
struct uf_node {
|
||||
struct uf_node *parent;
|
||||
unsigned int rank;
|
||||
};
|
||||
|
||||
其中parent为当前节点的父节点,rank为当前树的高度,在合并时将rank小的节点接到rank大
|
||||
的节点下面以增加平衡性。
|
||||
|
||||
初始化并查集
|
||||
-------------
|
||||
|
||||
可以采用静态或初始化接口完成初始化操作。初始化时,parent 指针指向自身,rank 设置
|
||||
为 0。
|
||||
示例::
|
||||
|
||||
struct uf_node my_node = UF_INIT_NODE(my_node);
|
||||
|
||||
或
|
||||
|
||||
uf_node_init(&my_node);
|
||||
|
||||
查找并查集的根节点
|
||||
------------------
|
||||
|
||||
主要用于判断两个并查集是否属于一个集合,如果根相同,那么他们就是一个集合。在查找过程中
|
||||
会对路径进行压缩,提高后续查找效率。
|
||||
示例::
|
||||
|
||||
int connected;
|
||||
struct uf_node *root1 = uf_find(&node_1);
|
||||
struct uf_node *root2 = uf_find(&node_2);
|
||||
if (root1 == root2)
|
||||
connected = 1;
|
||||
else
|
||||
connected = 0;
|
||||
|
||||
合并两个并查集
|
||||
--------------
|
||||
|
||||
对于两个相交的并查集进行合并,会首先查找它们各自的根节点,然后根据根节点秩大小,将小的
|
||||
节点连接到大的节点下面。
|
||||
示例::
|
||||
|
||||
uf_union(&node_1, &node_2);
|
12
MAINTAINERS
@ -5736,9 +5736,12 @@ S: Maintained
|
||||
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
|
||||
F: Documentation/admin-guide/cgroup-v1/cpusets.rst
|
||||
F: include/linux/cpuset.h
|
||||
F: kernel/cgroup/cpuset-internal.h
|
||||
F: kernel/cgroup/cpuset-v1.c
|
||||
F: kernel/cgroup/cpuset.c
|
||||
F: tools/testing/selftests/cgroup/test_cpuset.c
|
||||
F: tools/testing/selftests/cgroup/test_cpuset_prs.sh
|
||||
F: tools/testing/selftests/cgroup/test_cpuset_v1_base.sh
|
||||
|
||||
CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
|
||||
M: Johannes Weiner <hannes@cmpxchg.org>
|
||||
@ -23606,6 +23609,15 @@ F: drivers/cdrom/cdrom.c
|
||||
F: include/linux/cdrom.h
|
||||
F: include/uapi/linux/cdrom.h
|
||||
|
||||
UNION-FIND
|
||||
M: Xavier <xavier_qy@163.com>
|
||||
L: linux-kernel@vger.kernel.org
|
||||
S: Maintained
|
||||
F: Documentation/core-api/union_find.rst
|
||||
F: Documentation/translations/zh_CN/core-api/union_find.rst
|
||||
F: include/linux/union_find.h
|
||||
F: lib/union_find.c
|
||||
|
||||
UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER
|
||||
R: Alim Akhtar <alim.akhtar@samsung.com>
|
||||
R: Avri Altman <avri.altman@wdc.com>
|
||||
|
@ -210,6 +210,14 @@ struct cgroup_subsys_state {
|
||||
* fields of the containing structure.
|
||||
*/
|
||||
struct cgroup_subsys_state *parent;
|
||||
|
||||
/*
|
||||
* Keep track of total numbers of visible descendant CSSes.
|
||||
* The total number of dying CSSes is tracked in
|
||||
* css->cgroup->nr_dying_subsys[ssid].
|
||||
* Protected by cgroup_mutex.
|
||||
*/
|
||||
int nr_descendants;
|
||||
};
|
||||
|
||||
/*
|
||||
@ -470,6 +478,12 @@ struct cgroup {
|
||||
/* Private pointers for each registered subsystem */
|
||||
struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
|
||||
|
||||
/*
|
||||
* Keep track of total number of dying CSSes at and below this cgroup.
|
||||
* Protected by cgroup_mutex.
|
||||
*/
|
||||
int nr_dying_subsys[CGROUP_SUBSYS_COUNT];
|
||||
|
||||
struct cgroup_root *root;
|
||||
|
||||
/*
|
||||
|
@ -99,6 +99,7 @@ static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
|
||||
extern int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
|
||||
const struct task_struct *tsk2);
|
||||
|
||||
#ifdef CONFIG_CPUSETS_V1
|
||||
#define cpuset_memory_pressure_bump() \
|
||||
do { \
|
||||
if (cpuset_memory_pressure_enabled) \
|
||||
@ -106,6 +107,9 @@ extern int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
|
||||
} while (0)
|
||||
extern int cpuset_memory_pressure_enabled;
|
||||
extern void __cpuset_memory_pressure_bump(void);
|
||||
#else
|
||||
static inline void cpuset_memory_pressure_bump(void) { }
|
||||
#endif
|
||||
|
||||
extern void cpuset_task_status_allowed(struct seq_file *m,
|
||||
struct task_struct *task);
|
||||
@@ -113,7 +117,6 @@ extern int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
 			    struct pid *pid, struct task_struct *tsk);
 
 extern int cpuset_mem_spread_node(void);
-extern int cpuset_slab_spread_node(void);
 
 static inline int cpuset_do_page_mem_spread(void)
 {
@@ -246,11 +249,6 @@ static inline int cpuset_mem_spread_node(void)
 	return 0;
 }
 
-static inline int cpuset_slab_spread_node(void)
-{
-	return 0;
-}
-
 static inline int cpuset_do_page_mem_spread(void)
 {
 	return 0;

@@ -1243,7 +1243,6 @@ struct task_struct {
 	/* Sequence number to catch updates: */
 	seqcount_spinlock_t mems_allowed_seq;
 	int cpuset_mem_spread_rotor;
-	int cpuset_slab_spread_rotor;
 #endif
 #ifdef CONFIG_CGROUPS
 	/* Control Group info protected by css_set_lock: */

41
include/linux/union_find.h
Normal file
@ -0,0 +1,41 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef __LINUX_UNION_FIND_H
|
||||
#define __LINUX_UNION_FIND_H
|
||||
/**
|
||||
* union_find.h - union-find data structure implementation
|
||||
*
|
||||
* This header provides functions and structures to implement the union-find
|
||||
* data structure. The union-find data structure is used to manage disjoint
|
||||
* sets and supports efficient union and find operations.
|
||||
*
|
||||
* See Documentation/core-api/union_find.rst for documentation and samples.
|
||||
*/
|
||||
|
||||
struct uf_node {
|
||||
struct uf_node *parent;
|
||||
unsigned int rank;
|
||||
};
|
||||
|
||||
/* This macro is used for static initialization of a union-find node. */
|
||||
#define UF_INIT_NODE(node) {.parent = &node, .rank = 0}
|
||||
|
||||
/**
|
||||
* uf_node_init - Initialize a union-find node
|
||||
* @node: pointer to the union-find node to be initialized
|
||||
*
|
||||
* This function sets the parent of the node to itself and
|
||||
* initializes its rank to 0.
|
||||
*/
|
||||
static inline void uf_node_init(struct uf_node *node)
|
||||
{
|
||||
node->parent = node;
|
||||
node->rank = 0;
|
||||
}
|
||||
|
||||
/* find the root of a node */
|
||||
struct uf_node *uf_find(struct uf_node *node);
|
||||
|
||||
/* Merge two intersecting nodes */
|
||||
void uf_union(struct uf_node *node1, struct uf_node *node2);
|
||||
|
||||
#endif /* __LINUX_UNION_FIND_H */
|
13
init/Kconfig
@ -1143,6 +1143,19 @@ config CPUSETS
|
||||
|
||||
Say N if unsure.
|
||||
|
||||
config CPUSETS_V1
|
||||
bool "Legacy cgroup v1 cpusets controller"
|
||||
depends on CPUSETS
|
||||
default n
|
||||
help
|
||||
Legacy cgroup v1 cpusets controller which has been deprecated by
|
||||
cgroup v2 implementation. The v1 is there for legacy applications
|
||||
which haven't migrated to the new cgroup v2 interface yet. If you
|
||||
do not have any such application then you are completely fine leaving
|
||||
this option disabled.
|
||||
|
||||
Say N if unsure.
|
||||
|
||||
config PROC_PID_CPUSET
|
||||
bool "Include legacy /proc/<pid>/cpuset file"
|
||||
depends on CPUSETS
|
||||
|
@ -5,5 +5,6 @@ obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
|
||||
obj-$(CONFIG_CGROUP_PIDS) += pids.o
|
||||
obj-$(CONFIG_CGROUP_RDMA) += rdma.o
|
||||
obj-$(CONFIG_CPUSETS) += cpuset.o
|
||||
obj-$(CONFIG_CPUSETS_V1) += cpuset-v1.o
|
||||
obj-$(CONFIG_CGROUP_MISC) += misc.o
|
||||
obj-$(CONFIG_CGROUP_DEBUG) += debug.o
|
||||
|
@ -46,6 +46,12 @@ bool cgroup1_ssid_disabled(int ssid)
|
||||
return cgroup_no_v1_mask & (1 << ssid);
|
||||
}
|
||||
|
||||
static bool cgroup1_subsys_absent(struct cgroup_subsys *ss)
|
||||
{
|
||||
/* Check also dfl_cftypes for file-less controllers, i.e. perf_event */
|
||||
return ss->legacy_cftypes == NULL && ss->dfl_cftypes;
|
||||
}
|
||||
|
||||
/**
|
||||
* cgroup_attach_task_all - attach task 'tsk' to all cgroups of task 'from'
|
||||
* @from: attach to all cgroups of a given task
|
||||
@@ -675,11 +681,14 @@ int proc_cgroupstats_show(struct seq_file *m, void *v)
 	 * cgroup_mutex contention.
 	 */
 
-	for_each_subsys(ss, i)
+	for_each_subsys(ss, i) {
+		if (cgroup1_subsys_absent(ss))
+			continue;
 		seq_printf(m, "%s\t%d\t%d\t%d\n",
 			   ss->legacy_name, ss->root->hierarchy_id,
 			   atomic_read(&ss->root->nr_cgrps),
 			   cgroup_ssid_enabled(i));
+	}
 
 	return 0;
 }
@@ -932,7 +941,8 @@ int cgroup1_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	if (ret != -ENOPARAM)
 		return ret;
 	for_each_subsys(ss, i) {
-		if (strcmp(param->key, ss->legacy_name))
+		if (strcmp(param->key, ss->legacy_name) ||
+		    cgroup1_subsys_absent(ss))
 			continue;
 		if (!cgroup_ssid_enabled(i) || cgroup1_ssid_disabled(i))
 			return invalfc(fc, "Disabled controller '%s'",
@@ -1024,7 +1034,8 @@ static int check_cgroupfs_options(struct fs_context *fc)
 	mask = ~((u16)1 << cpuset_cgrp_id);
 #endif
 	for_each_subsys(ss, i)
-		if (cgroup_ssid_enabled(i) && !cgroup1_ssid_disabled(i))
+		if (cgroup_ssid_enabled(i) && !cgroup1_ssid_disabled(i) &&
+		    !cgroup1_subsys_absent(ss))
 			enabled |= 1 << i;
 
 	ctx->subsys_mask &= enabled;

@@ -2331,7 +2331,7 @@ static struct file_system_type cgroup2_fs_type = {
 	.fs_flags = FS_USERNS_MOUNT,
 };
 
-#ifdef CONFIG_CPUSETS
+#ifdef CONFIG_CPUSETS_V1
 static const struct fs_context_operations cpuset_fs_context_ops = {
 	.get_tree = cgroup1_get_tree,
 	.free = cgroup_fs_context_free,
@ -3669,12 +3669,40 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
|
||||
static int cgroup_stat_show(struct seq_file *seq, void *v)
|
||||
{
|
||||
struct cgroup *cgroup = seq_css(seq)->cgroup;
|
||||
struct cgroup_subsys_state *css;
|
||||
int dying_cnt[CGROUP_SUBSYS_COUNT];
|
||||
int ssid;
|
||||
|
||||
seq_printf(seq, "nr_descendants %d\n",
|
||||
cgroup->nr_descendants);
|
||||
|
||||
/*
|
||||
* Show the number of live and dying csses associated with each of
|
||||
* non-inhibited cgroup subsystems that is bound to cgroup v2.
|
||||
*
|
||||
* Without proper lock protection, racing is possible. So the
|
||||
* numbers may not be consistent when that happens.
|
||||
*/
|
||||
rcu_read_lock();
|
||||
for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
|
||||
dying_cnt[ssid] = -1;
|
||||
if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
|
||||
(cgroup_subsys[ssid]->root != &cgrp_dfl_root))
|
||||
continue;
|
||||
css = rcu_dereference_raw(cgroup->subsys[ssid]);
|
||||
dying_cnt[ssid] = cgroup->nr_dying_subsys[ssid];
|
||||
seq_printf(seq, "nr_subsys_%s %d\n", cgroup_subsys[ssid]->name,
|
||||
css ? (css->nr_descendants + 1) : 0);
|
||||
}
|
||||
|
||||
seq_printf(seq, "nr_dying_descendants %d\n",
|
||||
cgroup->nr_dying_descendants);
|
||||
|
||||
for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
|
||||
if (dying_cnt[ssid] >= 0)
|
||||
seq_printf(seq, "nr_dying_subsys_%s %d\n",
|
||||
cgroup_subsys[ssid]->name, dying_cnt[ssid]);
|
||||
}
|
||||
rcu_read_unlock();
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -4096,7 +4124,7 @@ static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
 	 * If namespaces are delegation boundaries, disallow writes to
 	 * files in a non-init namespace root from inside the namespace
 	 * except for the files explicitly marked delegatable -
-	 * cgroup.procs and cgroup.subtree_control.
+	 * e.g. cgroup.procs, cgroup.threads and cgroup.subtree_control.
 	 */
 	if ((cgrp->root->flags & CGRP_ROOT_NS_DELEGATE) &&
 	    !(cft->flags & CFTYPE_NS_DELEGATABLE) &&
@ -5424,6 +5452,8 @@ static void css_release_work_fn(struct work_struct *work)
|
||||
list_del_rcu(&css->sibling);
|
||||
|
||||
if (ss) {
|
||||
struct cgroup *parent_cgrp;
|
||||
|
||||
/* css release path */
|
||||
if (!list_empty(&css->rstat_css_node)) {
|
||||
cgroup_rstat_flush(cgrp);
|
||||
@ -5433,6 +5463,21 @@ static void css_release_work_fn(struct work_struct *work)
|
||||
cgroup_idr_replace(&ss->css_idr, NULL, css->id);
|
||||
if (ss->css_released)
|
||||
ss->css_released(css);
|
||||
|
||||
cgrp->nr_dying_subsys[ss->id]--;
|
||||
/*
|
||||
* When a css is released and ready to be freed, its
|
||||
* nr_descendants must be zero. However, the corresponding
|
||||
* cgrp->nr_dying_subsys[ss->id] may not be 0 if a subsystem
|
||||
* is activated and deactivated multiple times with one or
|
||||
* more of its previous activation leaving behind dying csses.
|
||||
*/
|
||||
WARN_ON_ONCE(css->nr_descendants);
|
||||
parent_cgrp = cgroup_parent(cgrp);
|
||||
while (parent_cgrp) {
|
||||
parent_cgrp->nr_dying_subsys[ss->id]--;
|
||||
parent_cgrp = cgroup_parent(parent_cgrp);
|
||||
}
|
||||
} else {
|
||||
struct cgroup *tcgrp;
|
||||
|
||||
@@ -5517,8 +5562,11 @@ static int online_css(struct cgroup_subsys_state *css)
 		rcu_assign_pointer(css->cgroup->subsys[ss->id], css);
 
 		atomic_inc(&css->online_cnt);
-		if (css->parent)
+		if (css->parent) {
 			atomic_inc(&css->parent->online_cnt);
+			while ((css = css->parent))
+				css->nr_descendants++;
+		}
 	}
 	return ret;
 }
@ -5540,6 +5588,16 @@ static void offline_css(struct cgroup_subsys_state *css)
|
||||
RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
|
||||
|
||||
wake_up_all(&css->cgroup->offline_waitq);
|
||||
|
||||
css->cgroup->nr_dying_subsys[ss->id]++;
|
||||
/*
|
||||
* Parent css and cgroup cannot be freed until after the freeing
|
||||
* of child css, see css_free_rwork_fn().
|
||||
*/
|
||||
while ((css = css->parent)) {
|
||||
css->nr_descendants--;
|
||||
css->cgroup->nr_dying_subsys[ss->id]++;
|
||||
}
|
||||
}
|
||||
|
||||
/**
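The counter updates added to online_css(), offline_css() and css_release_work_fn() all follow one pattern: walk the parent chain and adjust a per-ancestor counter. The sketch below models that bookkeeping in userspace under simplifying assumptions: a single subsystem, no locking, and one structure standing in for both the css and its cgroup. struct node and the node_*() functions are hypothetical names, not kernel interfaces.

```c
#include <stddef.h>

/* Simplified stand-in for a css and its cgroup; hypothetical type. */
struct node {
	struct node *parent;
	int nr_descendants;	/* live nodes strictly below this one */
	int nr_dying;		/* dying nodes at and below this one */
};

/* Going online: every ancestor gains one live descendant. */
static void node_online(struct node *n)
{
	while ((n = n->parent))
		n->nr_descendants++;
}

/* Going offline: the node turns from live to dying in every ancestor's view. */
static void node_offline(struct node *n)
{
	n->nr_dying++;
	while ((n = n->parent)) {
		n->nr_descendants--;
		n->nr_dying++;
	}
}

/* Release: the dying node disappears from itself and all ancestors. */
static void node_release(struct node *n)
{
	for (; n; n = n->parent)
		n->nr_dying--;
}
```

Under this model, the value printed for nr_subsys_<name> in cgroup_stat_show() corresponds to nr_descendants + 1 for a cgroup that has a css for the subsystem, and nr_dying_subsys_<name> corresponds to nr_dying.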
|
||||
@@ -6178,7 +6236,7 @@ int __init cgroup_init(void)
 	WARN_ON(register_filesystem(&cgroup_fs_type));
 	WARN_ON(register_filesystem(&cgroup2_fs_type));
 	WARN_ON(!proc_create_single("cgroups", 0, NULL, proc_cgroupstats_show));
-#ifdef CONFIG_CPUSETS
+#ifdef CONFIG_CPUSETS_V1
 	WARN_ON(register_filesystem(&cpuset_fs_type));
 #endif
 

305
kernel/cgroup/cpuset-internal.h
Normal file
@ -0,0 +1,305 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0-or-later */
|
||||
|
||||
#ifndef __CPUSET_INTERNAL_H
|
||||
#define __CPUSET_INTERNAL_H
|
||||
|
||||
#include <linux/cgroup.h>
|
||||
#include <linux/cpu.h>
|
||||
#include <linux/cpumask.h>
|
||||
#include <linux/cpuset.h>
|
||||
#include <linux/spinlock.h>
|
||||
#include <linux/union_find.h>
|
||||
|
||||
/* See "Frequency meter" comments, below. */
|
||||
|
||||
struct fmeter {
|
||||
int cnt; /* unprocessed events count */
|
||||
int val; /* most recent output value */
|
||||
time64_t time; /* clock (secs) when val computed */
|
||||
spinlock_t lock; /* guards read or write of above */
|
||||
};
|
||||
|
||||
/*
|
||||
* Invalid partition error code
|
||||
*/
|
||||
enum prs_errcode {
|
||||
PERR_NONE = 0,
|
||||
PERR_INVCPUS,
|
||||
PERR_INVPARENT,
|
||||
PERR_NOTPART,
|
||||
PERR_NOTEXCL,
|
||||
PERR_NOCPUS,
|
||||
PERR_HOTPLUG,
|
||||
PERR_CPUSEMPTY,
|
||||
PERR_HKEEPING,
|
||||
PERR_ACCESS,
|
||||
};
|
||||
|
||||
/* bits in struct cpuset flags field */
|
||||
typedef enum {
|
||||
CS_ONLINE,
|
||||
CS_CPU_EXCLUSIVE,
|
||||
CS_MEM_EXCLUSIVE,
|
||||
CS_MEM_HARDWALL,
|
||||
CS_MEMORY_MIGRATE,
|
||||
CS_SCHED_LOAD_BALANCE,
|
||||
CS_SPREAD_PAGE,
|
||||
CS_SPREAD_SLAB,
|
||||
} cpuset_flagbits_t;
|
||||
|
||||
/* The various types of files and directories in a cpuset file system */
|
||||
|
||||
typedef enum {
|
||||
FILE_MEMORY_MIGRATE,
|
||||
FILE_CPULIST,
|
||||
FILE_MEMLIST,
|
||||
FILE_EFFECTIVE_CPULIST,
|
||||
FILE_EFFECTIVE_MEMLIST,
|
||||
FILE_SUBPARTS_CPULIST,
|
||||
FILE_EXCLUSIVE_CPULIST,
|
||||
FILE_EFFECTIVE_XCPULIST,
|
||||
FILE_ISOLATED_CPULIST,
|
||||
FILE_CPU_EXCLUSIVE,
|
||||
FILE_MEM_EXCLUSIVE,
|
||||
FILE_MEM_HARDWALL,
|
||||
FILE_SCHED_LOAD_BALANCE,
|
||||
FILE_PARTITION_ROOT,
|
||||
FILE_SCHED_RELAX_DOMAIN_LEVEL,
|
||||
FILE_MEMORY_PRESSURE_ENABLED,
|
||||
FILE_MEMORY_PRESSURE,
|
||||
FILE_SPREAD_PAGE,
|
||||
FILE_SPREAD_SLAB,
|
||||
} cpuset_filetype_t;
|
||||
|
||||
struct cpuset {
|
||||
struct cgroup_subsys_state css;
|
||||
|
||||
unsigned long flags; /* "unsigned long" so bitops work */
|
||||
|
||||
/*
|
||||
* On default hierarchy:
|
||||
*
|
||||
* The user-configured masks can only be changed by writing to
|
||||
* cpuset.cpus and cpuset.mems, and won't be limited by the
|
||||
* parent masks.
|
||||
*
|
||||
* The effective masks are the real masks that apply to the tasks
|
||||
* in the cpuset. They may be changed if the configured masks are
|
||||
* changed or hotplug happens.
|
||||
*
|
||||
* effective_mask == configured_mask & parent's effective_mask,
|
||||
* and if it ends up empty, it will inherit the parent's mask.
|
||||
*
|
||||
*
|
||||
* On legacy hierarchy:
|
||||
*
|
||||
* The user-configured masks are always the same as the effective masks.
|
||||
*/
|
||||
|
||||
/* user-configured CPUs and Memory Nodes allowed to tasks */
|
||||
cpumask_var_t cpus_allowed;
|
||||
nodemask_t mems_allowed;
|
||||
|
||||
/* effective CPUs and Memory Nodes allowed to tasks */
|
||||
cpumask_var_t effective_cpus;
|
||||
nodemask_t effective_mems;
|
||||
|
||||
/*
|
||||
* Exclusive CPUs dedicated to current cgroup (default hierarchy only)
|
||||
*
|
||||
* The effective_cpus of a valid partition root comes solely from its
|
||||
* effective_xcpus and some of the effective_xcpus may be distributed
|
||||
* to sub-partitions below & hence excluded from its effective_cpus.
|
||||
* For a valid partition root, its effective_cpus have no relationship
|
||||
* with cpus_allowed unless its exclusive_cpus isn't set.
|
||||
*
|
||||
* This value will only be set if either exclusive_cpus is set or
|
||||
* when this cpuset becomes a local partition root.
|
||||
*/
|
||||
cpumask_var_t effective_xcpus;
|
||||
|
||||
/*
|
||||
* Exclusive CPUs as requested by the user (default hierarchy only)
|
||||
*
|
||||
* Its value is independent of cpus_allowed and designates the set of
|
||||
* CPUs that can be granted to the current cpuset or its children when
|
||||
* it becomes a valid partition root. The effective set of exclusive
|
||||
* CPUs granted (effective_xcpus) depends on whether those exclusive
|
||||
* CPUs are passed down by its ancestors and not yet taken up by
|
||||
* another sibling partition root along the way.
|
||||
*
|
||||
* If its value isn't set, it defaults to cpus_allowed.
|
||||
*/
|
||||
cpumask_var_t exclusive_cpus;
|
||||
|
||||
/*
|
||||
* These are the old Memory Nodes that tasks took on.
|
||||
*
|
||||
* - top_cpuset.old_mems_allowed is initialized to mems_allowed.
|
||||
* - A new cpuset's old_mems_allowed is initialized when some
|
||||
* task is moved into it.
|
||||
* - old_mems_allowed is used in cpuset_migrate_mm() when we change
|
||||
* cpuset.mems_allowed and have tasks' nodemask updated, and
|
||||
* then old_mems_allowed is updated to mems_allowed.
|
||||
*/
|
||||
nodemask_t old_mems_allowed;
|
||||
|
||||
struct fmeter fmeter; /* memory_pressure filter */
|
||||
|
||||
/*
|
||||
* Tasks are being attached to this cpuset. Used to prevent
|
||||
* zeroing cpus/mems_allowed between ->can_attach() and ->attach().
|
||||
*/
|
||||
int attach_in_progress;
|
||||
|
||||
/* for custom sched domain */
|
||||
int relax_domain_level;
|
||||
|
||||
/* number of valid local child partitions */
|
||||
int nr_subparts;
|
||||
|
||||
/* partition root state */
|
||||
int partition_root_state;
|
||||
|
||||
/*
|
||||
* number of SCHED_DEADLINE tasks attached to this cpuset, so that we
|
||||
* know when to rebuild associated root domain bandwidth information.
|
||||
*/
|
||||
int nr_deadline_tasks;
|
||||
int nr_migrate_dl_tasks;
|
||||
u64 sum_migrate_dl_bw;
|
||||
|
||||
/* Invalid partition error code, not lock protected */
|
||||
enum prs_errcode prs_err;
|
||||
|
||||
/* Handle for cpuset.cpus.partition */
|
||||
struct cgroup_file partition_file;
|
||||
|
||||
/* Remote partition sibling list anchored at remote_children */
|
||||
struct list_head remote_sibling;
|
||||
|
||||
/* Used to merge intersecting subsets for generate_sched_domains */
|
||||
struct uf_node node;
|
||||
};
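The comment inside struct cpuset states the default-hierarchy rule "effective_mask == configured_mask & parent's effective_mask", with an empty result inheriting the parent's mask. As a rough illustration only, the rule can be written with plain bit sets, one bit per CPU. This sketch ignores partitions, exclusive CPUs and hotplug, and effective_mask() is a hypothetical helper rather than a function from the cpuset code.

```c
/* Hypothetical helper: masks are plain bit sets, one bit per CPU. */
static unsigned long effective_mask(unsigned long configured,
				    unsigned long parent_effective)
{
	unsigned long effective = configured & parent_effective;

	/* An empty intersection falls back to the parent's effective mask. */
	return effective ? effective : parent_effective;
}
```

For example, with a parent restricted to CPUs 0-3 (0x0f), a child configured for CPUs 2-5 (0x3c) ends up with CPUs 2-3, while a child configured only for CPUs 4-5 (0x30) inherits CPUs 0-3.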
|
||||
|
||||
static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
|
||||
{
|
||||
return css ? container_of(css, struct cpuset, css) : NULL;
|
||||
}
|
||||
|
||||
/* Retrieve the cpuset for a task */
|
||||
static inline struct cpuset *task_cs(struct task_struct *task)
|
||||
{
|
||||
return css_cs(task_css(task, cpuset_cgrp_id));
|
||||
}
|
||||
|
||||
static inline struct cpuset *parent_cs(struct cpuset *cs)
|
||||
{
|
||||
return css_cs(cs->css.parent);
|
||||
}
|
||||
|
||||
/* convenient tests for these bits */
|
||||
static inline bool is_cpuset_online(struct cpuset *cs)
|
||||
{
|
||||
return test_bit(CS_ONLINE, &cs->flags) && !css_is_dying(&cs->css);
|
||||
}
|
||||
|
||||
static inline int is_cpu_exclusive(const struct cpuset *cs)
|
||||
{
|
||||
return test_bit(CS_CPU_EXCLUSIVE, &cs->flags);
|
||||
}
|
||||
|
||||
static inline int is_mem_exclusive(const struct cpuset *cs)
|
||||
{
|
||||
return test_bit(CS_MEM_EXCLUSIVE, &cs->flags);
|
||||
}
|
||||
|
||||
static inline int is_mem_hardwall(const struct cpuset *cs)
|
||||
{
|
||||
return test_bit(CS_MEM_HARDWALL, &cs->flags);
|
||||
}
|
||||
|
||||
static inline int is_sched_load_balance(const struct cpuset *cs)
|
||||
{
|
||||
return test_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
|
||||
}
|
||||
|
||||
static inline int is_memory_migrate(const struct cpuset *cs)
|
||||
{
|
||||
return test_bit(CS_MEMORY_MIGRATE, &cs->flags);
|
||||
}
|
||||
|
||||
static inline int is_spread_page(const struct cpuset *cs)
|
||||
{
|
||||
return test_bit(CS_SPREAD_PAGE, &cs->flags);
|
||||
}
|
||||
|
||||
static inline int is_spread_slab(const struct cpuset *cs)
|
||||
{
|
||||
return test_bit(CS_SPREAD_SLAB, &cs->flags);
|
||||
}
|
||||
|
||||
/**
|
||||
* cpuset_for_each_child - traverse online children of a cpuset
|
||||
* @child_cs: loop cursor pointing to the current child
|
||||
* @pos_css: used for iteration
|
||||
* @parent_cs: target cpuset to walk children of
|
||||
*
|
||||
* Walk @child_cs through the online children of @parent_cs. Must be used
|
||||
* with RCU read locked.
|
||||
*/
|
||||
#define cpuset_for_each_child(child_cs, pos_css, parent_cs) \
|
||||
css_for_each_child((pos_css), &(parent_cs)->css) \
|
||||
if (is_cpuset_online(((child_cs) = css_cs((pos_css)))))
|
||||
|
||||
/**
|
||||
* cpuset_for_each_descendant_pre - pre-order walk of a cpuset's descendants
|
||||
* @des_cs: loop cursor pointing to the current descendant
|
||||
* @pos_css: used for iteration
|
||||
* @root_cs: target cpuset to walk descendants of
|
||||
*
|
||||
* Walk @des_cs through the online descendants of @root_cs. Must be used
|
||||
* with RCU read locked. The caller may modify @pos_css by calling
|
||||
* css_rightmost_descendant() to skip subtree. @root_cs is included in the
|
||||
* iteration and the first node to be visited.
|
||||
*/
|
||||
#define cpuset_for_each_descendant_pre(des_cs, pos_css, root_cs) \
|
||||
css_for_each_descendant_pre((pos_css), &(root_cs)->css) \
|
||||
if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
|
||||
|
||||
void rebuild_sched_domains_locked(void);
|
||||
void cpuset_callback_lock_irq(void);
|
||||
void cpuset_callback_unlock_irq(void);
|
||||
void cpuset_update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus);
|
||||
void cpuset_update_tasks_nodemask(struct cpuset *cs);
|
||||
int cpuset_update_flag(cpuset_flagbits_t bit, struct cpuset *cs, int turning_on);
|
||||
ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
|
||||
char *buf, size_t nbytes, loff_t off);
|
||||
int cpuset_common_seq_show(struct seq_file *sf, void *v);
|
||||
|
||||
/*
|
||||
* cpuset-v1.c
|
||||
*/
|
||||
#ifdef CONFIG_CPUSETS_V1
|
||||
extern struct cftype cpuset1_files[];
|
||||
void fmeter_init(struct fmeter *fmp);
|
||||
void cpuset1_update_task_spread_flags(struct cpuset *cs,
|
||||
struct task_struct *tsk);
|
||||
void cpuset1_update_tasks_flags(struct cpuset *cs);
|
||||
void cpuset1_hotplug_update_tasks(struct cpuset *cs,
|
||||
struct cpumask *new_cpus, nodemask_t *new_mems,
|
||||
bool cpus_updated, bool mems_updated);
|
||||
int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
|
||||
#else
|
||||
static inline void fmeter_init(struct fmeter *fmp) {}
|
||||
static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
|
||||
struct task_struct *tsk) {}
|
||||
static inline void cpuset1_update_tasks_flags(struct cpuset *cs) {}
|
||||
static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
|
||||
struct cpumask *new_cpus, nodemask_t *new_mems,
|
||||
bool cpus_updated, bool mems_updated) {}
|
||||
static inline int cpuset1_validate_change(struct cpuset *cur,
|
||||
struct cpuset *trial) { return 0; }
|
||||
#endif /* CONFIG_CPUSETS_V1 */
|
||||
|
||||
#endif /* __CPUSET_INTERNAL_H */
|
562
kernel/cgroup/cpuset-v1.c
Normal file
@ -0,0 +1,562 @@
|
||||
// SPDX-License-Identifier: GPL-2.0-or-later
|
||||
|
||||
#include "cpuset-internal.h"
|
||||
|
||||
/*
|
||||
* Legacy hierarchy call to cgroup_transfer_tasks() is handled asynchronously
|
||||
*/
|
||||
struct cpuset_remove_tasks_struct {
|
||||
struct work_struct work;
|
||||
struct cpuset *cs;
|
||||
};
|
||||
|
||||
/*
|
||||
* Frequency meter - How fast is some event occurring?
|
||||
*
|
||||
* These routines manage a digitally filtered, constant time based,
|
||||
* event frequency meter. There are four routines:
|
||||
* fmeter_init() - initialize a frequency meter.
|
||||
* fmeter_markevent() - called each time the event happens.
|
||||
* fmeter_getrate() - returns the recent rate of such events.
|
||||
* fmeter_update() - internal routine used to update fmeter.
|
||||
*
|
||||
* A common data structure is passed to each of these routines,
|
||||
* which is used to keep track of the state required to manage the
|
||||
* frequency meter and its digital filter.
|
||||
*
|
||||
* The filter works on the number of events marked per unit time.
|
||||
* The filter is single-pole low-pass recursive (IIR). The time unit
|
||||
* is 1 second. Arithmetic is done using 32-bit integers scaled to
|
||||
* simulate 3 decimal digits of precision (multiplied by 1000).
|
||||
*
|
||||
* With an FM_COEF of 933, and a time base of 1 second, the filter
|
||||
* has a half-life of 10 seconds, meaning that if the events quit
|
||||
* happening, then the rate returned from the fmeter_getrate()
|
||||
* will be cut in half each 10 seconds, until it converges to zero.
|
||||
*
|
||||
* It is not worth doing a real infinitely recursive filter. If more
|
||||
* than FM_MAXTICKS ticks have elapsed since the last filter event,
|
||||
* just compute FM_MAXTICKS ticks worth, by which point the level
|
||||
* will be stable.
|
||||
*
|
||||
* Limit the count of unprocessed events to FM_MAXCNT, so as to avoid
|
||||
* arithmetic overflow in the fmeter_update() routine.
|
||||
*
|
||||
* Given the simple 32 bit integer arithmetic used, this meter works
|
||||
* best for reporting rates between one per millisecond (msec) and
|
||||
* one per 32 (approx) seconds. At constant rates faster than one
|
||||
* per msec it maxes out at values just under 1,000,000. At constant
|
||||
* rates between one per msec, and one per second it will stabilize
|
||||
* to a value N*1000, where N is the rate of events per second.
|
||||
* At constant rates between one per second and one per 32 seconds,
|
||||
* it will be choppy, moving up on the seconds that have an event,
|
||||
* and then decaying until the next event. At rates slower than
|
||||
* about one in 32 seconds, it decays all the way back to zero between
|
||||
* each event.
|
||||
*/
|
||||
|
||||
#define FM_COEF 933 /* coefficient for half-life of 10 secs */
|
||||
#define FM_MAXTICKS ((u32)99) /* useless computing more ticks than this */
|
||||
#define FM_MAXCNT 1000000 /* limit cnt to avoid overflow */
|
||||
#define FM_SCALE 1000 /* faux fixed point scale */
|
||||
|
||||
/* Initialize a frequency meter */
|
||||
void fmeter_init(struct fmeter *fmp)
|
||||
{
|
||||
fmp->cnt = 0;
|
||||
fmp->val = 0;
|
||||
fmp->time = 0;
|
||||
spin_lock_init(&fmp->lock);
|
||||
}
|
||||
|
||||
/* Internal meter update - process cnt events and update value */
|
||||
static void fmeter_update(struct fmeter *fmp)
|
||||
{
|
||||
time64_t now;
|
||||
u32 ticks;
|
||||
|
||||
now = ktime_get_seconds();
|
||||
ticks = now - fmp->time;
|
||||
|
||||
if (ticks == 0)
|
||||
return;
|
||||
|
||||
ticks = min(FM_MAXTICKS, ticks);
|
||||
while (ticks-- > 0)
|
||||
fmp->val = (FM_COEF * fmp->val) / FM_SCALE;
|
||||
fmp->time = now;
|
||||
|
||||
fmp->val += ((FM_SCALE - FM_COEF) * fmp->cnt) / FM_SCALE;
|
||||
fmp->cnt = 0;
|
||||
}
|
||||
|
||||
/* Process any previous ticks, then bump cnt by one (times scale). */
|
||||
static void fmeter_markevent(struct fmeter *fmp)
|
||||
{
|
||||
spin_lock(&fmp->lock);
|
||||
fmeter_update(fmp);
|
||||
fmp->cnt = min(FM_MAXCNT, fmp->cnt + FM_SCALE);
|
||||
spin_unlock(&fmp->lock);
|
||||
}
|
||||
|
||||
/* Process any previous ticks, then return current value. */
|
||||
static int fmeter_getrate(struct fmeter *fmp)
|
||||
{
|
||||
int val;
|
||||
|
||||
spin_lock(&fmp->lock);
|
||||
fmeter_update(fmp);
|
||||
val = fmp->val;
|
||||
spin_unlock(&fmp->lock);
|
||||
return val;
|
||||
}
|
||||
|
||||
/*
|
||||
* Collection of memory_pressure is suppressed unless
|
||||
* this flag is enabled by writing "1" to the special
|
||||
* cpuset file 'memory_pressure_enabled' in the root cpuset.
|
||||
*/
|
||||
|
||||
int cpuset_memory_pressure_enabled __read_mostly;
|
||||
|
||||
/*
|
||||
* __cpuset_memory_pressure_bump - keep stats of per-cpuset reclaims.
|
||||
*
|
||||
* Keep a running average of the rate of synchronous (direct)
|
||||
* page reclaim efforts initiated by tasks in each cpuset.
|
||||
*
|
||||
* This represents the rate at which some task in the cpuset
|
||||
* ran low on memory on all nodes it was allowed to use, and
|
||||
* had to enter the kernel's page reclaim code in an effort to
|
||||
* create more free memory by tossing clean pages or swapping
|
||||
* or writing dirty pages.
|
||||
*
|
||||
* Display to user space in the per-cpuset read-only file
|
||||
* "memory_pressure". Value displayed is an integer
|
||||
* representing the recent rate of entry into the synchronous
|
||||
* (direct) page reclaim by any task attached to the cpuset.
|
||||
*/
|
||||
|
||||
void __cpuset_memory_pressure_bump(void)
|
||||
{
|
||||
rcu_read_lock();
|
||||
fmeter_markevent(&task_cs(current)->fmeter);
|
||||
rcu_read_unlock();
|
||||
}
|
||||
|
||||
static int update_relax_domain_level(struct cpuset *cs, s64 val)
|
||||
{
|
||||
#ifdef CONFIG_SMP
|
||||
if (val < -1 || val > sched_domain_level_max + 1)
|
||||
return -EINVAL;
|
||||
#endif
|
||||
|
||||
if (val != cs->relax_domain_level) {
|
||||
cs->relax_domain_level = val;
|
||||
if (!cpumask_empty(cs->cpus_allowed) &&
|
||||
is_sched_load_balance(cs))
|
||||
rebuild_sched_domains_locked();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
|
||||
s64 val)
|
||||
{
|
||||
struct cpuset *cs = css_cs(css);
|
||||
cpuset_filetype_t type = cft->private;
|
||||
int retval = -ENODEV;
|
||||
|
||||
cpus_read_lock();
|
||||
cpuset_lock();
|
||||
if (!is_cpuset_online(cs))
|
||||
goto out_unlock;
|
||||
|
||||
switch (type) {
|
||||
case FILE_SCHED_RELAX_DOMAIN_LEVEL:
|
||||
retval = update_relax_domain_level(cs, val);
|
||||
break;
|
||||
default:
|
||||
retval = -EINVAL;
|
||||
break;
|
||||
}
|
||||
out_unlock:
|
||||
cpuset_unlock();
|
||||
cpus_read_unlock();
|
||||
return retval;
|
||||
}
|
||||
|
||||
static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
|
||||
{
|
||||
struct cpuset *cs = css_cs(css);
|
||||
cpuset_filetype_t type = cft->private;
|
||||
|
||||
switch (type) {
|
||||
case FILE_SCHED_RELAX_DOMAIN_LEVEL:
|
||||
return cs->relax_domain_level;
|
||||
default:
|
||||
BUG();
|
||||
}
|
||||
|
||||
/* Unreachable but makes gcc happy */
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* update task's spread flag if cpuset's page/slab spread flag is set
|
||||
*
|
||||
* Call with callback_lock or cpuset_mutex held. The check can be skipped
|
||||
* if on default hierarchy.
|
||||
*/
|
||||
void cpuset1_update_task_spread_flags(struct cpuset *cs,
|
||||
struct task_struct *tsk)
|
||||
{
|
||||
if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
|
||||
return;
|
||||
|
||||
if (is_spread_page(cs))
|
||||
task_set_spread_page(tsk);
|
||||
else
|
||||
task_clear_spread_page(tsk);
|
||||
|
||||
if (is_spread_slab(cs))
|
||||
task_set_spread_slab(tsk);
|
||||
else
|
||||
task_clear_spread_slab(tsk);
|
||||
}
|
||||
|
||||
/**
|
||||
* cpuset1_update_tasks_flags - update the spread flags of tasks in the cpuset.
|
||||
* @cs: the cpuset in which each task's spread flags need to be changed
|
||||
*
|
||||
* Iterate through each task of @cs updating its spread flags. As this
|
||||
* function is called with cpuset_mutex held, cpuset membership stays
|
||||
* stable.
|
||||
*/
|
||||
void cpuset1_update_tasks_flags(struct cpuset *cs)
|
||||
{
|
||||
struct css_task_iter it;
|
||||
struct task_struct *task;
|
||||
|
||||
css_task_iter_start(&cs->css, 0, &it);
|
||||
while ((task = css_task_iter_next(&it)))
|
||||
cpuset1_update_task_spread_flags(cs, task);
|
||||
css_task_iter_end(&it);
|
||||
}
|
||||
|
||||
/*
|
||||
* If CPU and/or memory hotplug handlers, below, unplug any CPUs
|
||||
* or memory nodes, we need to walk over the cpuset hierarchy,
|
||||
* removing that CPU or node from all cpusets. If this removes the
|
||||
* last CPU or node from a cpuset, then move the tasks in the empty
|
||||
* cpuset to its next-highest non-empty parent.
|
||||
*/
|
||||
static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
|
||||
{
|
||||
struct cpuset *parent;
|
||||
|
||||
/*
|
||||
* Find its next-highest non-empty parent, (top cpuset
|
||||
* has online cpus, so can't be empty).
|
||||
*/
|
||||
parent = parent_cs(cs);
|
||||
while (cpumask_empty(parent->cpus_allowed) ||
|
||||
nodes_empty(parent->mems_allowed))
|
||||
parent = parent_cs(parent);
|
||||
|
||||
if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
|
||||
pr_err("cpuset: failed to transfer tasks out of empty cpuset ");
|
||||
pr_cont_cgroup_name(cs->css.cgroup);
|
||||
pr_cont("\n");
|
||||
}
|
||||
}
|
||||
|
||||
static void cpuset_migrate_tasks_workfn(struct work_struct *work)
|
||||
{
|
||||
struct cpuset_remove_tasks_struct *s;
|
||||
|
||||
s = container_of(work, struct cpuset_remove_tasks_struct, work);
|
||||
remove_tasks_in_empty_cpuset(s->cs);
|
||||
css_put(&s->cs->css);
|
||||
kfree(s);
|
||||
}
|
||||
|
||||
void cpuset1_hotplug_update_tasks(struct cpuset *cs,
|
||||
struct cpumask *new_cpus, nodemask_t *new_mems,
|
||||
bool cpus_updated, bool mems_updated)
|
||||
{
|
||||
bool is_empty;
|
||||
|
||||
cpuset_callback_lock_irq();
|
||||
cpumask_copy(cs->cpus_allowed, new_cpus);
|
||||
cpumask_copy(cs->effective_cpus, new_cpus);
|
||||
cs->mems_allowed = *new_mems;
|
||||
cs->effective_mems = *new_mems;
|
||||
cpuset_callback_unlock_irq();
|
||||
|
||||
/*
|
||||
* Don't call cpuset_update_tasks_cpumask() if the cpuset becomes empty,
|
||||
* as the tasks will be migrated to an ancestor.
|
||||
*/
|
||||
if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
|
||||
cpuset_update_tasks_cpumask(cs, new_cpus);
|
||||
if (mems_updated && !nodes_empty(cs->mems_allowed))
|
||||
cpuset_update_tasks_nodemask(cs);
|
||||
|
||||
is_empty = cpumask_empty(cs->cpus_allowed) ||
|
||||
nodes_empty(cs->mems_allowed);
|
||||
|
||||
/*
|
||||
* Move tasks to the nearest ancestor with execution resources.
|
||||
* This is a full cgroup operation which will also call back into
|
||||
* cpuset. Execute it asynchronously using workqueue.
|
||||
*/
|
||||
if (is_empty && cs->css.cgroup->nr_populated_csets &&
|
||||
css_tryget_online(&cs->css)) {
|
||||
struct cpuset_remove_tasks_struct *s;
|
||||
|
||||
s = kzalloc(sizeof(*s), GFP_KERNEL);
|
||||
if (WARN_ON_ONCE(!s)) {
|
||||
css_put(&cs->css);
|
||||
return;
|
||||
}
|
||||
|
||||
s->cs = cs;
|
||||
INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
|
||||
schedule_work(&s->work);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* is_cpuset_subset(p, q) - Is cpuset p a subset of cpuset q?
|
||||
*
|
||||
* One cpuset is a subset of another if all its allowed CPUs and
|
||||
* Memory Nodes are a subset of the other, and its exclusive flags
|
||||
* are only set if the other's are set. Call holding cpuset_mutex.
|
||||
*/
|
||||
|
||||
static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
|
||||
{
|
||||
return cpumask_subset(p->cpus_allowed, q->cpus_allowed) &&
|
||||
nodes_subset(p->mems_allowed, q->mems_allowed) &&
|
||||
is_cpu_exclusive(p) <= is_cpu_exclusive(q) &&
|
||||
is_mem_exclusive(p) <= is_mem_exclusive(q);
|
||||
}
|
||||
|
||||
/*
|
||||
* cpuset1_validate_change() - Validate conditions specific to legacy (v1)
|
||||
* behavior.
|
||||
*/
|
||||
int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
|
||||
{
|
||||
struct cgroup_subsys_state *css;
|
||||
struct cpuset *c, *par;
|
||||
int ret;
|
||||
|
||||
WARN_ON_ONCE(!rcu_read_lock_held());
|
||||
|
||||
/* Each of our child cpusets must be a subset of us */
|
||||
ret = -EBUSY;
|
||||
cpuset_for_each_child(c, css, cur)
|
||||
if (!is_cpuset_subset(c, trial))
|
||||
goto out;
|
||||
|
||||
/* On legacy hierarchy, we must be a subset of our parent cpuset. */
|
||||
ret = -EACCES;
|
||||
par = parent_cs(cur);
|
||||
if (par && !is_cpuset_subset(trial, par))
|
||||
goto out;
|
||||
|
||||
ret = 0;
|
||||
out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
static u64 cpuset_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
|
||||
{
|
||||
struct cpuset *cs = css_cs(css);
|
||||
cpuset_filetype_t type = cft->private;
|
||||
|
||||
switch (type) {
|
||||
case FILE_CPU_EXCLUSIVE:
|
||||
return is_cpu_exclusive(cs);
|
||||
case FILE_MEM_EXCLUSIVE:
|
||||
return is_mem_exclusive(cs);
|
||||
case FILE_MEM_HARDWALL:
|
||||
return is_mem_hardwall(cs);
|
||||
case FILE_SCHED_LOAD_BALANCE:
|
||||
return is_sched_load_balance(cs);
|
||||
case FILE_MEMORY_MIGRATE:
|
||||
return is_memory_migrate(cs);
|
||||
case FILE_MEMORY_PRESSURE_ENABLED:
|
||||
return cpuset_memory_pressure_enabled;
|
||||
case FILE_MEMORY_PRESSURE:
|
||||
return fmeter_getrate(&cs->fmeter);
|
||||
case FILE_SPREAD_PAGE:
|
||||
return is_spread_page(cs);
|
||||
case FILE_SPREAD_SLAB:
|
||||
return is_spread_slab(cs);
|
||||
default:
|
||||
BUG();
|
||||
}
|
||||
|
||||
/* Unreachable but makes gcc happy */
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
|
||||
u64 val)
|
||||
{
|
||||
struct cpuset *cs = css_cs(css);
|
||||
cpuset_filetype_t type = cft->private;
|
||||
int retval = 0;
|
||||
|
||||
cpus_read_lock();
|
||||
cpuset_lock();
|
||||
if (!is_cpuset_online(cs)) {
|
||||
retval = -ENODEV;
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
switch (type) {
|
||||
case FILE_CPU_EXCLUSIVE:
|
||||
retval = cpuset_update_flag(CS_CPU_EXCLUSIVE, cs, val);
|
||||
break;
|
||||
case FILE_MEM_EXCLUSIVE:
|
||||
retval = cpuset_update_flag(CS_MEM_EXCLUSIVE, cs, val);
|
||||
break;
|
||||
case FILE_MEM_HARDWALL:
|
||||
retval = cpuset_update_flag(CS_MEM_HARDWALL, cs, val);
|
||||
break;
|
||||
case FILE_SCHED_LOAD_BALANCE:
|
||||
retval = cpuset_update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
|
||||
break;
|
||||
case FILE_MEMORY_MIGRATE:
|
||||
retval = cpuset_update_flag(CS_MEMORY_MIGRATE, cs, val);
|
||||
break;
|
||||
case FILE_MEMORY_PRESSURE_ENABLED:
|
||||
cpuset_memory_pressure_enabled = !!val;
|
||||
break;
|
||||
case FILE_SPREAD_PAGE:
|
||||
retval = cpuset_update_flag(CS_SPREAD_PAGE, cs, val);
|
||||
break;
|
||||
case FILE_SPREAD_SLAB:
|
||||
retval = cpuset_update_flag(CS_SPREAD_SLAB, cs, val);
|
||||
break;
|
||||
default:
|
||||
retval = -EINVAL;
|
||||
break;
|
||||
}
|
||||
out_unlock:
|
||||
cpuset_unlock();
|
||||
cpus_read_unlock();
|
||||
return retval;
|
||||
}
|
||||
|
||||
/*
|
||||
* for the common functions, 'private' gives the type of file
|
||||
*/
|
||||
|
||||
struct cftype cpuset1_files[] = {
|
||||
{
|
||||
.name = "cpus",
|
||||
.seq_show = cpuset_common_seq_show,
|
||||
.write = cpuset_write_resmask,
|
||||
.max_write_len = (100U + 6 * NR_CPUS),
|
||||
.private = FILE_CPULIST,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "mems",
|
||||
.seq_show = cpuset_common_seq_show,
|
||||
.write = cpuset_write_resmask,
|
||||
.max_write_len = (100U + 6 * MAX_NUMNODES),
|
||||
.private = FILE_MEMLIST,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "effective_cpus",
|
||||
.seq_show = cpuset_common_seq_show,
|
||||
.private = FILE_EFFECTIVE_CPULIST,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "effective_mems",
|
||||
.seq_show = cpuset_common_seq_show,
|
||||
.private = FILE_EFFECTIVE_MEMLIST,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "cpu_exclusive",
|
||||
.read_u64 = cpuset_read_u64,
|
||||
.write_u64 = cpuset_write_u64,
|
||||
.private = FILE_CPU_EXCLUSIVE,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "mem_exclusive",
|
||||
.read_u64 = cpuset_read_u64,
|
||||
.write_u64 = cpuset_write_u64,
|
||||
.private = FILE_MEM_EXCLUSIVE,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "mem_hardwall",
|
||||
.read_u64 = cpuset_read_u64,
|
||||
.write_u64 = cpuset_write_u64,
|
||||
.private = FILE_MEM_HARDWALL,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "sched_load_balance",
|
||||
.read_u64 = cpuset_read_u64,
|
||||
.write_u64 = cpuset_write_u64,
|
||||
.private = FILE_SCHED_LOAD_BALANCE,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "sched_relax_domain_level",
|
||||
.read_s64 = cpuset_read_s64,
|
||||
.write_s64 = cpuset_write_s64,
|
||||
.private = FILE_SCHED_RELAX_DOMAIN_LEVEL,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "memory_migrate",
|
||||
.read_u64 = cpuset_read_u64,
|
||||
.write_u64 = cpuset_write_u64,
|
||||
.private = FILE_MEMORY_MIGRATE,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "memory_pressure",
|
||||
.read_u64 = cpuset_read_u64,
|
||||
.private = FILE_MEMORY_PRESSURE,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "memory_spread_page",
|
||||
.read_u64 = cpuset_read_u64,
|
||||
.write_u64 = cpuset_write_u64,
|
||||
.private = FILE_SPREAD_PAGE,
|
||||
},
|
||||
|
||||
{
|
||||
/* obsolete, may be removed in the future */
|
||||
.name = "memory_spread_slab",
|
||||
.read_u64 = cpuset_read_u64,
|
||||
.write_u64 = cpuset_write_u64,
|
||||
.private = FILE_SPREAD_SLAB,
|
||||
},
|
||||
|
||||
{
|
||||
.name = "memory_pressure_enabled",
|
||||
.flags = CFTYPE_ONLY_ON_ROOT,
|
||||
.read_u64 = cpuset_read_u64,
|
||||
.write_u64 = cpuset_write_u64,
|
||||
.private = FILE_MEMORY_PRESSURE_ENABLED,
|
||||
},
|
||||
|
||||
{ } /* terminate */
|
||||
};
|
File diff suppressed because it is too large
@ -244,7 +244,6 @@ static void pids_event(struct pids_cgroup *pids_forking,
|
||||
struct pids_cgroup *pids_over_limit)
|
||||
{
|
||||
struct pids_cgroup *p = pids_forking;
|
||||
bool limit = false;
|
||||
|
||||
/* Only log the first time limit is hit. */
|
||||
if (atomic64_inc_return(&p->events_local[PIDCG_FORKFAIL]) == 1) {
|
||||
@ -252,20 +251,17 @@ static void pids_event(struct pids_cgroup *pids_forking,
|
||||
pr_cont_cgroup_path(p->css.cgroup);
|
||||
pr_cont("\n");
|
||||
}
|
||||
cgroup_file_notify(&p->events_local_file);
|
||||
if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
|
||||
cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
|
||||
return;
|
||||
|
||||
for (; parent_pids(p); p = parent_pids(p)) {
|
||||
if (p == pids_over_limit) {
|
||||
limit = true;
|
||||
atomic64_inc(&p->events_local[PIDCG_MAX]);
|
||||
cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS) {
|
||||
cgroup_file_notify(&p->events_local_file);
|
||||
return;
|
||||
}
|
||||
if (limit)
|
||||
atomic64_inc(&p->events[PIDCG_MAX]);
|
||||
|
||||
atomic64_inc(&pids_over_limit->events_local[PIDCG_MAX]);
|
||||
cgroup_file_notify(&pids_over_limit->events_local_file);
|
||||
|
||||
for (p = pids_over_limit; parent_pids(p); p = parent_pids(p)) {
|
||||
atomic64_inc(&p->events[PIDCG_MAX]);
|
||||
cgroup_file_notify(&p->events_file);
|
||||
}
|
||||
}
|
||||
@ -276,15 +272,10 @@ static void pids_event(struct pids_cgroup *pids_forking,
|
||||
*/
|
||||
static int pids_can_fork(struct task_struct *task, struct css_set *cset)
|
||||
{
|
||||
struct cgroup_subsys_state *css;
|
||||
struct pids_cgroup *pids, *pids_over_limit;
|
||||
int err;
|
||||
|
||||
if (cset)
|
||||
css = cset->subsys[pids_cgrp_id];
|
||||
else
|
||||
css = task_css_check(current, pids_cgrp_id, true);
|
||||
pids = css_pids(css);
|
||||
pids = css_pids(cset->subsys[pids_cgrp_id]);
|
||||
err = pids_try_charge(pids, 1, &pids_over_limit);
|
||||
if (err)
|
||||
pids_event(pids, pids_over_limit);
|
||||
@ -294,14 +285,9 @@ static int pids_can_fork(struct task_struct *task, struct css_set *cset)
|
||||
|
||||
static void pids_cancel_fork(struct task_struct *task, struct css_set *cset)
|
||||
{
|
||||
struct cgroup_subsys_state *css;
|
||||
struct pids_cgroup *pids;
|
||||
|
||||
if (cset)
|
||||
css = cset->subsys[pids_cgrp_id];
|
||||
else
|
||||
css = task_css_check(current, pids_cgrp_id, true);
|
||||
pids = css_pids(css);
|
||||
pids = css_pids(cset->subsys[pids_cgrp_id]);
|
||||
pids_uncharge(pids, 1);
|
||||
}
|
||||
|
||||
|
@ -2311,7 +2311,6 @@ __latent_entropy struct task_struct *copy_process(
|
||||
#endif
|
||||
#ifdef CONFIG_CPUSETS
|
||||
p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
|
||||
p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
|
||||
seqcount_spinlock_init(&p->mems_allowed_seq, &p->alloc_lock);
|
||||
#endif
|
||||
#ifdef CONFIG_TRACE_IRQFLAGS
|
||||
|
@ -34,7 +34,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
|
||||
is_single_threaded.o plist.o decompress.o kobject_uevent.o \
|
||||
earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
|
||||
nmi_backtrace.o win_minmax.o memcat_p.o \
|
||||
buildid.o objpool.o
|
||||
buildid.o objpool.o union_find.o
|
||||
|
||||
lib-$(CONFIG_PRINTK) += dump_stack.o
|
||||
lib-$(CONFIG_SMP) += cpumask.o
|
||||
|
49
lib/union_find.c
Normal file
@ -0,0 +1,49 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/union_find.h>

/**
 * uf_find - Find the root of a node and perform path compression
 * @node: the node to find the root of
 *
 * This function returns the root of the node by following the parent
 * pointers. It also performs path compression, making the tree shallower.
 *
 * Returns the root node of the set containing node.
 */
struct uf_node *uf_find(struct uf_node *node)
{
	struct uf_node *parent;

	while (node->parent != node) {
		parent = node->parent;
		node->parent = parent->parent;
		node = parent;
	}
	return node;
}

/**
 * uf_union - Merge two sets, using union by rank
 * @node1: the first node
 * @node2: the second node
 *
 * This function merges the sets containing node1 and node2, by comparing
 * the ranks to keep the tree balanced.
 */
void uf_union(struct uf_node *node1, struct uf_node *node2)
{
	struct uf_node *root1 = uf_find(node1);
	struct uf_node *root2 = uf_find(node2);

	if (root1 == root2)
		return;

	if (root1->rank < root2->rank) {
		root1->parent = root2;
	} else if (root1->rank > root2->rank) {
		root2->parent = root1;
	} else {
		root2->parent = root1;
		root1->rank++;
	}
}
|
@ -84,6 +84,20 @@ echo member > test/cpuset.cpus.partition
|
||||
echo "" > test/cpuset.cpus
|
||||
[[ $RESULT -eq 0 ]] && skip_test "Child cgroups are using cpuset!"
|
||||
|
||||
#
|
||||
# If isolated CPUs have been reserved at boot time (as shown in
|
||||
# cpuset.cpus.isolated), these isolated CPUs should be outside of CPUs 0-7
|
||||
# that will be used by this script for testing purposes. If not, some of
|
||||
# the tests may fail incorrectly. These isolated CPUs will also be removed
|
||||
# before being compared with the expected results.
|
||||
#
|
||||
BOOT_ISOLCPUS=$(cat $CGROUP2/cpuset.cpus.isolated)
|
||||
if [[ -n "$BOOT_ISOLCPUS" ]]
|
||||
then
|
||||
[[ $(echo $BOOT_ISOLCPUS | sed -e "s/[,-].*//") -le 7 ]] &&
|
||||
skip_test "Pre-isolated CPUs ($BOOT_ISOLCPUS) overlap CPUs to be tested"
|
||||
echo "Pre-isolated CPUs: $BOOT_ISOLCPUS"
|
||||
fi
|
||||
cleanup()
|
||||
{
|
||||
online_cpus
|
||||
@ -321,7 +335,7 @@ TEST_MATRIX=(
|
||||
# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
|
||||
# ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
|
||||
#
|
||||
# Incorrect change to cpuset.cpus invalidates partition root
|
||||
# Incorrect change to cpuset.cpus[.exclusive] invalidates partition root
|
||||
#
|
||||
# Adding CPUs to partition root that are not in parent's
|
||||
# cpuset.cpus is allowed, but those extra CPUs are ignored.
|
||||
@ -365,6 +379,16 @@ TEST_MATRIX=(
|
||||
# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
|
||||
" C0-3 . . C4-5 X5 . . . 0 A1:0-3,B1:4-5"
|
||||
|
||||
# Child partition roots that try to take all CPUs from parent partition
|
||||
# with tasks will remain invalid.
|
||||
" C1-4:P1:S+ P1 . . . . . . 0 A1:1-4,A2:1-4 A1:P1,A2:P-1"
|
||||
" C1-4:P1:S+ P1 . . . C1-4 . . 0 A1,A2:1-4 A1:P1,A2:P1"
|
||||
" C1-4:P1:S+ P1 . . T C1-4 . . 0 A1:1-4,A2:1-4 A1:P1,A2:P-1"
|
||||
|
||||
# Clearing of cpuset.cpus with a preset cpuset.cpus.exclusive shouldn't
|
||||
# affect cpuset.cpus.exclusive.effective.
|
||||
" C1-4:X3:S+ C1:X3 . . . C . . 0 A2:1-4,XA2:3"
|
||||
|
||||
# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
|
||||
# ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
|
||||
# Failure cases:
|
||||
@ -632,7 +656,8 @@ check_cgroup_states()
|
||||
# Note that isolated CPUs from the sched/domains context include offline
|
||||
# CPUs as well as CPUs in non-isolated 1-CPU partition. Those CPUs may
|
||||
# not be included in the cpuset.cpus.isolated control file which contains
|
||||
# only CPUs in isolated partitions.
|
||||
# only CPUs in isolated partitions as well as those that are isolated at
|
||||
# boot time.
|
||||
#
|
||||
# $1 - expected isolated cpu list(s) <isolcpus1>{,<isolcpus2>}
|
||||
# <isolcpus1> - expected sched/domains value
|
||||
@ -659,10 +684,14 @@ check_isolcpus()
|
||||
fi
|
||||
|
||||
#
|
||||
# Check the debug isolated cpumask, if present
|
||||
# Check cpuset.cpus.isolated cpumask
|
||||
#
|
||||
[[ -f $ISCPUS ]] && {
|
||||
if [[ -z "$BOOT_ISOLCPUS" ]]
|
||||
then
|
||||
ISOLCPUS=$(cat $ISCPUS)
|
||||
else
|
||||
ISOLCPUS=$(cat $ISCPUS | sed -e "s/,*$BOOT_ISOLCPUS//")
|
||||
fi
|
||||
[[ "$EXPECT_VAL2" != "$ISOLCPUS" ]] && {
|
||||
# Take a 50ms pause and try again
|
||||
pause 0.05
|
||||
@ -670,7 +699,6 @@ check_isolcpus()
|
||||
}
|
||||
[[ "$EXPECT_VAL2" != "$ISOLCPUS" ]] && return 1
|
||||
ISOLCPUS=
|
||||
}
|
||||
|
||||
#
|
||||
# Use the sched domain in debugfs to check isolated CPUs, if available
|
||||
@ -703,6 +731,9 @@ check_isolcpus()
|
||||
fi
|
||||
done
|
||||
[[ "$ISOLCPUS" = *- ]] && ISOLCPUS=${ISOLCPUS}$LASTISOLCPU
|
||||
[[ -n "$BOOT_ISOLCPUS" ]] &&
|
||||
ISOLCPUS=$(echo $ISOLCPUS | sed -e "s/,*$BOOT_ISOLCPUS//")
|
||||
|
||||
[[ "$EXPECT_VAL" = "$ISOLCPUS" ]]
|
||||
}
|
||||
|
||||
@ -720,7 +751,8 @@ test_fail()
|
||||
}
|
||||
|
||||
#
|
||||
# Check to see if there are unexpected isolated CPUs left
|
||||
# Check to see if there are unexpected isolated CPUs left beyond the boot
|
||||
# time isolated ones.
|
||||
#
|
||||
null_isolcpus_check()
|
||||
{
|
||||
|
77
tools/testing/selftests/cgroup/test_cpuset_v1_base.sh
Executable file
@ -0,0 +1,77 @@
|
||||
#!/bin/bash
|
||||
# SPDX-License-Identifier: GPL-2.0
|
||||
#
|
||||
# Basic test for cpuset v1 interfaces write/read
|
||||
#
|
||||
|
||||
skip_test() {
|
||||
echo "$1"
|
||||
echo "Test SKIPPED"
|
||||
exit 4 # ksft_skip
|
||||
}
|
||||
|
||||
write_test() {
|
||||
dir=$1
|
||||
interface=$2
|
||||
value=$3
|
||||
original=$(cat $dir/$interface)
|
||||
echo "testing $interface $value"
|
||||
echo $value > $dir/$interface
|
||||
new=$(cat $dir/$interface)
|
||||
[[ $value -ne $(cat $dir/$interface) ]] && {
|
||||
echo "$interface write $value failed: new:$new"
|
||||
exit 1
|
||||
}
|
||||
}
|
||||
|
||||
[[ $(id -u) -eq 0 ]] || skip_test "Test must be run as root!"
|
||||
|
||||
# Find cpuset v1 mount point
|
||||
CPUSET=$(mount -t cgroup | grep cpuset | head -1 | awk '{print $3}')
|
||||
[[ -n "$CPUSET" ]] || skip_test "cpuset v1 mount point not found!"
|
||||
|
||||
#
|
||||
# Create a test cpuset, read write test
|
||||
#
|
||||
TDIR=test$$
|
||||
[[ -d $CPUSET/$TDIR ]] || mkdir $CPUSET/$TDIR
|
||||
|
||||
ITF_MATRIX=(
|
||||
#interface value expect root_only
|
||||
'cpuset.cpus 0-1 0-1 0'
|
||||
'cpuset.mem_exclusive 1 1 0'
|
||||
'cpuset.mem_exclusive 0 0 0'
|
||||
'cpuset.mem_hardwall 1 1 0'
|
||||
'cpuset.mem_hardwall 0 0 0'
|
||||
'cpuset.memory_migrate 1 1 0'
|
||||
'cpuset.memory_migrate 0 0 0'
|
||||
'cpuset.memory_spread_page 1 1 0'
|
||||
'cpuset.memory_spread_page 0 0 0'
|
||||
'cpuset.memory_spread_slab 1 1 0'
|
||||
'cpuset.memory_spread_slab 0 0 0'
|
||||
'cpuset.mems 0 0 0'
|
||||
'cpuset.sched_load_balance 1 1 0'
|
||||
'cpuset.sched_load_balance 0 0 0'
|
||||
'cpuset.sched_relax_domain_level 2 2 0'
|
||||
'cpuset.memory_pressure_enabled 1 1 1'
|
||||
'cpuset.memory_pressure_enabled 0 0 1'
|
||||
)
|
||||
|
||||
run_test()
|
||||
{
|
||||
cnt="${ITF_MATRIX[@]}"
|
||||
for i in "${ITF_MATRIX[@]}" ; do
|
||||
args=($i)
|
||||
root_only=${args[3]}
|
||||
[[ $root_only -eq 1 ]] && {
|
||||
write_test "$CPUSET" "${args[0]}" "${args[1]}" "${args[2]}"
|
||||
continue
|
||||
}
|
||||
write_test "$CPUSET/$TDIR" "${args[0]}" "${args[1]}" "${args[2]}"
|
||||
done
|
||||
}
|
||||
|
||||
run_test
|
||||
rmdir $CPUSET/$TDIR
|
||||
echo "Test PASSED"
|
||||
exit 0
|