License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139
and resulted in the first patch in this series.
If the file was under a */uapi/* path, it was tagged "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was tagged "GPL-2.0". The results were:
   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
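As a rough illustration of the first heuristic above (no license text found
by either scanner), the choice of default identifier can be sketched in C.
This is only a sketch: default_spdx_id() is a hypothetical helper, not part
of the script used for this patch, and it assumes that a "/uapi/" path
component is enough to recognise a uapi file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: pick the default SPDX identifier for a file in
 * which neither scanner found any license text.
 */
static const char *default_spdx_id(const char *path)
{
	assert(path != NULL);

	/* Assumption: uapi files are recognised by a "/uapi/" path component. */
	if (strstr(path, "/uapi/"))
		return "GPL-2.0 WITH Linux-syscall-note";

	/* Everything else falls under the top level COPYING license. */
	return "GPL-2.0";
}
```

Called with "include/uapi/linux/pkt_cls.h" the helper returns the
syscall-note variant; called with "include/net/pkt_cls.h" it returns plain
"GPL-2.0".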
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
2005-04-16 15:20:36 -07:00
|
|
|
#ifndef __NET_PKT_CLS_H
|
|
|
|
#define __NET_PKT_CLS_H
|
|
|
|
|
|
|
|
#include <linux/pkt_cls.h>
|
2017-10-26 18:24:28 -07:00
|
|
|
#include <linux/workqueue.h>
|
2005-04-16 15:20:36 -07:00
|
|
|
#include <net/sch_generic.h>
|
|
|
|
#include <net/act_api.h>
|
2019-06-15 11:03:49 +02:00
|
|
|
#include <net/net_namespace.h>
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2018-07-30 14:30:44 +02:00
|
|
|
/* TC action not accessible from user space */
|
2019-06-24 23:13:35 +01:00
|
|
|
#define TC_ACT_CONSUMED (TC_ACT_VALUE_MAX + 1)
|
2018-07-30 14:30:44 +02:00
|
|
|
|
2005-04-16 15:20:36 -07:00
|
|
|
/* Basic packet classifier frontend definitions. */
|
|
|
|
|
2009-11-03 03:26:03 +00:00
|
|
|
struct tcf_walker {
|
2005-04-16 15:20:36 -07:00
|
|
|
int stop;
|
|
|
|
int skip;
|
|
|
|
int count;
|
2019-02-25 17:38:31 +02:00
|
|
|
bool nonempty;
|
2018-07-09 13:29:11 +03:00
|
|
|
unsigned long cookie;
|
2017-08-04 21:31:43 -07:00
|
|
|
int (*fn)(struct tcf_proto *, void *node, struct tcf_walker *);
|
2005-04-16 15:20:36 -07:00
|
|
|
};
|
|
|
|
|
2013-07-30 22:47:13 -07:00
|
|
|
int register_tcf_proto_ops(struct tcf_proto_ops *ops);
|
2022-07-13 09:54:38 +08:00
|
|
|
void unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
|
2024-02-01 14:09:40 +01:00
|
|
|
#define NET_CLS_ALIAS_PREFIX "net-cls-"
|
|
|
|
#define MODULE_ALIAS_NET_CLS(kind) MODULE_ALIAS(NET_CLS_ALIAS_PREFIX kind)
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-10-19 15:50:29 +02:00
|
|
|
struct tcf_block_ext_info {
|
2019-07-09 22:55:41 +02:00
|
|
|
enum flow_block_binder_type binder_type;
|
2017-11-03 11:46:24 +01:00
|
|
|
tcf_chain_head_change_t *chain_head_change;
|
|
|
|
void *chain_head_change_priv;
|
2018-01-17 11:46:46 +01:00
|
|
|
u32 block_index;
|
2017-10-19 15:50:29 +02:00
|
|
|
};
|
|
|
|
|
2020-06-27 01:45:26 +03:00
|
|
|
struct tcf_qevent {
|
|
|
|
struct tcf_block *block;
|
|
|
|
struct tcf_block_ext_info info;
|
|
|
|
struct tcf_proto __rcu *filter_chain;
|
|
|
|
};
|
|
|
|
|
2017-10-19 15:50:31 +02:00
|
|
|
struct tcf_block_cb;
|
2018-05-23 15:26:53 -07:00
|
|
|
bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
|
2017-10-19 15:50:31 +02:00
|
|
|
|
2017-02-15 11:57:50 +01:00
|
|
|
#ifdef CONFIG_NET_CLS
|
2018-07-27 09:45:05 +02:00
|
|
|
struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
|
|
|
|
u32 chain_index);
|
|
|
|
void tcf_chain_put_by_act(struct tcf_chain *chain);
|
2019-02-11 10:55:36 +02:00
|
|
|
struct tcf_chain *tcf_get_next_chain(struct tcf_block *block,
|
|
|
|
struct tcf_chain *chain);
|
2019-02-11 10:55:40 +02:00
|
|
|
struct tcf_proto *tcf_get_next_proto(struct tcf_chain *chain,
|
2020-11-27 17:12:05 +02:00
|
|
|
struct tcf_proto *tp);
|
2018-01-17 11:46:48 +01:00
|
|
|
void tcf_block_netif_keep_dst(struct tcf_block *block);
|
2017-05-17 11:07:55 +02:00
|
|
|
int tcf_block_get(struct tcf_block **p_block,
|
2017-12-20 12:35:19 -05:00
|
|
|
struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
|
|
|
|
struct netlink_ext_ack *extack);
|
2017-11-03 11:46:24 +01:00
|
|
|
int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
|
2017-12-20 12:35:19 -05:00
|
|
|
struct tcf_block_ext_info *ei,
|
|
|
|
struct netlink_ext_ack *extack);
|
2017-05-17 11:07:55 +02:00
|
|
|
void tcf_block_put(struct tcf_block *block);
|
2017-11-03 11:46:24 +01:00
|
|
|
void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
|
2017-10-19 15:50:29 +02:00
|
|
|
struct tcf_block_ext_info *ei);
|
2023-02-18 00:36:14 +02:00
|
|
|
int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action,
|
|
|
|
int police, struct tcf_proto *tp, u32 handle, bool used_action_miss);
|
2017-10-13 14:00:59 +02:00
|
|
|
|
2018-01-17 11:46:46 +01:00
|
|
|
static inline bool tcf_block_shared(struct tcf_block *block)
|
|
|
|
{
|
|
|
|
return block->index;
|
|
|
|
}
|
|
|
|
|
2019-07-10 20:12:29 +03:00
|
|
|
static inline bool tcf_block_non_null_shared(struct tcf_block *block)
|
|
|
|
{
|
|
|
|
return block && block->index;
|
|
|
|
}
|
|
|
|
|
net: sched: make skip_sw actually skip software
TC filters come in 3 variants:
- no flag (try to process in hardware, but fall back to software)
- skip_hw (do not process filter by hardware)
- skip_sw (do not process filter by software)
However, skip_sw is implemented so that the skip_sw
flag can only be checked after the filter has been matched.
IMHO it's common, when using skip_sw, to use it on all rules.
So if all filters in a block are skip_sw filters, then
we can bail early and thus avoid having to match
the filters just to check for the skip_sw flag.
This patch adds a bypass, for when only TC skip_sw rules
are used. The bypass is guarded by a static key, to avoid
harming other workloads.
There are 3 ways that a packet from a skip_sw ruleset can
end up in the kernel path. Although sending packets to a
non-existent chain is only improved by a few percent, I
believe it's worth optimizing the trap and fall-through
use-cases.
+----------------------------+--------+--------+--------+
| Test description | Pre- | Post- | Rel. |
| | kpps | kpps | chg. |
+----------------------------+--------+--------+--------+
| basic forwarding + notrack | 3589.3 | 3587.9 | 1.00x |
| switch to eswitch mode | 3081.8 | 3094.7 | 1.00x |
| add ingress qdisc | 3042.9 | 3063.6 | 1.01x |
| tc forward in hw / skip_sw |37024.7 |37028.4 | 1.00x |
| tc forward in sw / skip_hw | 3245.0 | 3245.3 | 1.00x |
+----------------------------+--------+--------+--------+
| tests with only skip_sw rules below: |
+----------------------------+--------+--------+--------+
| 1 non-matching rule | 2694.7 | 3058.7 | 1.14x |
| 1 n-m rule, match trap | 2611.2 | 3323.1 | 1.27x |
| 1 n-m rule, goto non-chain | 2886.8 | 2945.9 | 1.02x |
| 5 non-matching rules | 1958.2 | 3061.3 | 1.56x |
| 5 n-m rules, match trap | 1911.9 | 3327.0 | 1.74x |
| 5 n-m rules, goto non-chain| 2883.1 | 2947.5 | 1.02x |
| 10 non-matching rules | 1466.3 | 3062.8 | 2.09x |
| 10 n-m rules, match trap | 1444.3 | 3317.9 | 2.30x |
| 10 n-m rules,goto non-chain| 2883.1 | 2939.5 | 1.02x |
| 25 non-matching rules | 838.5 | 3058.9 | 3.65x |
| 25 n-m rules, match trap | 824.5 | 3323.0 | 4.03x |
| 25 n-m rules,goto non-chain| 2875.8 | 2944.7 | 1.02x |
| 50 non-matching rules | 488.1 | 3054.7 | 6.26x |
| 50 n-m rules, match trap | 484.9 | 3318.5 | 6.84x |
| 50 n-m rules,goto non-chain| 2884.1 | 2939.7 | 1.02x |
+----------------------------+--------+--------+--------+
perf top (25 n-m skip_sw rules - pre patch):
20.39% [kernel] [k] __skb_flow_dissect
16.43% [kernel] [k] rhashtable_jhash2
10.58% [kernel] [k] fl_classify
10.23% [kernel] [k] fl_mask_lookup
4.79% [kernel] [k] memset_orig
2.58% [kernel] [k] tcf_classify
1.47% [kernel] [k] __x86_indirect_thunk_rax
1.42% [kernel] [k] __dev_queue_xmit
1.36% [kernel] [k] nft_do_chain
1.21% [kernel] [k] __rcu_read_lock
perf top (25 n-m skip_sw rules - post patch):
5.12% [kernel] [k] __dev_queue_xmit
4.77% [kernel] [k] nft_do_chain
3.65% [kernel] [k] dev_gro_receive
3.41% [kernel] [k] check_preemption_disabled
3.14% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_nonlinear
2.88% [kernel] [k] __netif_receive_skb_core.constprop.0
2.49% [kernel] [k] mlx5e_xmit
2.15% [kernel] [k] ip_forward
1.95% [kernel] [k] mlx5e_tc_restore_tunnel
1.92% [kernel] [k] vlan_gro_receive
Test setup:
DUT: Intel Xeon D-1518 (2.20GHz) w/ Nvidia/Mellanox ConnectX-6 Dx 2x100G
Data rate measured on switch (Extreme X690), and DUT connected as
a router on a stick, with pktgen and pktsink as VLANs.
Pktgen-dpdk was in range 36.6-37.7 Mpps 64B packets across all tests.
Full test data at https://files.fiberby.net/ast/2024/tc_skip_sw/v2_tests/
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-25 20:47:36 +00:00
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
|
|
|
DECLARE_STATIC_KEY_FALSE(tcf_bypass_check_needed_key);
|
|
|
|
|
|
|
|
static inline bool tcf_block_bypass_sw(struct tcf_block *block)
|
|
|
|
{
|
|
|
|
return block && block->bypass_wanted;
|
|
|
|
}
|
|
|
|
#endif
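The helper above only tests a cached flag; how that flag is kept up to date
is not visible in this header. The user-space sketch below shows one way
such a flag could be maintained, assuming it means "every filter in the
block is skip_sw". All demo_* names are hypothetical, and the static key
that guards the real check is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a filter block (not the kernel struct). */
struct demo_block {
	unsigned int nr_filters;	/* all filters attached to the block */
	unsigned int nr_skip_sw;	/* filters carrying the skip_sw flag */
	bool bypass_wanted;		/* cached: every filter is skip_sw */
};

/* Recompute the cached flag whenever the set of filters changes. */
static void demo_block_update(struct demo_block *b)
{
	assert(b->nr_skip_sw <= b->nr_filters);
	b->bypass_wanted = b->nr_filters > 0 &&
			   b->nr_skip_sw == b->nr_filters;
}

static void demo_block_add_filter(struct demo_block *b, bool skip_sw)
{
	b->nr_filters++;
	if (skip_sw)
		b->nr_skip_sw++;
	demo_block_update(b);
}

/* Same shape as tcf_block_bypass_sw(): a NULL-safe test of the cached flag. */
static bool demo_block_bypass_sw(const struct demo_block *b)
{
	return b && b->bypass_wanted;
}
```

With this model, a block holding only skip_sw filters reports a bypass, and
adding a single filter without the flag turns the bypass off again, so the
per-packet path never has to walk the filters just to discover that none of
them may run in software.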
|
|
|
|
|
2017-10-13 14:00:59 +02:00
|
|
|
static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
|
|
|
|
{
|
2018-01-17 11:46:46 +01:00
|
|
|
WARN_ON(tcf_block_shared(block));
|
2017-10-13 14:00:59 +02:00
|
|
|
return block->q;
|
|
|
|
}
|
|
|
|
|
2021-07-28 20:08:00 +02:00
|
|
|
int tcf_classify(struct sk_buff *skb,
|
|
|
|
const struct tcf_block *block,
|
|
|
|
const struct tcf_proto *tp, struct tcf_result *res,
|
|
|
|
bool compat_mode);
|
2017-05-17 11:07:54 +02:00
|
|
|
|
2022-09-16 10:02:43 +08:00
|
|
|
static inline bool tc_cls_stats_dump(struct tcf_proto *tp,
|
|
|
|
struct tcf_walker *arg,
|
|
|
|
void *filter)
|
|
|
|
{
|
|
|
|
if (arg->count >= arg->skip && arg->fn(tp, filter, arg) < 0) {
|
|
|
|
arg->stop = 1;
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
arg->count++;
|
|
|
|
return true;
|
|
|
|
}
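The skip/count/stop protocol of tc_cls_stats_dump() can be tried out in
user space. The sketch below copies its control flow but drops the
tcf_proto argument; the demo_* names, the int-array "filters" and the
callback are hypothetical and only serve to make the example runnable.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down walker with the fields the dump helper touches. */
struct demo_walker {
	int stop;
	int skip;
	int count;
	int (*fn)(void *node, struct demo_walker *w);
};

/* Same control flow as tc_cls_stats_dump(), minus the tcf_proto argument. */
static bool demo_stats_dump(struct demo_walker *w, void *node)
{
	if (w->count >= w->skip && w->fn(node, w) < 0) {
		w->stop = 1;
		return false;
	}

	w->count++;
	return true;
}

static int demo_sum;	/* running total of the nodes actually visited */

/* Hypothetical callback: add the node's value, fail on a negative one. */
static int demo_visit(void *node, struct demo_walker *w)
{
	int v = *(int *)node;

	(void)w;
	if (v < 0)
		return -1;
	demo_sum += v;
	return 0;
}

/* Walk an array the way a classifier's walk routine visits its filters. */
static void demo_walk(int *vals, int n, struct demo_walker *w)
{
	for (int i = 0; i < n; i++)
		if (!demo_stats_dump(w, &vals[i]))
			break;
}
```

Entries below skip are counted but not handed to the callback, which is
what lets a dump resume where a previous one left off; a negative return
from the callback sets stop and ends the walk without counting that entry.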
|
|
|
|
|
2017-02-15 11:57:50 +01:00
|
|
|
#else
|
2019-05-04 04:46:25 -07:00
|
|
|
static inline bool tcf_block_shared(struct tcf_block *block)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2019-07-10 20:12:29 +03:00
|
|
|
static inline bool tcf_block_non_null_shared(struct tcf_block *block)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2017-05-17 11:07:55 +02:00
|
|
|
static inline
|
|
|
|
int tcf_block_get(struct tcf_block **p_block,
|
2017-12-22 15:52:05 +00:00
|
|
|
struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
|
|
|
|
struct netlink_ext_ack *extack)
|
2017-05-17 11:07:55 +02:00
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-10-19 15:50:29 +02:00
|
|
|
static inline
|
2017-11-03 11:46:24 +01:00
|
|
|
int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
|
2018-01-03 17:30:45 -08:00
|
|
|
struct tcf_block_ext_info *ei,
|
|
|
|
struct netlink_ext_ack *extack)
|
2017-10-19 15:50:29 +02:00
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-05-17 11:07:55 +02:00
|
|
|
static inline void tcf_block_put(struct tcf_block *block)
|
2017-02-15 11:57:50 +01:00
|
|
|
{
|
|
|
|
}
|
2017-05-17 11:07:54 +02:00
|
|
|
|
2017-10-19 15:50:29 +02:00
|
|
|
static inline
|
2017-11-03 11:46:24 +01:00
|
|
|
void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
|
2017-10-19 15:50:29 +02:00
|
|
|
struct tcf_block_ext_info *ei)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2017-10-13 14:00:59 +02:00
|
|
|
static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
|
|
|
|
{
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2021-07-28 20:08:00 +02:00
|
|
|
static inline int tcf_classify(struct sk_buff *skb,
|
|
|
|
const struct tcf_block *block,
|
|
|
|
const struct tcf_proto *tp,
|
2017-05-17 11:07:54 +02:00
|
|
|
struct tcf_result *res, bool compat_mode)
|
|
|
|
{
|
|
|
|
return TC_ACT_UNSPEC;
|
|
|
|
}
|
2020-02-16 12:01:21 +02:00
|
|
|
|
2017-02-15 11:57:50 +01:00
|
|
|
#endif
|
2017-02-09 14:38:56 +01:00
|
|
|
|
2005-04-16 15:20:36 -07:00
|
|
|
static inline unsigned long
|
|
|
|
__cls_set_class(unsigned long *clp, unsigned long cl)
|
|
|
|
{
|
2014-09-30 16:07:24 -07:00
|
|
|
return xchg(clp, cl);
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
|
2020-01-23 16:26:18 -08:00
|
|
|
static inline void
|
|
|
|
__tcf_bind_filter(struct Qdisc *q, struct tcf_result *r, unsigned long base)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
2020-01-23 16:26:18 -08:00
|
|
|
unsigned long cl;
|
2017-10-13 14:01:00 +02:00
|
|
|
|
2020-01-23 16:26:18 -08:00
|
|
|
cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
|
|
|
|
cl = __cls_set_class(&r->class, cl);
|
|
|
|
if (cl)
|
|
|
|
q->ops->cl_ops->unbind_tcf(q, cl);
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void
|
|
|
|
tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base)
|
|
|
|
{
|
2017-10-13 14:01:00 +02:00
|
|
|
struct Qdisc *q = tp->chain->block->q;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-10-13 14:01:00 +02:00
|
|
|
/* Check q as it is not set for shared blocks. In that case,
|
|
|
|
* setting class is not supported.
|
|
|
|
*/
|
|
|
|
if (!q)
|
|
|
|
return;
|
2020-01-23 16:26:18 -08:00
|
|
|
sch_tree_lock(q);
|
|
|
|
__tcf_bind_filter(q, r, base);
|
|
|
|
sch_tree_unlock(q);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void
|
|
|
|
__tcf_unbind_filter(struct Qdisc *q, struct tcf_result *r)
|
|
|
|
{
|
|
|
|
unsigned long cl;
|
|
|
|
|
|
|
|
if ((cl = __cls_set_class(&r->class, 0)) != 0)
|
2017-10-13 14:01:00 +02:00
|
|
|
q->ops->cl_ops->unbind_tcf(q, cl);
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void
|
|
|
|
tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
|
|
|
|
{
|
2017-10-13 14:01:00 +02:00
|
|
|
struct Qdisc *q = tp->chain->block->q;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-10-13 14:01:00 +02:00
|
|
|
if (!q)
|
|
|
|
return;
|
2020-01-23 16:26:18 -08:00
|
|
|
__tcf_unbind_filter(q, r);
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
|
2022-09-27 20:48:54 +08:00
|
|
|
static inline void tc_cls_bind_class(u32 classid, unsigned long cl,
|
|
|
|
void *q, struct tcf_result *res,
|
|
|
|
unsigned long base)
|
|
|
|
{
|
|
|
|
if (res->classid == classid) {
|
|
|
|
if (cl)
|
|
|
|
__tcf_bind_filter(q, res, base);
|
|
|
|
else
|
|
|
|
__tcf_unbind_filter(q, res);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2009-11-03 03:26:03 +00:00
|
|
|
struct tcf_exts {
|
2005-04-16 15:20:36 -07:00
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
2013-12-15 20:15:05 -08:00
|
|
|
__u32 type; /* for backward compat(TCA_OLD_COMPAT) */
|
2016-08-13 22:35:00 -07:00
|
|
|
int nr_actions;
|
|
|
|
struct tc_action **actions;
|
2021-12-09 23:44:24 -08:00
|
|
|
struct net *net;
|
|
|
|
netns_tracker ns_tracker;
|
2023-02-18 00:36:14 +02:00
|
|
|
struct tcf_exts_miss_cookie_node *miss_cookie_node;
|
2005-04-16 15:20:36 -07:00
|
|
|
#endif
|
2013-12-15 20:15:07 -08:00
|
|
|
/* Map to export classifier specific extension TLV types to the
|
|
|
|
* generic extensions API. Unsupported extensions must be set to 0.
|
|
|
|
*/
|
2005-04-16 15:20:36 -07:00
|
|
|
int action;
|
|
|
|
int police;
|
|
|
|
};
|
|
|
|
|
2019-02-20 21:37:42 -08:00
|
|
|
static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net,
|
|
|
|
int action, int police)
|
2013-12-15 20:15:05 -08:00
|
|
|
{
|
2023-02-18 00:36:14 +02:00
|
|
|
#ifdef CONFIG_NET_CLS
|
|
|
|
return tcf_exts_init_ex(exts, net, action, police, NULL, 0, false);
|
|
|
|
#else
|
|
|
|
return -EOPNOTSUPP;
|
2013-12-15 20:15:05 -08:00
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2017-11-06 13:47:19 -08:00
|
|
|
/* Return false if the netns is being destroyed in cleanup_net(). Callers
|
|
|
|
* need to do cleanup synchronously in this case, otherwise may race with
|
|
|
|
* tc_action_net_exit(). Return true for other cases.
|
|
|
|
*/
|
|
|
|
static inline bool tcf_exts_get_net(struct tcf_exts *exts)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
|
|
|
exts->net = maybe_get_net(exts->net);
|
2021-12-09 23:44:24 -08:00
|
|
|
if (exts->net)
|
|
|
|
netns_tracker_alloc(exts->net, &exts->ns_tracker, GFP_KERNEL);
|
2017-11-06 13:47:19 -08:00
|
|
|
return exts->net != NULL;
|
|
|
|
#else
|
|
|
|
return true;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void tcf_exts_put_net(struct tcf_exts *exts)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
|
|
|
if (exts->net)
|
2021-12-09 23:44:24 -08:00
|
|
|
put_net_track(exts->net, &exts->ns_tracker);
|
2017-11-06 13:47:19 -08:00
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2016-08-13 22:35:00 -07:00
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
2018-08-19 12:22:09 -07:00
|
|
|
#define tcf_exts_for_each_action(i, a, exts) \
|
|
|
|
for (i = 0; i < TCA_ACT_MAX_PRIO && ((a) = (exts)->actions[i]); i++)
|
|
|
|
#else
|
|
|
|
#define tcf_exts_for_each_action(i, a, exts) \
|
2018-08-22 17:25:44 +02:00
|
|
|
for (; 0; (void)(i), (void)(a), (void)(exts))
|
2016-08-13 22:35:00 -07:00
|
|
|
#endif
|
|
|
|
|
2021-12-17 19:16:22 +01:00
|
|
|
#define tcf_act_for_each_action(i, a, actions) \
|
|
|
|
for (i = 0; i < TCA_ACT_MAX_PRIO && ((a) = actions[i]); i++)
|
|
|
|
|
2023-02-12 15:25:16 +02:00
|
|
|
static inline bool tc_act_in_hw(struct tc_action *act)
|
|
|
|
{
|
|
|
|
return !!act->in_hw_count;
|
|
|
|
}
|
|
|
|
|
2017-05-31 08:06:43 -07:00
|
|
|
static inline void
|
2021-12-17 19:16:24 +01:00
|
|
|
tcf_exts_hw_stats_update(const struct tcf_exts *exts,
|
2023-02-12 15:25:16 +02:00
|
|
|
struct flow_stats *stats,
|
|
|
|
bool use_act_stats)
|
2017-05-31 08:06:43 -07:00
|
|
|
{
|
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < exts->nr_actions; i++) {
|
|
|
|
struct tc_action *a = exts->actions[i];
|
|
|
|
|
2023-02-12 15:25:16 +02:00
|
|
|
if (use_act_stats || tc_act_in_hw(a)) {
|
|
|
|
if (!tcf_action_update_hw_stats(a))
|
|
|
|
continue;
|
2021-12-17 19:16:25 +01:00
|
|
|
}
|
2023-02-12 15:25:16 +02:00
|
|
|
|
|
|
|
preempt_disable();
|
|
|
|
tcf_action_stats_update(a, stats->bytes, stats->pkts, stats->drops,
|
|
|
|
stats->lastused, true);
|
|
|
|
preempt_enable();
|
|
|
|
|
|
|
|
a->used_hw_stats = stats->used_hw_stats;
|
|
|
|
a->used_hw_stats_valid = stats->used_hw_stats_valid;
|
2021-12-17 19:16:25 +01:00
|
|
|
}
|
2017-05-31 08:06:43 -07:00
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2017-08-04 14:28:58 +02:00
|
|
|
/**
|
|
|
|
* tcf_exts_has_actions - check if at least one action is present
|
|
|
|
* @exts: tc filter extensions handle
|
|
|
|
*
|
|
|
|
* Returns true if at least one action is present.
|
|
|
|
*/
|
|
|
|
static inline bool tcf_exts_has_actions(struct tcf_exts *exts)
|
|
|
|
{
|
2016-08-13 22:34:59 -07:00
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
2017-08-04 14:28:58 +02:00
|
|
|
return exts->nr_actions;
|
|
|
|
#else
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
}
|
2016-08-13 22:34:59 -07:00
|
|
|
|
2017-08-04 14:28:59 +02:00
|
|
|
/**
|
|
|
|
* tcf_exts_exec - execute tc filter extensions
|
|
|
|
* @skb: socket buffer
|
|
|
|
* @exts: tc filter extensions handle
|
|
|
|
* @res: desired result
|
|
|
|
*
|
2017-08-04 14:29:01 +02:00
|
|
|
* Executes all configured extensions. Returns TC_ACT_OK on a normal execution,
|
2017-08-04 14:28:59 +02:00
|
|
|
* a negative number if the filter must be considered unmatched or
|
|
|
|
* a positive action code (TC_ACT_*) which must be returned to the
|
|
|
|
* underlying layer.
|
|
|
|
*/
|
|
|
|
static inline int
|
|
|
|
tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
|
|
|
|
struct tcf_result *res)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
2017-08-04 14:29:02 +02:00
|
|
|
return tcf_action_exec(skb, exts->actions, exts->nr_actions, res);
|
2017-08-04 14:28:59 +02:00
|
|
|
#endif
|
2017-08-04 14:29:01 +02:00
|
|
|
return TC_ACT_OK;
|
2017-08-04 14:28:59 +02:00
|
|
|
}
|
|
|
|
|
2023-02-18 00:36:14 +02:00
|
|
|
static inline int
|
|
|
|
tcf_exts_exec_ex(struct sk_buff *skb, struct tcf_exts *exts, int act_index,
|
|
|
|
struct tcf_result *res)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
|
|
|
return tcf_action_exec(skb, exts->actions + act_index,
|
|
|
|
exts->nr_actions - act_index, res);
|
|
|
|
#else
|
|
|
|
return TC_ACT_OK;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2013-07-30 22:47:13 -07:00
|
|
|
int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
|
|
|
|
struct nlattr **tb, struct nlattr *rate_tlv,
|
2021-07-29 16:12:14 -07:00
|
|
|
struct tcf_exts *exts, u32 flags,
|
2018-01-18 11:20:52 -05:00
|
|
|
struct netlink_ext_ack *extack);
|
2021-12-17 19:16:28 +01:00
|
|
|
int tcf_exts_validate_ex(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
|
|
|
|
struct nlattr *rate_tlv, struct tcf_exts *exts,
|
|
|
|
u32 flags, u32 fl_flags, struct netlink_ext_ack *extack);
|
2014-09-25 10:26:37 -07:00
|
|
|
void tcf_exts_destroy(struct tcf_exts *exts);
|
2017-08-04 14:29:15 +02:00
|
|
|
void tcf_exts_change(struct tcf_exts *dst, struct tcf_exts *src);
|
2013-12-15 20:15:07 -08:00
|
|
|
int tcf_exts_dump(struct sk_buff *skb, struct tcf_exts *exts);
|
2020-05-15 14:40:12 +03:00
|
|
|
int tcf_exts_terse_dump(struct sk_buff *skb, struct tcf_exts *exts);
|
2013-12-15 20:15:07 -08:00
|
|
|
int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
|
|
|
/**
|
|
|
|
* struct tcf_pkt_info - packet information
|
2021-08-03 11:40:19 +02:00
|
|
|
*
|
|
|
|
* @ptr: start of the pkt data
|
|
|
|
* @nexthdr: offset of the next header
|
2005-04-16 15:20:36 -07:00
|
|
|
*/
|
2009-11-03 03:26:03 +00:00
|
|
|
struct tcf_pkt_info {
|
2005-04-16 15:20:36 -07:00
|
|
|
unsigned char * ptr;
|
|
|
|
int nexthdr;
|
|
|
|
};
|
|
|
|
|
|
|
|
#ifdef CONFIG_NET_EMATCH
|
|
|
|
|
|
|
|
struct tcf_ematch_ops;
|
|
|
|
|
|
|
|
/**
|
|
|
|
* struct tcf_ematch - extended match (ematch)
|
|
|
|
*
|
|
|
|
* @matchid: identifier to allow userspace to reidentify a match
|
|
|
|
* @flags: flags specifying attributes and the relation to other matches
|
|
|
|
* @ops: the operations lookup table of the corresponding ematch module
|
|
|
|
* @datalen: length of the ematch specific configuration data
|
|
|
|
* @data: ematch specific data
|
2021-08-03 11:40:19 +02:00
|
|
|
* @net: the network namespace
|
2005-04-16 15:20:36 -07:00
|
|
|
*/
|
2009-11-03 03:26:03 +00:00
|
|
|
struct tcf_ematch {
|
2005-04-16 15:20:36 -07:00
|
|
|
struct tcf_ematch_ops * ops;
|
|
|
|
unsigned long data;
|
|
|
|
unsigned int datalen;
|
|
|
|
u16 matchid;
|
|
|
|
u16 flags;
|
2014-10-05 21:27:53 -07:00
|
|
|
struct net *net;
|
2005-04-16 15:20:36 -07:00
|
|
|
};
|
|
|
|
|
|
|
|
static inline int tcf_em_is_container(struct tcf_ematch *em)
|
|
|
|
{
|
|
|
|
return !em->ops;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline int tcf_em_is_simple(struct tcf_ematch *em)
|
|
|
|
{
|
|
|
|
return em->flags & TCF_EM_SIMPLE;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline int tcf_em_is_inverted(struct tcf_ematch *em)
|
|
|
|
{
|
|
|
|
return em->flags & TCF_EM_INVERT;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline int tcf_em_last_match(struct tcf_ematch *em)
|
|
|
|
{
|
|
|
|
return (em->flags & TCF_EM_REL_MASK) == TCF_EM_REL_END;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline int tcf_em_early_end(struct tcf_ematch *em, int result)
|
|
|
|
{
|
|
|
|
if (tcf_em_last_match(em))
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
if (result == 0 && em->flags & TCF_EM_REL_AND)
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
if (result != 0 && em->flags & TCF_EM_REL_OR)
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
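To see what the early-end test buys, here is a minimal user-space sketch
that evaluates a flat list of matches and stops as soon as the outcome is
decided. It is deliberately simplified: the demo_* types are hypothetical,
each match carries a precomputed result, and containers and inverted
matches, which the real tree evaluation has to handle, are ignored.

```c
#include <assert.h>

/* Hypothetical relation codes; the kernel encodes these as TCF_EM_REL_* flags. */
enum demo_rel { DEMO_REL_END, DEMO_REL_AND, DEMO_REL_OR };

struct demo_match {
	int result;		/* precomputed outcome of this match: 1 or 0 */
	enum demo_rel rel;	/* relation to the next match in the list */
};

/* Same three tests as tcf_em_early_end(), on the simplified type. */
static int demo_early_end(const struct demo_match *m, int result)
{
	if (m->rel == DEMO_REL_END)
		return 1;

	if (result == 0 && m->rel == DEMO_REL_AND)
		return 1;	/* false AND x is false whatever x is */

	if (result != 0 && m->rel == DEMO_REL_OR)
		return 1;	/* true OR x is true whatever x is */

	return 0;
}

/* Evaluate the list; *evaluated reports how many matches were looked at. */
static int demo_eval(const struct demo_match *m, int n, int *evaluated)
{
	int result = 1;		/* an empty list counts as a match */

	*evaluated = 0;
	for (int i = 0; i < n; i++) {
		result = m[i].result;
		(*evaluated)++;
		if (demo_early_end(&m[i], result))
			break;
	}
	return result;
}
```

For the list "0 AND 1" only the first match is evaluated and the result is
0; for "1 OR 0" only the first is evaluated and the result is 1; for
"1 AND 0" both are evaluated and the result is 0.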
|
|
|
|
|
|
|
|
/**
|
|
|
|
* struct tcf_ematch_tree - ematch tree handle
|
|
|
|
*
|
|
|
|
* @hdr: ematch tree header supplied by userspace
|
|
|
|
* @matches: array of ematches
|
|
|
|
*/
|
2009-11-03 03:26:03 +00:00
|
|
|
struct tcf_ematch_tree {
|
2005-04-16 15:20:36 -07:00
|
|
|
struct tcf_ematch_tree_hdr hdr;
|
|
|
|
struct tcf_ematch * matches;
|
|
|
|
|
|
|
|
};
|
|
|
|
|
|
|
|
/**
|
|
|
|
* struct tcf_ematch_ops - ematch module operations
|
|
|
|
*
|
|
|
|
* @kind: identifier (kind) of this ematch module
|
|
|
|
* @datalen: length of expected configuration data (optional)
|
|
|
|
* @change: called during validation (optional)
|
|
|
|
* @match: called during ematch tree evaluation, must return 1/0
|
|
|
|
* @destroy: called during destroyage (optional)
|
|
|
|
* @dump: called during dumping process (optional)
|
|
|
|
* @owner: owner, must be set to THIS_MODULE
|
|
|
|
* @link: link to previous/next ematch module (internal use)
|
|
|
|
*/
|
2009-11-03 03:26:03 +00:00
|
|
|
struct tcf_ematch_ops {
|
2005-04-16 15:20:36 -07:00
|
|
|
int kind;
|
|
|
|
int datalen;
|
2014-10-05 21:27:53 -07:00
|
|
|
int (*change)(struct net *net, void *,
|
2005-04-16 15:20:36 -07:00
|
|
|
int, struct tcf_ematch *);
|
|
|
|
int (*match)(struct sk_buff *, struct tcf_ematch *,
|
|
|
|
struct tcf_pkt_info *);
|
2014-10-05 21:27:53 -07:00
|
|
|
void (*destroy)(struct tcf_ematch *);
|
2005-04-16 15:20:36 -07:00
|
|
|
int (*dump)(struct sk_buff *, struct tcf_ematch *);
|
|
|
|
struct module *owner;
|
|
|
|
struct list_head link;
|
|
|
|
};
|
|
|
|
|
2013-07-30 22:47:13 -07:00
|
|
|
int tcf_em_register(struct tcf_ematch_ops *);
|
|
|
|
void tcf_em_unregister(struct tcf_ematch_ops *);
|
|
|
|
int tcf_em_tree_validate(struct tcf_proto *, struct nlattr *,
|
|
|
|
struct tcf_ematch_tree *);
|
2014-10-05 21:27:53 -07:00
|
|
|
void tcf_em_tree_destroy(struct tcf_ematch_tree *);
|
2013-07-30 22:47:13 -07:00
|
|
|
int tcf_em_tree_dump(struct sk_buff *, struct tcf_ematch_tree *, int);
|
|
|
|
int __tcf_em_tree_match(struct sk_buff *, struct tcf_ematch_tree *,
|
|
|
|
struct tcf_pkt_info *);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
|
|
|
/**
|
2024-08-22 13:57:30 +01:00
|
|
|
* tcf_em_tree_match - evaluate an ematch tree
|
2005-04-16 15:20:36 -07:00
|
|
|
*
|
|
|
|
* @skb: socket buffer of the packet in question
|
|
|
|
* @tree: ematch tree to be used for evaluation
|
|
|
|
* @info: packet information examined by classifier
|
|
|
|
*
|
|
|
|
* This function matches @skb against the ematch tree in @tree by going
|
|
|
|
* through all ematches respecting their logic relations returning
|
|
|
|
* as soon as the result is obvious.
|
|
|
|
*
|
|
|
|
* Returns 1 if the ematch tree as-one matches, no ematches are configured
|
|
|
|
* or ematch is not enabled in the kernel, otherwise 0 is returned.
|
|
|
|
*/
|
|
|
|
static inline int tcf_em_tree_match(struct sk_buff *skb,
|
|
|
|
struct tcf_ematch_tree *tree,
|
|
|
|
struct tcf_pkt_info *info)
|
|
|
|
{
|
|
|
|
if (tree->hdr.nmatches)
|
|
|
|
return __tcf_em_tree_match(skb, tree, info);
|
|
|
|
else
|
|
|
|
return 1;
|
|
|
|
}
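The early-return behaviour described in the comment above can be sketched in user space. This is a simplified illustration only: the `demo_` names, the integer stand-in for a packet, and the flat (non-nested) list are all assumptions, and the real `__tcf_em_tree_match()` also handles containers and inversion, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical relation linking one predicate to the next one. */
enum demo_rel { DEMO_REL_END, DEMO_REL_AND, DEMO_REL_OR };

struct demo_match {
	bool (*match)(int pkt);	/* predicate on a stand-in "packet" */
	enum demo_rel rel;	/* relation to the following entry */
};

/* Returns 1 on match, or when no entries are configured; otherwise 0. */
static int demo_tree_match(const struct demo_match *m, size_t n, int pkt)
{
	bool res = true;	/* an empty list counts as a match */
	size_t i;

	assert(m != NULL || n == 0);

	for (i = 0; i < n; i++) {
		res = m[i].match(pkt);
		/* Stop as soon as the relation fixes the outcome. */
		if (m[i].rel == DEMO_REL_AND && !res)
			break;
		if (m[i].rel == DEMO_REL_OR && res)
			break;
		if (m[i].rel == DEMO_REL_END)
			break;
	}
	return res ? 1 : 0;
}

/* Two toy predicates used to exercise the evaluator. */
static bool demo_is_even(int pkt) { return (pkt % 2) == 0; }
static bool demo_is_big(int pkt) { return pkt > 100; }
```

With `demo_is_even AND demo_is_big`, the second predicate is never called for an odd value, which mirrors the "returning as soon as the result is known" wording.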
|
|
|
|
|
2007-07-11 19:46:26 -07:00
|
|
|
#define MODULE_ALIAS_TCF_EMATCH(kind) MODULE_ALIAS("ematch-kind-" __stringify(kind))
|
|
|
|
|
2005-04-16 15:20:36 -07:00
|
|
|
#else /* CONFIG_NET_EMATCH */
|
|
|
|
|
2009-11-03 03:26:03 +00:00
|
|
|
struct tcf_ematch_tree {
|
2005-04-16 15:20:36 -07:00
|
|
|
};
|
|
|
|
|
|
|
|
#define tcf_em_tree_validate(tp, tb, t) ((void)(t), 0)
|
2014-10-05 21:27:53 -07:00
|
|
|
#define tcf_em_tree_destroy(t) do { (void)(t); } while (0)
|
2005-04-16 15:20:36 -07:00
|
|
|
#define tcf_em_tree_dump(skb, t, tlv) (0)
|
|
|
|
#define tcf_em_tree_match(skb, t, info) ((void)(info), 1)
|
|
|
|
|
|
|
|
#endif /* CONFIG_NET_EMATCH */
|
|
|
|
|
|
|
|
static inline unsigned char *tcf_get_base_ptr(struct sk_buff *skb, int layer)
|
|
|
|
{
|
|
|
|
switch (layer) {
|
|
|
|
case TCF_LAYER_LINK:
|
2018-01-18 11:32:36 +01:00
|
|
|
return skb_mac_header(skb);
|
2005-04-16 15:20:36 -07:00
|
|
|
case TCF_LAYER_NETWORK:
|
2007-04-10 20:50:43 -07:00
|
|
|
return skb_network_header(skb);
|
2005-04-16 15:20:36 -07:00
|
|
|
case TCF_LAYER_TRANSPORT:
|
2007-04-25 18:04:18 -07:00
|
|
|
return skb_transport_header(skb);
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2007-04-20 22:47:35 -07:00
|
|
|
static inline int tcf_valid_offset(const struct sk_buff *skb,
|
|
|
|
const unsigned char *ptr, const int len)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
2010-12-21 12:43:16 -08:00
|
|
|
return likely((ptr + len) <= skb_tail_pointer(skb) &&
|
|
|
|
ptr >= skb->head &&
|
|
|
|
(ptr <= (ptr + len)));
|
2005-04-16 15:20:36 -07:00
|
|
|
}
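The three conditions in `tcf_valid_offset()` (end within the tail, start not before the head, and no wrap-around when `len` is added) can be tried out in user space. The sketch below is an assumption-laden stand-in: `struct demo_buf` and `demo_valid_offset()` are hypothetical names, and the arithmetic is done on `uintptr_t` so that the wrap-around case is well defined in portable C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the head/tail bounds of an sk_buff. */
struct demo_buf {
	const unsigned char *head;
	const unsigned char *tail;
};

static bool demo_valid_offset(const struct demo_buf *b,
			      const unsigned char *ptr, int len)
{
	uintptr_t start, end;

	assert(b != NULL);

	start = (uintptr_t)ptr;
	/* A negative len sign-extends, so end wraps below start. */
	end = start + (uintptr_t)(intptr_t)len;

	return end <= (uintptr_t)b->tail &&
	       start >= (uintptr_t)b->head &&
	       start <= end;	/* rejects negative len and overflow */
}
```

The last comparison is what catches a negative `len`: without it, a range that ends before it starts would still pass the first two checks.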
|
|
|
|
|
|
|
|
static inline int
|
2018-01-18 11:20:54 -05:00
|
|
|
tcf_change_indev(struct net *net, struct nlattr *indev_tlv,
|
|
|
|
struct netlink_ext_ack *extack)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
2014-01-09 16:14:02 -08:00
|
|
|
char indev[IFNAMSIZ];
|
|
|
|
struct net_device *dev;
|
|
|
|
|
2020-11-15 18:08:06 +01:00
|
|
|
if (nla_strscpy(indev, indev_tlv, IFNAMSIZ) < 0) {
|
2020-03-23 21:48:47 +01:00
|
|
|
NL_SET_ERR_MSG_ATTR(extack, indev_tlv,
|
|
|
|
"Interface name too long");
|
2005-04-16 15:20:36 -07:00
|
|
|
return -EINVAL;
|
2018-01-18 11:20:54 -05:00
|
|
|
}
|
2014-01-09 16:14:02 -08:00
|
|
|
dev = __dev_get_by_name(net, indev);
|
2020-03-23 21:48:47 +01:00
|
|
|
if (!dev) {
|
|
|
|
NL_SET_ERR_MSG_ATTR(extack, indev_tlv,
|
|
|
|
"Network device not found");
|
2014-01-09 16:14:02 -08:00
|
|
|
return -ENODEV;
|
2020-03-23 21:48:47 +01:00
|
|
|
}
|
2014-01-09 16:14:02 -08:00
|
|
|
return dev->ifindex;
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
|
2014-01-09 16:14:02 -08:00
|
|
|
static inline bool
|
|
|
|
tcf_match_indev(struct sk_buff *skb, int ifindex)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
2014-01-09 16:14:02 -08:00
|
|
|
if (!ifindex)
|
|
|
|
return true;
|
|
|
|
if (!skb->skb_iif)
|
|
|
|
return false;
|
|
|
|
return ifindex == skb->skb_iif;
|
2005-04-16 15:20:36 -07:00
|
|
|
}
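The decision order in `tcf_match_indev()` is easy to misread, so here is a small user-space restatement. `demo_match_indev()` is a hypothetical name, and the packet's ingress interface index is passed as a plain integer instead of being read from an `sk_buff`.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_match_indev(int skb_iif, int ifindex)
{
	/* Interface indexes are assumed to be non-negative in this sketch. */
	assert(skb_iif >= 0 && ifindex >= 0);

	if (!ifindex)		/* no input device configured: match all */
		return true;
	if (!skb_iif)		/* packet carries no ingress device */
		return false;
	return ifindex == skb_iif;
}
```

An unset filter (`ifindex == 0`) therefore matches every packet, including packets that have no recorded ingress interface.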
|
|
|
|
|
2021-12-17 19:16:20 +01:00
|
|
|
int tc_setup_offload_action(struct flow_action *flow_action,
|
2022-04-07 10:35:22 +03:00
|
|
|
const struct tcf_exts *exts,
|
|
|
|
struct netlink_ext_ack *extack);
|
2021-12-17 19:16:20 +01:00
|
|
|
void tc_cleanup_offload_action(struct flow_action *flow_action);
|
2021-12-17 19:16:22 +01:00
|
|
|
int tc_setup_action(struct flow_action *flow_action,
|
2022-04-07 10:35:22 +03:00
|
|
|
struct tc_action *actions[],
|
2023-02-18 00:36:14 +02:00
|
|
|
u32 miss_cookie_base,
|
2022-04-07 10:35:22 +03:00
|
|
|
struct netlink_ext_ack *extack);
|
2019-08-26 16:45:04 +03:00
|
|
|
|
2018-12-11 11:15:46 -08:00
|
|
|
int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
|
2019-08-26 16:44:59 +03:00
|
|
|
void *type_data, bool err_stop, bool rtnl_held);
|
|
|
|
int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
|
|
|
|
enum tc_setup_type type, void *type_data, bool err_stop,
|
|
|
|
u32 *flags, unsigned int *in_hw_count, bool rtnl_held);
|
|
|
|
int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
|
|
|
|
enum tc_setup_type type, void *type_data, bool err_stop,
|
|
|
|
u32 *old_flags, unsigned int *old_in_hw_count,
|
|
|
|
u32 *new_flags, unsigned int *new_in_hw_count,
|
|
|
|
bool rtnl_held);
|
|
|
|
int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
|
|
|
|
enum tc_setup_type type, void *type_data, bool err_stop,
|
|
|
|
u32 *flags, unsigned int *in_hw_count, bool rtnl_held);
|
|
|
|
int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp,
|
|
|
|
bool add, flow_setup_cb_t *cb,
|
|
|
|
enum tc_setup_type type, void *type_data,
|
|
|
|
void *cb_priv, u32 *flags, unsigned int *in_hw_count);
|
2019-02-02 12:50:45 +01:00
|
|
|
unsigned int tcf_exts_num_actions(struct tcf_exts *exts);
|
2017-10-11 09:41:09 +02:00
|
|
|
|
2020-06-27 01:45:26 +03:00
|
|
|
#ifdef CONFIG_NET_CLS_ACT
|
|
|
|
int tcf_qevent_init(struct tcf_qevent *qe, struct Qdisc *sch,
|
|
|
|
enum flow_block_binder_type binder_type,
|
|
|
|
struct nlattr *block_index_attr,
|
|
|
|
struct netlink_ext_ack *extack);
|
|
|
|
void tcf_qevent_destroy(struct tcf_qevent *qe, struct Qdisc *sch);
|
|
|
|
int tcf_qevent_validate_change(struct tcf_qevent *qe, struct nlattr *block_index_attr,
|
|
|
|
struct netlink_ext_ack *extack);
|
|
|
|
struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, struct sk_buff *skb,
|
2020-07-14 20:03:07 +03:00
|
|
|
struct sk_buff **to_free, int *ret);
|
2020-06-27 01:45:26 +03:00
|
|
|
int tcf_qevent_dump(struct sk_buff *skb, int attr_name, struct tcf_qevent *qe);
|
|
|
|
#else
|
|
|
|
static inline int tcf_qevent_init(struct tcf_qevent *qe, struct Qdisc *sch,
|
|
|
|
enum flow_block_binder_type binder_type,
|
|
|
|
struct nlattr *block_index_attr,
|
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void tcf_qevent_destroy(struct tcf_qevent *qe, struct Qdisc *sch)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline int tcf_qevent_validate_change(struct tcf_qevent *qe, struct nlattr *block_index_attr,
|
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct sk_buff *
|
|
|
|
tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, struct sk_buff *skb,
|
2020-07-14 20:03:07 +03:00
|
|
|
struct sk_buff **to_free, int *ret)
|
2020-06-27 01:45:26 +03:00
|
|
|
{
|
|
|
|
return skb;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline int tcf_qevent_dump(struct sk_buff *skb, int attr_name, struct tcf_qevent *qe)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2016-02-16 21:17:09 -08:00
|
|
|
struct tc_cls_u32_knode {
|
|
|
|
struct tcf_exts *exts;
|
2018-11-19 15:21:46 -08:00
|
|
|
struct tcf_result *res;
|
2016-02-17 14:59:30 -08:00
|
|
|
struct tc_u32_sel *sel;
|
2016-02-16 21:17:09 -08:00
|
|
|
u32 handle;
|
|
|
|
u32 val;
|
|
|
|
u32 mask;
|
|
|
|
u32 link_handle;
|
2016-02-17 14:59:30 -08:00
|
|
|
u8 fshift;
|
2016-02-16 21:17:09 -08:00
|
|
|
};
|
|
|
|
|
|
|
|
struct tc_cls_u32_hnode {
|
|
|
|
u32 handle;
|
|
|
|
u32 prio;
|
|
|
|
unsigned int divisor;
|
|
|
|
};
|
|
|
|
|
|
|
|
enum tc_clsu32_command {
|
|
|
|
TC_CLSU32_NEW_KNODE,
|
|
|
|
TC_CLSU32_REPLACE_KNODE,
|
|
|
|
TC_CLSU32_DELETE_KNODE,
|
|
|
|
TC_CLSU32_NEW_HNODE,
|
|
|
|
TC_CLSU32_REPLACE_HNODE,
|
|
|
|
TC_CLSU32_DELETE_HNODE,
|
|
|
|
};
|
|
|
|
|
|
|
|
struct tc_cls_u32_offload {
|
2019-07-09 22:55:49 +02:00
|
|
|
struct flow_cls_common_offload common;
|
2016-02-16 21:17:09 -08:00
|
|
|
/* knode values */
|
|
|
|
enum tc_clsu32_command command;
|
|
|
|
union {
|
|
|
|
struct tc_cls_u32_knode knode;
|
|
|
|
struct tc_cls_u32_hnode hnode;
|
|
|
|
};
|
|
|
|
};
|
|
|
|
|
2017-08-09 14:30:35 +02:00
|
|
|
static inline bool tc_can_offload(const struct net_device *dev)
|
2016-02-26 07:53:49 -08:00
|
|
|
{
|
2017-11-01 11:47:41 +01:00
|
|
|
return dev->features & NETIF_F_HW_TC;
|
2016-02-26 07:53:49 -08:00
|
|
|
}
|
|
|
|
|
2018-01-19 17:44:48 -08:00
|
|
|
static inline bool tc_can_offload_extack(const struct net_device *dev,
|
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
bool can = tc_can_offload(dev);
|
|
|
|
|
|
|
|
if (!can)
|
|
|
|
NL_SET_ERR_MSG(extack, "TC offload is disabled on net device");
|
|
|
|
|
|
|
|
return can;
|
|
|
|
}
|
|
|
|
|
2018-01-25 14:00:43 -08:00
|
|
|
static inline bool
|
|
|
|
tc_cls_can_offload_and_chain0(const struct net_device *dev,
|
2019-07-09 22:55:49 +02:00
|
|
|
struct flow_cls_common_offload *common)
|
2018-01-25 14:00:43 -08:00
|
|
|
{
|
|
|
|
if (!tc_can_offload_extack(dev, common->extack))
|
|
|
|
return false;
|
|
|
|
if (common->chain_index) {
|
|
|
|
NL_SET_ERR_MSG(common->extack,
|
|
|
|
"Driver supports only offload of chain 0");
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
return true;
|
|
|
|
}
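The two-step check above (device capability first, then chain index) can be sketched outside the kernel. Everything prefixed `demo_` or `DEMO_` is hypothetical: the feature bit is a stand-in for `NETIF_F_HW_TC`, and `struct demo_extack` only models the message slot of `struct netlink_ext_ack`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_F_HW_TC (1u << 0)	/* stand-in for NETIF_F_HW_TC */

struct demo_extack { const char *msg; };
struct demo_dev { unsigned int features; };

static bool demo_can_offload_chain0(const struct demo_dev *dev,
				    unsigned int chain_index,
				    struct demo_extack *extack)
{
	assert(dev != NULL);

	if (!(dev->features & DEMO_F_HW_TC)) {
		if (extack)
			extack->msg = "TC offload is disabled on net device";
		return false;
	}
	if (chain_index) {
		if (extack)
			extack->msg = "Driver supports only offload of chain 0";
		return false;
	}
	return true;
}
```

Only the first failing check sets a message, so a caller sees the most fundamental reason for the refusal.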
|
|
|
|
|
2016-12-01 14:06:33 +02:00
|
|
|
static inline bool tc_skip_hw(u32 flags)
|
|
|
|
{
|
|
|
|
return (flags & TCA_CLS_FLAGS_SKIP_HW) ? true : false;
|
|
|
|
}
|
|
|
|
|
2016-05-12 17:08:23 -07:00
|
|
|
static inline bool tc_skip_sw(u32 flags)
|
|
|
|
{
|
|
|
|
return (flags & TCA_CLS_FLAGS_SKIP_SW) ? true : false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* SKIP_HW and SKIP_SW are mutually exclusive flags. */
|
|
|
|
static inline bool tc_flags_valid(u32 flags)
|
|
|
|
{
|
sched: cls: enable verbose logging
Currently, when the rule is not to be exclusively executed by the
hardware, extack is not passed along and offloading failures don't
get logged. The idea was that hardware failures are okay because the
rule will get executed in software then and this way it doesn't confuse
unaware users.
But this is not helpful in case one needs to understand why a certain
rule failed to get offloaded. Considering it may have been a temporary
failure, like resources exceeded or so, reproducing it later and knowing
that it is triggering the same reason may be challenging.
The ultimate goal is to improve Open vSwitch debuggability when using
flower offloading.
This patch adds a new flag to enable verbose logging. With the flag set,
extack will be passed to the driver, which will be able to log the
error. As the operation itself probably won't fail (not because of this,
at least), current iproute will already log it as a Warning.
The flag is generic, so it can be reused later. No need to restrict it
just for HW offloading. The command line will follow the syntax that
tc-ebpf already uses, tc ... [ verbose ] ... , and extend its meaning.
For example:
# ./tc qdisc add dev p7p1 ingress
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
flower verbose \
src_mac ed:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
Warning: TC offload is disabled on net device.
# echo $?
0
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
flower \
src_mac ff:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
# echo $?
0
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-13 17:44:27 -03:00
|
|
|
if (flags & ~(TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW |
|
|
|
|
TCA_CLS_FLAGS_VERBOSE))
|
2016-05-12 17:08:23 -07:00
|
|
|
return false;
|
|
|
|
|
|
|
|
flags &= TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW;
|
2016-05-12 17:08:23 -07:00
|
|
|
if (!(flags ^ (TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW)))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
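The validation logic in `tc_flags_valid()` has two parts: reject unknown bits, then reject the combination of both skip flags. A minimal user-space sketch follows. The `DEMO_` bit positions are illustrative assumptions and are not guaranteed to match the kernel's `TCA_CLS_FLAGS_*` definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, assumed for this sketch only. */
#define DEMO_SKIP_HW (1u << 0)
#define DEMO_SKIP_SW (1u << 1)
#define DEMO_VERBOSE (1u << 4)

static bool demo_flags_valid(uint32_t flags)
{
	const uint32_t skip_both = DEMO_SKIP_HW | DEMO_SKIP_SW;

	/* Any bit outside the accepted set is invalid. */
	if (flags & ~(skip_both | DEMO_VERBOSE))
		return false;

	/* SKIP_HW and SKIP_SW are mutually exclusive. */
	if ((flags & skip_both) == skip_both)
		return false;

	return true;
}
```

The sketch compares the masked value with `skip_both` directly; the kernel function expresses the same test with an XOR against the two-flag mask.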
|
|
|
|
|
2017-02-16 10:31:12 +02:00
|
|
|
static inline bool tc_in_hw(u32 flags)
|
|
|
|
{
|
|
|
|
return (flags & TCA_CLS_FLAGS_IN_HW) ? true : false;
|
|
|
|
}
|
|
|
|
|
2018-01-24 12:54:14 -08:00
|
|
|
static inline void
|
2019-07-09 22:55:49 +02:00
|
|
|
tc_cls_common_offload_init(struct flow_cls_common_offload *cls_common,
|
2018-01-24 12:54:14 -08:00
|
|
|
const struct tcf_proto *tp, u32 flags,
|
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
cls_common->chain_index = tp->chain->index;
|
|
|
|
cls_common->protocol = tp->protocol;
|
2019-08-16 03:24:09 +02:00
|
|
|
cls_common->prio = tp->prio >> 16;
|
net: sched: propagate "skip_sw" flag to struct flow_cls_common_offload
Background: switchdev ports offload the Linux bridge, and most of the
packets they handle will never see the CPU. The ports between which
there exists no hardware data path are considered 'foreign' to switchdev.
These can either be normal physical NICs without switchdev offload, or
incompatible switchdev ports, or virtual interfaces like veth/dummy/etc.
In some cases, an offloaded filter can only do half the work, and the
rest must be handled by software. Redirecting/mirroring from the ingress
of a switchdev port towards a foreign interface is one example of
combined hardware/software data path. The most that the switchdev port
can do is to extract the matching packets from its offloaded data path
and send them to the CPU. From there on, the software filter runs
(a second time, after the first run in hardware) on the packet and
performs the mirred action.
It makes sense for switchdev drivers which allow this kind of "half
offloading" to sense the "skip_sw" flag of the filter/action pair, and
deny attempts from the user to install a filter that does not run in
software, because that simply won't work.
In fact, a mirred action on a switchdev port towards a dummy interface
appears to be a valid way of (selectively) monitoring offloaded traffic
that flows through it. IFF_PROMISC was also discussed years ago, but
(despite initial disagreement) there seems to be consensus that this
flag should not affect the destination taken by packets, but merely
whether or not the NIC discards packets with unknown MAC DA for local
processing.
[1] https://lore.kernel.org/netdev/20190830092637.7f83d162@ceranb/
[2] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/netdev/ZxUo0Dc0M5Y6l9qF@shredder.mtl.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241023135251.1752488-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-23 16:52:46 +03:00
|
|
|
cls_common->skip_sw = tc_skip_sw(flags);
|
|
|
|
if (tc_skip_sw(flags) || flags & TCA_CLS_FLAGS_VERBOSE)
|
2018-01-24 12:54:14 -08:00
|
|
|
cls_common->extack = extack;
|
|
|
|
}
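Two details of `tc_cls_common_offload_init()` are worth illustrating: the priority is shifted down by 16 bits, and the extack pointer is forwarded only when the rule is skip_sw or verbose. The sketch below uses hypothetical `demo_` types and illustrative flag bits. It also clears `extack` explicitly, whereas the kernel function appears to rely on the caller having zero-initialised the structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions, assumed for this sketch only. */
#define DEMO_SKIP_SW (1u << 1)
#define DEMO_VERBOSE (1u << 4)

struct demo_extack { const char *msg; };

struct demo_common {
	uint32_t prio;
	bool skip_sw;
	struct demo_extack *extack;
};

static void demo_common_init(struct demo_common *c, uint32_t tp_prio,
			     uint32_t flags, struct demo_extack *extack)
{
	assert(c != NULL);

	/* The user-visible priority sits in the upper 16 bits. */
	c->prio = tp_prio >> 16;
	c->skip_sw = (flags & DEMO_SKIP_SW) != 0;

	/* Drivers may report errors only for skip_sw or verbose rules. */
	c->extack = NULL;
	if (c->skip_sw || (flags & DEMO_VERBOSE))
		c->extack = extack;
}
```

For a rule that may fall back to software and is not verbose, the driver receives no extack, so an offload failure stays silent.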
|
|
|
|
|
2021-05-25 16:21:52 +03:00
|
|
|
#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
|
|
|
|
static inline struct tc_skb_ext *tc_skb_ext_alloc(struct sk_buff *skb)
|
|
|
|
{
|
|
|
|
struct tc_skb_ext *tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT);
|
|
|
|
|
|
|
|
if (tc_skb_ext)
|
|
|
|
memset(tc_skb_ext, 0, sizeof(*tc_skb_ext));
|
|
|
|
return tc_skb_ext;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2016-07-21 12:03:12 +02:00
|
|
|
enum tc_matchall_command {
|
|
|
|
TC_CLSMATCHALL_REPLACE,
|
|
|
|
TC_CLSMATCHALL_DESTROY,
|
2019-05-04 04:46:23 -07:00
|
|
|
TC_CLSMATCHALL_STATS,
|
2016-07-21 12:03:12 +02:00
|
|
|
};
|
|
|
|
|
|
|
|
struct tc_cls_matchall_offload {
|
2019-07-09 22:55:49 +02:00
|
|
|
struct flow_cls_common_offload common;
|
2016-07-21 12:03:12 +02:00
|
|
|
enum tc_matchall_command command;
|
2019-05-04 04:46:17 -07:00
|
|
|
struct flow_rule *rule;
|
2019-05-04 04:46:23 -07:00
|
|
|
struct flow_stats stats;
|
2023-02-12 15:25:16 +02:00
|
|
|
bool use_act_stats;
|
2016-07-21 12:03:12 +02:00
|
|
|
unsigned long cookie;
|
|
|
|
};
|
|
|
|
|
2016-09-21 11:43:53 +01:00
|
|
|
enum tc_clsbpf_command {
|
2017-12-19 13:32:13 -08:00
|
|
|
TC_CLSBPF_OFFLOAD,
|
2016-09-21 11:44:02 +01:00
|
|
|
TC_CLSBPF_STATS,
|
2016-09-21 11:43:53 +01:00
|
|
|
};
|
|
|
|
|
|
|
|
struct tc_cls_bpf_offload {
|
2019-07-09 22:55:49 +02:00
|
|
|
struct flow_cls_common_offload common;
|
2016-09-21 11:43:53 +01:00
|
|
|
enum tc_clsbpf_command command;
|
|
|
|
struct tcf_exts *exts;
|
|
|
|
struct bpf_prog *prog;
|
2017-12-19 13:32:13 -08:00
|
|
|
struct bpf_prog *oldprog;
|
2016-09-21 11:43:53 +01:00
|
|
|
const char *name;
|
|
|
|
bool exts_integrated;
|
|
|
|
};
|
|
|
|
|
2017-01-24 07:02:41 -05:00
|
|
|
/* This structure holds the cookie that is passed from user space
* to the kernel for actions and classifiers.
*/
|
|
|
|
struct tc_cookie {
|
|
|
|
u8 *data;
|
|
|
|
u32 len;
|
2018-07-05 17:24:23 +03:00
|
|
|
struct rcu_head rcu;
|
2017-01-24 07:02:41 -05:00
|
|
|
};
|
2017-11-06 07:23:41 +01:00
|
|
|
|
2018-01-10 14:59:58 +01:00
|
|
|
struct tc_qopt_offload_stats {
|
2021-10-16 10:49:09 +02:00
|
|
|
struct gnet_stats_basic_sync *bstats;
|
2018-01-10 14:59:58 +01:00
|
|
|
struct gnet_stats_queue *qstats;
|
|
|
|
};
|
|
|
|
|
2018-05-25 21:53:35 -07:00
|
|
|
enum tc_mq_command {
|
|
|
|
TC_MQ_CREATE,
|
|
|
|
TC_MQ_DESTROY,
|
2018-05-25 21:53:37 -07:00
|
|
|
TC_MQ_STATS,
|
2018-11-12 14:58:14 -08:00
|
|
|
TC_MQ_GRAFT,
|
|
|
|
};
|
|
|
|
|
|
|
|
struct tc_mq_opt_offload_graft_params {
|
|
|
|
unsigned long queue;
|
|
|
|
u32 child_handle;
|
2018-05-25 21:53:35 -07:00
|
|
|
};
|
|
|
|
|
|
|
|
struct tc_mq_qopt_offload {
|
|
|
|
enum tc_mq_command command;
|
|
|
|
u32 handle;
|
2018-11-12 14:58:14 -08:00
|
|
|
union {
|
|
|
|
struct tc_qopt_offload_stats stats;
|
|
|
|
struct tc_mq_opt_offload_graft_params graft_params;
|
|
|
|
};
|
2018-05-25 21:53:35 -07:00
|
|
|
};
|
|
|
|
|
sch_htb: Hierarchical QoS hardware offload
HTB doesn't scale well because of contention on a single lock, and it
also consumes CPU. This patch adds support for offloading HTB to
hardware that supports hierarchical rate limiting.
In the offload mode, HTB passes control commands to the driver using
ndo_setup_tc. The driver has to replicate the whole hierarchy of classes
and their settings (rate, ceil) in the NIC. Every modification of the
HTB tree caused by the admin results in ndo_setup_tc being called.
After this setup, the HTB algorithm is done completely in the NIC. An SQ
(send queue) is created for every leaf class and attached to the
hierarchy, so that the NIC can calculate and obey aggregated rate
limits, too. In the future, it can be changed, so that multiple SQs will
back a single leaf class.
ndo_select_queue is responsible for selecting the right queue that
serves the traffic class of each packet.
The data path works as follows: a packet is classified by clsact, the
driver selects a hardware queue according to its class, and the packet
is enqueued into this queue's qdisc.
This solution addresses two main problems of scaling HTB:
1. Contention by flow classification. Currently the filters are attached
to the HTB instance as follows:
# tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80
classid 1:10
It's possible to move classification to clsact egress hook, which is
thread-safe and lock-free:
# tc filter add dev eth0 egress protocol ip flower dst_port 80
action skbedit priority 1:10
This way classification still happens in software, but the lock
contention is eliminated, and it happens before selecting the TX queue,
allowing the driver to translate the class to the corresponding hardware
queue in ndo_select_queue.
Note that this is already compatible with non-offloaded HTB and requires
no changes to the kernel or iproute2.
2. Contention by handling packets. HTB is not multi-queue, it attaches
to a whole net device, and handling of all packets takes the same lock.
When HTB is offloaded, it registers itself as a multi-queue qdisc,
similarly to mq: HTB is attached to the netdev, and each queue has its
own qdisc.
Some HTB features may not be supported by particular hardware: for
example, the maximum number of classes may be limited, or the
granularity of the rate and ceil parameters may differ. The offload is
therefore not enabled by default; a new parameter enables it:
# tc qdisc replace dev eth0 root handle 1: htb offload
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19 14:08:13 +02:00
|
|
|
enum tc_htb_command {
|
|
|
|
/* Root */
|
|
|
|
TC_HTB_CREATE, /* Initialize HTB offload. */
|
|
|
|
TC_HTB_DESTROY, /* Destroy HTB offload. */
|
|
|
|
|
|
|
|
/* Classes */
|
|
|
|
/* Allocate qid and create leaf. */
|
|
|
|
TC_HTB_LEAF_ALLOC_QUEUE,
|
|
|
|
/* Convert leaf to inner, preserve and return qid, create new leaf. */
|
|
|
|
TC_HTB_LEAF_TO_INNER,
|
|
|
|
/* Delete leaf, while siblings remain. */
|
|
|
|
TC_HTB_LEAF_DEL,
|
|
|
|
/* Delete leaf, convert parent to leaf, preserving qid. */
|
|
|
|
TC_HTB_LEAF_DEL_LAST,
|
|
|
|
/* TC_HTB_LEAF_DEL_LAST, but delete driver data on hardware errors. */
|
|
|
|
TC_HTB_LEAF_DEL_LAST_FORCE,
|
|
|
|
/* Modify parameters of a node. */
|
|
|
|
TC_HTB_NODE_MODIFY,
|
|
|
|
|
|
|
|
/* Class qdisc */
|
|
|
|
TC_HTB_LEAF_QUERY_QUEUE, /* Query qid by classid. */
|
|
|
|
};
|
|
|
|
|
|
|
|
struct tc_htb_qopt_offload {
|
|
|
|
struct netlink_ext_ack *extack;
|
|
|
|
enum tc_htb_command command;
|
|
|
|
u32 parent_classid;
|
2021-08-26 14:54:25 +03:00
|
|
|
u16 classid;
|
|
|
|
u16 qid;
|
2023-07-19 16:34:41 +05:30
|
|
|
u32 quantum;
|
|
|
|
u64 rate;
|
|
|
|
u64 ceil;
|
2023-05-13 14:21:36 +05:30
|
|
|
u8 prio;
|
2021-01-19 14:08:13 +02:00
};

#define TC_HTB_CLASSID_ROOT U32_MAX

2017-11-06 07:23:41 +01:00
enum tc_red_command {
	TC_RED_REPLACE,
	TC_RED_DESTROY,
	TC_RED_STATS,
	TC_RED_XSTATS,
2018-11-12 14:58:13 -08:00
	TC_RED_GRAFT,
2017-11-06 07:23:41 +01:00
};

struct tc_red_qopt_offload_params {
	u32 min;
	u32 max;
	u32 probability;
2018-11-12 14:58:16 -08:00
	u32 limit;
2017-11-06 07:23:41 +01:00
	bool is_ecn;
2018-11-08 19:50:38 -08:00
	bool is_harddrop;
2020-03-13 01:10:57 +02:00
	bool is_nodrop;
2018-01-14 20:01:26 -08:00
	struct gnet_stats_queue *qstats;
2017-11-06 07:23:41 +01:00
};

struct tc_red_qopt_offload {
	enum tc_red_command command;
	u32 handle;
	u32 parent;
	union {
		struct tc_red_qopt_offload_params set;
2018-01-10 14:59:58 +01:00
		struct tc_qopt_offload_stats stats;
2017-11-06 07:23:41 +01:00
		struct red_stats *xstats;
2018-11-12 14:58:13 -08:00
		u32 child_handle;
2017-11-06 07:23:41 +01:00
	};
};

2018-11-19 15:21:42 -08:00
enum tc_gred_command {
	TC_GRED_REPLACE,
	TC_GRED_DESTROY,
2018-11-19 15:21:43 -08:00
	TC_GRED_STATS,
2018-11-19 15:21:42 -08:00
};

struct tc_gred_vq_qopt_offload_params {
	bool present;
	u32 limit;
	u32 prio;
	u32 min;
	u32 max;
	bool is_ecn;
	bool is_harddrop;
	u32 probability;
	/* Only need backlog, see struct tc_prio_qopt_offload_params */
	u32 *backlog;
};

struct tc_gred_qopt_offload_params {
	bool grio_on;
	bool wred_on;
	unsigned int dp_cnt;
	unsigned int dp_def;
	struct gnet_stats_queue *qstats;
	struct tc_gred_vq_qopt_offload_params tab[MAX_DPs];
};

2018-11-19 15:21:43 -08:00
struct tc_gred_qopt_offload_stats {
2021-10-16 10:49:09 +02:00
	struct gnet_stats_basic_sync bstats[MAX_DPs];
2018-11-19 15:21:43 -08:00
	struct gnet_stats_queue qstats[MAX_DPs];
	struct red_stats *xstats[MAX_DPs];
};

2018-11-19 15:21:42 -08:00
struct tc_gred_qopt_offload {
	enum tc_gred_command command;
	u32 handle;
	u32 parent;
	union {
		struct tc_gred_qopt_offload_params set;
2018-11-19 15:21:43 -08:00
		struct tc_gred_qopt_offload_stats stats;
2018-11-19 15:21:42 -08:00
	};
};

2018-01-14 12:33:15 +01:00
enum tc_prio_command {
	TC_PRIO_REPLACE,
	TC_PRIO_DESTROY,
	TC_PRIO_STATS,
2018-02-28 10:45:06 +01:00
	TC_PRIO_GRAFT,
2018-01-14 12:33:15 +01:00
};

struct tc_prio_qopt_offload_params {
	int bands;
	u8 priomap[TC_PRIO_MAX + 1];
2019-12-18 14:55:08 +00:00
	/* At the point of un-offloading the Qdisc, the reported backlog and
	 * qlen need to be reduced by the portion that is in HW.
2018-01-14 12:33:15 +01:00
	 */
	struct gnet_stats_queue *qstats;
};
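
The comment on `qstats` says that when a qdisc is un-offloaded, the reported backlog and qlen must be reduced by the portion still held in hardware. A minimal sketch of that arithmetic follows, under stated assumptions: `struct queue_stats` is a stand-in carrying only the two fields the comment names (the kernel's `struct gnet_stats_queue` has more members), `unoffload_reduce()` is a hypothetical helper, and clamping at zero is a choice made here to avoid unsigned wraparound, not something the source specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with only the fields named in the comment. */
struct queue_stats {
	uint32_t qlen;		/* packets currently reported as queued */
	uint32_t backlog;	/* bytes currently reported as queued */
};

/* Hypothetical helper: subtract the hardware-held portion from the
 * reported counters, clamping at zero (an assumption of this sketch).
 */
static void unoffload_reduce(struct queue_stats *reported,
			     uint32_t hw_qlen, uint32_t hw_backlog)
{
	assert(reported != NULL);

	reported->qlen -= hw_qlen < reported->qlen ? hw_qlen : reported->qlen;
	reported->backlog -= hw_backlog < reported->backlog ?
			     hw_backlog : reported->backlog;
}
```

For example, a qdisc reporting 12 packets and 18000 bytes, of which 5 packets and 7500 bytes sit in hardware, would report 7 packets and 10500 bytes after the reduction.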

2018-02-28 10:45:06 +01:00
struct tc_prio_qopt_offload_graft_params {
	u8 band;
	u32 child_handle;
};

2018-01-14 12:33:15 +01:00
struct tc_prio_qopt_offload {
	enum tc_prio_command command;
	u32 handle;
	u32 parent;
	union {
		struct tc_prio_qopt_offload_params replace_params;
		struct tc_qopt_offload_stats stats;
2018-02-28 10:45:06 +01:00
		struct tc_prio_qopt_offload_graft_params graft_params;
2018-01-14 12:33:15 +01:00
	};
};
2018-02-28 10:45:06 +01:00

2018-11-12 14:58:10 -08:00
enum tc_root_command {
	TC_ROOT_GRAFT,
};

struct tc_root_qopt_offload {
	enum tc_root_command command;
	u32 handle;
	bool ingress;
};

2019-12-18 14:55:15 +00:00
enum tc_ets_command {
	TC_ETS_REPLACE,
	TC_ETS_DESTROY,
	TC_ETS_STATS,
	TC_ETS_GRAFT,
};

struct tc_ets_qopt_offload_replace_params {
	unsigned int bands;
	u8 priomap[TC_PRIO_MAX + 1];
	unsigned int quanta[TCQ_ETS_MAX_BANDS];	/* 0 for strict bands. */
	unsigned int weights[TCQ_ETS_MAX_BANDS];
	struct gnet_stats_queue *qstats;
};
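
The `quanta` comment encodes a convention: a band whose quantum is 0 is a strict-priority band. A receiver can therefore tell strict bands from bandwidth-sharing bands by scanning the first `bands` entries. The sketch below illustrates this, assuming `TCQ_ETS_MAX_BANDS` is 16 (defined locally here so the example stands alone); `ets_count_strict()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, defined locally so the sketch is self-contained. */
#define TCQ_ETS_MAX_BANDS 16

/* Hypothetical helper: count bands marked strict, i.e. with quantum 0,
 * among the first `bands` entries of the quanta table.
 */
static unsigned int ets_count_strict(const unsigned int *quanta,
				     unsigned int bands)
{
	unsigned int i, strict = 0;

	assert(quanta != NULL);
	assert(bands <= TCQ_ETS_MAX_BANDS);

	for (i = 0; i < bands; i++)
		if (quanta[i] == 0)
			strict++;

	return strict;
}
```

With a four-band table `{ 0, 0, 3000, 1500 }`, the helper reports two strict bands; the remaining two bands carry non-zero quanta.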

struct tc_ets_qopt_offload_graft_params {
	u8 band;
	u32 child_handle;
};

struct tc_ets_qopt_offload {
	enum tc_ets_command command;
	u32 handle;
	u32 parent;
	union {
		struct tc_ets_qopt_offload_replace_params replace_params;
		struct tc_qopt_offload_stats stats;
		struct tc_ets_qopt_offload_graft_params graft_params;
	};
};

2020-01-24 15:23:06 +02:00
enum tc_tbf_command {
	TC_TBF_REPLACE,
	TC_TBF_DESTROY,
	TC_TBF_STATS,
2021-10-19 11:07:04 +03:00
	TC_TBF_GRAFT,
2020-01-24 15:23:06 +02:00
};

struct tc_tbf_qopt_offload_replace_params {
	struct psched_ratecfg rate;
	u32 max_size;
	struct gnet_stats_queue *qstats;
};

struct tc_tbf_qopt_offload {
	enum tc_tbf_command command;
	u32 handle;
	u32 parent;
	union {
		struct tc_tbf_qopt_offload_replace_params replace_params;
		struct tc_qopt_offload_stats stats;
2021-10-19 11:07:04 +03:00
		u32 child_handle;
2020-01-24 15:23:06 +02:00
	};
};

2020-03-05 09:16:40 +02:00
enum tc_fifo_command {
	TC_FIFO_REPLACE,
	TC_FIFO_DESTROY,
	TC_FIFO_STATS,
};

struct tc_fifo_qopt_offload {
	enum tc_fifo_command command;
	u32 handle;
	u32 parent;
	union {
		struct tc_qopt_offload_stats stats;
	};
};

2022-02-03 10:44:30 +02:00
#ifdef CONFIG_NET_CLS_ACT
DECLARE_STATIC_KEY_FALSE(tc_skb_ext_tc);
void tc_skb_ext_tc_enable(void);
void tc_skb_ext_tc_disable(void);
#define tc_skb_ext_tc_enabled() static_branch_unlikely(&tc_skb_ext_tc)
#else /* CONFIG_NET_CLS_ACT */
static inline void tc_skb_ext_tc_enable(void) { }
static inline void tc_skb_ext_tc_disable(void) { }
#define tc_skb_ext_tc_enabled() false
#endif

2005-04-16 15:20:36 -07:00
#endif