linux-next/include/linux/dsa/ocelot.h
Vladimir Oltean 67c3ca2c5c net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection
Problem description
-------------------

On an NXP LS1028A (felix DSA driver) with the following configuration:

- ocelot-8021q tagging protocol
- VLAN-aware bridge (with STP) spanning at least swp0 and swp1
- 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700
- ptp4l on swp0.700 and swp1.700

we see that the ptp4l instances do not see each other's traffic,
and they all go to the grand master state due to the
ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition.

Jumping to the conclusion for the impatient
-------------------------------------------

There is a zero-day bug in the ocelot switchdev driver in the way it
handles VLAN-tagged packet injection. The correct logic already exists in
the source code, in function ocelot_xmit_get_vlan_info() added by commit
5ca721c54d ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
But it is used only for normal NPI-based injection with the DSA "ocelot"
tagging protocol. The other injection code paths (register-based and
FDMA-based) roll their own wrong logic. This affects and was noticed on
the DSA "ocelot-8021q" protocol because it uses register-based injection.

By moving ocelot_xmit_get_vlan_info() to a place that's common for both
the DSA tagger and the ocelot switch library, it can also be called from
ocelot_port_inject_frame() in ocelot.c.

We need to touch the lines with ocelot_ifh_port_set()'s prototype
anyway, so let's rename it to something clearer regarding what it does,
and add a kernel-doc. ocelot_ifh_set_basic() should do.

Investigation notes
-------------------

Debugging reveals that PTP event (aka those carrying timestamps, like
Sync) frames injected into swp0.700 (but also swp1.700) hit the wire
with two VLAN tags:

00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc
                                              ~~~~~~~~~~~
00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00
          ~~~~~~~~~~~
00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03
00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00
00000040: 00 00

The second (unexpected) VLAN tag makes felix_check_xtr_pkt() ->
ptp_classify_raw() fail to see these as PTP packets at the link
partner's receiving end, and return PTP_CLASS_NONE (because the BPF
classifier is not written to expect 2 VLAN tags).

The reason why packets have 2 VLAN tags is because the transmission
code treats VLAN incorrectly.

Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX
feature. Therefore, at xmit time, all VLANs should be in the skb head,
and none should be in the hwaccel area. This is done by:

static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb,
					  netdev_features_t features)
{
	if (skb_vlan_tag_present(skb) &&
	    !vlan_hw_offload_capable(features, skb->vlan_proto))
		skb = __vlan_hwaccel_push_inside(skb);
	return skb;
}

But ocelot_port_inject_frame() handles things incorrectly:

	ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));

void ocelot_ifh_port_set(struct sk_buff *skb, void *ifh, int port, u32 rew_op)
{
	(...)
	if (vlan_tag)
		ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
	(...)
}

The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head
is by calling:

static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
{
	skb->vlan_present = 0;
}

which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get().
This means that ocelot, when it calls skb_vlan_tag_get(), sees
(and uses) a residual skb->vlan_tci, while the same VLAN tag is
_already_ in the skb head.

The trivial fix for double VLAN headers is to replace the content of
ocelot_ifh_port_set() with:

	if (skb_vlan_tag_present(skb))
		ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));

but this would not be correct either, because, as mentioned,
vlan_hw_offload_capable() is false for us, so we'd be inserting dead
code and we'd always transmit packets with VID=0 in the injection frame
header.

I can't actually test the ocelot switchdev driver and rely exclusively
on code inspection, but I don't think traffic from 8021q uppers has ever
been injected properly, and not double-tagged. Thus I'm blaming the
introduction of VLAN fields in the injection header - early driver code.

As hinted at in the early conclusion, what we _want_ to happen for
VLAN transmission was already described once in commit 5ca721c54d
("net: dsa: tag_ocelot: set the classified VLAN during xmit").

ocelot_xmit_get_vlan_info() intends to ensure that if the port through
which we're transmitting is under a VLAN-aware bridge, the outer VLAN
tag from the skb head is stripped from there and inserted into the
injection frame header (so that the packet is processed in hardware
through that actual VLAN). And in all other cases, the packet is sent
with VID=0 in the injection frame header, since the port is VLAN-unaware
and has logic to strip this VID on egress (making it invisible to the
wire).

Fixes: 08d02364b1 ("net: mscc: fix the injection header")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-16 09:59:32 +01:00

324 lines
12 KiB
C

/* SPDX-License-Identifier: GPL-2.0
* Copyright 2019-2021 NXP
*/
#ifndef _NET_DSA_TAG_OCELOT_H
#define _NET_DSA_TAG_OCELOT_H
#include <linux/if_bridge.h>
#include <linux/if_vlan.h>
#include <linux/kthread.h>
#include <linux/packing.h>
#include <linux/skbuff.h>
#include <net/dsa.h>
struct ocelot_skb_cb {
struct sk_buff *clone;
unsigned int ptp_class; /* valid only for clones */
u32 tstamp_lo;
u8 ptp_cmd;
u8 ts_id;
};
#define OCELOT_SKB_CB(skb) \
((struct ocelot_skb_cb *)((skb)->cb))
#define IFH_TAG_TYPE_C 0
#define IFH_TAG_TYPE_S 1
#define IFH_REW_OP_NOOP 0x0
#define IFH_REW_OP_DSCP 0x1
#define IFH_REW_OP_ONE_STEP_PTP 0x2
#define IFH_REW_OP_TWO_STEP_PTP 0x3
#define IFH_REW_OP_ORIGIN_PTP 0x5
#define OCELOT_TAG_LEN 16
#define OCELOT_SHORT_PREFIX_LEN 4
#define OCELOT_LONG_PREFIX_LEN 16
#define OCELOT_TOTAL_TAG_LEN (OCELOT_SHORT_PREFIX_LEN + OCELOT_TAG_LEN)
/* The CPU injection header and the CPU extraction header can have 3 types of
* prefixes: long, short and no prefix. The format of the header itself is the
* same in all 3 cases.
*
* Extraction with long prefix:
*
* +-------------------+-------------------+------+------+------------+-------+
* | ff:ff:ff:ff:ff:ff | fe:ff:ff:ff:ff:ff | 8880 | 000a | extraction | frame |
* | | | | | header | |
* +-------------------+-------------------+------+------+------------+-------+
* 48 bits 48 bits 16 bits 16 bits 128 bits
*
* Extraction with short prefix:
*
* +------+------+------------+-------+
* | 8880 | 000a | extraction | frame |
* | | | header | |
* +------+------+------------+-------+
* 16 bits 16 bits 128 bits
*
* Extraction with no prefix:
*
* +------------+-------+
* | extraction | frame |
* | header | |
* +------------+-------+
* 128 bits
*
*
* Injection with long prefix:
*
* +-------------------+-------------------+------+------+------------+-------+
* | any dmac | any smac | 8880 | 000a | injection | frame |
* | | | | | header | |
* +-------------------+-------------------+------+------+------------+-------+
* 48 bits 48 bits 16 bits 16 bits 128 bits
*
* Injection with short prefix:
*
* +------+------+------------+-------+
* | 8880 | 000a | injection | frame |
* | | | header | |
* +------+------+------------+-------+
* 16 bits 16 bits 128 bits
*
* Injection with no prefix:
*
* +------------+-------+
* | injection | frame |
* | header | |
* +------------+-------+
* 128 bits
*
* The injection header looks like this (network byte order, bit 127
* is part of lowest address byte in memory, bit 0 is part of highest
* address byte):
*
* +------+------+------+------+------+------+------+------+
* 127:120 |BYPASS| MASQ | MASQ_PORT |REW_OP|REW_OP|
* +------+------+------+------+------+------+------+------+
* 119:112 | REW_OP |
* +------+------+------+------+------+------+------+------+
* 111:104 | REW_VAL |
* +------+------+------+------+------+------+------+------+
* 103: 96 | REW_VAL |
* +------+------+------+------+------+------+------+------+
* 95: 88 | REW_VAL |
* +------+------+------+------+------+------+------+------+
* 87: 80 | REW_VAL |
* +------+------+------+------+------+------+------+------+
* 79: 72 | RSV |
* +------+------+------+------+------+------+------+------+
* 71: 64 | RSV | DEST |
* +------+------+------+------+------+------+------+------+
* 63: 56 | DEST |
* +------+------+------+------+------+------+------+------+
* 55: 48 | RSV |
* +------+------+------+------+------+------+------+------+
* 47: 40 | RSV | SRC_PORT | RSV |TFRM_TIMER|
* +------+------+------+------+------+------+------+------+
* 39: 32 | TFRM_TIMER | RSV |
* +------+------+------+------+------+------+------+------+
* 31: 24 | RSV | DP | POP_CNT | CPUQ |
* +------+------+------+------+------+------+------+------+
* 23: 16 | CPUQ | QOS_CLASS |TAG_TYPE|
* +------+------+------+------+------+------+------+------+
* 15: 8 | PCP | DEI | VID |
* +------+------+------+------+------+------+------+------+
* 7: 0 | VID |
* +------+------+------+------+------+------+------+------+
*
* And the extraction header looks like this:
*
* +------+------+------+------+------+------+------+------+
* 127:120 | RSV | REW_OP |
* +------+------+------+------+------+------+------+------+
* 119:112 | REW_OP | REW_VAL |
* +------+------+------+------+------+------+------+------+
* 111:104 | REW_VAL |
* +------+------+------+------+------+------+------+------+
* 103: 96 | REW_VAL |
* +------+------+------+------+------+------+------+------+
* 95: 88 | REW_VAL |
* +------+------+------+------+------+------+------+------+
* 87: 80 | REW_VAL | LLEN |
* +------+------+------+------+------+------+------+------+
* 79: 72 | LLEN | WLEN |
* +------+------+------+------+------+------+------+------+
* 71: 64 | WLEN | RSV |
* +------+------+------+------+------+------+------+------+
* 63: 56 | RSV |
* +------+------+------+------+------+------+------+------+
* 55: 48 | RSV |
* +------+------+------+------+------+------+------+------+
* 47: 40 | RSV | SRC_PORT | ACL_ID |
* +------+------+------+------+------+------+------+------+
* 39: 32 | ACL_ID | RSV | SFLOW_ID |
* +------+------+------+------+------+------+------+------+
* 31: 24 |ACL_HIT| DP | LRN_FLAGS | CPUQ |
* +------+------+------+------+------+------+------+------+
* 23: 16 | CPUQ | QOS_CLASS |TAG_TYPE|
* +------+------+------+------+------+------+------+------+
* 15: 8 | PCP | DEI | VID |
* +------+------+------+------+------+------+------+------+
* 7: 0 | VID |
* +------+------+------+------+------+------+------+------+
*/
struct felix_deferred_xmit_work {
struct dsa_port *dp;
struct sk_buff *skb;
struct kthread_work work;
};
struct ocelot_8021q_tagger_data {
void (*xmit_work_fn)(struct kthread_work *work);
};
static inline struct ocelot_8021q_tagger_data *
ocelot_8021q_tagger_data(struct dsa_switch *ds)
{
BUG_ON(ds->dst->tag_ops->proto != DSA_TAG_PROTO_OCELOT_8021Q);
return ds->tagger_data;
}
static inline void ocelot_xfh_get_rew_val(void *extraction, u64 *rew_val)
{
packing(extraction, rew_val, 116, 85, OCELOT_TAG_LEN, UNPACK, 0);
}
static inline void ocelot_xfh_get_len(void *extraction, u64 *len)
{
u64 llen, wlen;
packing(extraction, &llen, 84, 79, OCELOT_TAG_LEN, UNPACK, 0);
packing(extraction, &wlen, 78, 71, OCELOT_TAG_LEN, UNPACK, 0);
*len = 60 * wlen + llen - 80;
}
static inline void ocelot_xfh_get_src_port(void *extraction, u64 *src_port)
{
packing(extraction, src_port, 46, 43, OCELOT_TAG_LEN, UNPACK, 0);
}
static inline void ocelot_xfh_get_qos_class(void *extraction, u64 *qos_class)
{
packing(extraction, qos_class, 19, 17, OCELOT_TAG_LEN, UNPACK, 0);
}
static inline void ocelot_xfh_get_tag_type(void *extraction, u64 *tag_type)
{
packing(extraction, tag_type, 16, 16, OCELOT_TAG_LEN, UNPACK, 0);
}
static inline void ocelot_xfh_get_vlan_tci(void *extraction, u64 *vlan_tci)
{
packing(extraction, vlan_tci, 15, 0, OCELOT_TAG_LEN, UNPACK, 0);
}
static inline void ocelot_ifh_set_bypass(void *injection, u64 bypass)
{
packing(injection, &bypass, 127, 127, OCELOT_TAG_LEN, PACK, 0);
}
static inline void ocelot_ifh_set_rew_op(void *injection, u64 rew_op)
{
packing(injection, &rew_op, 125, 117, OCELOT_TAG_LEN, PACK, 0);
}
static inline void ocelot_ifh_set_dest(void *injection, u64 dest)
{
packing(injection, &dest, 67, 56, OCELOT_TAG_LEN, PACK, 0);
}
static inline void ocelot_ifh_set_qos_class(void *injection, u64 qos_class)
{
packing(injection, &qos_class, 19, 17, OCELOT_TAG_LEN, PACK, 0);
}
static inline void seville_ifh_set_dest(void *injection, u64 dest)
{
packing(injection, &dest, 67, 57, OCELOT_TAG_LEN, PACK, 0);
}
static inline void ocelot_ifh_set_src(void *injection, u64 src)
{
packing(injection, &src, 46, 43, OCELOT_TAG_LEN, PACK, 0);
}
static inline void ocelot_ifh_set_tag_type(void *injection, u64 tag_type)
{
packing(injection, &tag_type, 16, 16, OCELOT_TAG_LEN, PACK, 0);
}
static inline void ocelot_ifh_set_vlan_tci(void *injection, u64 vlan_tci)
{
packing(injection, &vlan_tci, 15, 0, OCELOT_TAG_LEN, PACK, 0);
}
/* Determine the PTP REW_OP to use for injecting the given skb */
static inline u32 ocelot_ptp_rew_op(struct sk_buff *skb)
{
struct sk_buff *clone = OCELOT_SKB_CB(skb)->clone;
u8 ptp_cmd = OCELOT_SKB_CB(skb)->ptp_cmd;
u32 rew_op = 0;
if (ptp_cmd == IFH_REW_OP_TWO_STEP_PTP && clone) {
rew_op = ptp_cmd;
rew_op |= OCELOT_SKB_CB(clone)->ts_id << 3;
} else if (ptp_cmd == IFH_REW_OP_ORIGIN_PTP) {
rew_op = ptp_cmd;
}
return rew_op;
}
/**
* ocelot_xmit_get_vlan_info: Determine VLAN_TCI and TAG_TYPE for injected frame
* @skb: Pointer to socket buffer
* @br: Pointer to bridge device that the port is under, if any
* @vlan_tci:
* @tag_type:
*
* If the port is under a VLAN-aware bridge, remove the VLAN header from the
* payload and move it into the DSA tag, which will make the switch classify
* the packet to the bridge VLAN. Otherwise, leave the classified VLAN at zero,
* which is the pvid of standalone ports (OCELOT_STANDALONE_PVID), although not
* of VLAN-unaware bridge ports (that would be ocelot_vlan_unaware_pvid()).
* Anyway, VID 0 is fine because it is stripped on egress for these port modes,
* and source address learning is not performed for packets injected from the
* CPU anyway, so it doesn't matter that the VID is "wrong".
*/
static inline void ocelot_xmit_get_vlan_info(struct sk_buff *skb,
struct net_device *br,
u64 *vlan_tci, u64 *tag_type)
{
struct vlan_ethhdr *hdr;
u16 proto, tci;
if (!br || !br_vlan_enabled(br)) {
*vlan_tci = 0;
*tag_type = IFH_TAG_TYPE_C;
return;
}
hdr = (struct vlan_ethhdr *)skb_mac_header(skb);
br_vlan_get_proto(br, &proto);
if (ntohs(hdr->h_vlan_proto) == proto) {
vlan_remove_tag(skb, &tci);
*vlan_tci = tci;
} else {
rcu_read_lock();
br_vlan_get_pvid_rcu(br, &tci);
rcu_read_unlock();
*vlan_tci = tci;
}
*tag_type = (proto != ETH_P_8021Q) ? IFH_TAG_TYPE_S : IFH_TAG_TYPE_C;
}
#endif