2019-05-02 20:23:30 +00:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0
|
|
|
|
* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com>
|
|
|
|
*/
|
|
|
|
|
2019-05-05 10:19:27 +00:00
|
|
|
/* Included by drivers/net/dsa/sja1105/sja1105.h and net/dsa/tag_sja1105.c */
|
2019-05-02 20:23:30 +00:00
|
|
|
|
|
|
|
#ifndef _NET_DSA_SJA1105_H
|
|
|
|
#define _NET_DSA_SJA1105_H
|
|
|
|
|
2019-05-05 10:19:27 +00:00
|
|
|
#include <linux/skbuff.h>
|
2019-05-02 20:23:34 +00:00
|
|
|
#include <linux/etherdevice.h>
|
2020-05-12 17:20:34 +00:00
|
|
|
#include <linux/dsa/8021q.h>
|
2019-05-05 10:19:27 +00:00
|
|
|
#include <net/dsa.h>
|
2019-05-02 20:23:34 +00:00
|
|
|
|
|
|
|
#define ETH_P_SJA1105 ETH_P_DSA_8021Q
|
2019-06-08 12:04:36 +00:00
|
|
|
#define ETH_P_SJA1105_META 0x0008
|
net: dsa: add support for the SJA1110 native tagging protocol
The SJA1110 has improved a few things compared to SJA1105:
- To send a control packet from the host port with SJA1105, one needed
to program a one-shot "management route" over SPI. This is no longer
true with SJA1110, you can actually send "in-band control extensions"
in the packets sent by DSA, these are in fact DSA tags which contain
the destination port and switch ID.
- When receiving a control packet from the switch with SJA1105, the
source port and switch ID were written in bytes 3 and 4 of the
destination MAC address of the frame (which was a very poor shot at a
DSA header). If the control packet also had an RX timestamp, that
timestamp was sent in an actual follow-up packet, so there were
reordering concerns on multi-core/multi-queue DSA masters, where the
metadata frame with the RX timestamp might get processed before the
actual packet to which that timestamp belonged (there is no way to
pair a packet to its timestamp other than the order in which they were
received). On SJA1110, this is no longer true, control packets have
the source port, switch ID and timestamp all in the DSA tags.
- Timestamps from the switch were partial: to get a 64-bit timestamp as
required by PTP stacks, one would need to take the partial 24-bit or
32-bit timestamp from the packet, then read the current PTP time very
quickly, and then patch in the high bits of the current PTP time into
the captured partial timestamp, to reconstruct what the full 64-bit
timestamp must have been. That is awful because packet processing is
done in NAPI context, but reading the current PTP time is done over
SPI and therefore needs sleepable context.
But it also aggravated a few things:
- Not only is there a DSA header in SJA1110, but there is a DSA trailer
in fact, too. So DSA needs to be extended to support taggers which
have both a header and a trailer. Very unconventional - my understanding
is that the trailer exists because the timestamps couldn't be prepared
in time for putting them in the header area.
- Like SJA1105, not all packets sent to the CPU have the DSA tag added
to them, only control packets do:
* the ones which match the destination MAC filters/traps in
MAC_FLTRES1 and MAC_FLTRES0
* the ones which match FDB entries which have TRAP or TAKETS bits set
So we could in theory hack something up to request the switch to take
timestamps for all packets that reach the CPU, and those would be
DSA-tagged and contain the source port / switch ID by virtue of the
fact that there needs to be a timestamp trailer provided. BUT:
- The SJA1110 does not parse its own DSA tags in a way that is useful
for routing in cross-chip topologies, a la Marvell. And the sja1105
driver already supports cross-chip bridging from the SJA1105 days.
It does that by automatically setting up the DSA links as VLAN trunks
which contain all the necessary tag_8021q RX VLANs that must be
communicated between the switches that span the same bridge. So when
using tag_8021q on sja1105, it is possible to have 2 switches with
ports sw0p0, sw0p1, sw1p0, sw1p1, and 2 VLAN-unaware bridges br0 and
br1, and br0 can take sw0p0 and sw1p0, and br1 can take sw0p1 and
sw1p1, and forwarding will happen according to the expected rules of
the Linux bridge.
We like that, and we don't want that to go away, so as a matter of
fact, the SJA1110 tagger still needs to support tag_8021q.
So the sja1110 tagger is a hybrid between tag_8021q for data packets,
and the native hardware support for control packets.
On RX, packets have a 13-byte trailer if they contain an RX timestamp.
That trailer is padded in such a way that its byte 8 (the start of the
"residence time" field - not parsed by Linux because we don't care) is
aligned on a 16 byte boundary. So the padding has a variable length
between 0 and 15 bytes. The DSA header contains the offset of the
beginning of the padding relative to the beginning of the frame (and the
end of the padding is obviously the end of the packet minus 13 bytes,
the length of the trailer). So we discard it.
Packets which don't have a trailer contain the source port and switch ID
information in the header (they are "trap-to-host" packets). Packets
which have a trailer contain the source port and switch ID in the trailer.
On TX, the destination port mask and switch ID is always in the trailer,
so we always need to say in the header that a trailer is present.
The header needs a custom EtherType and this was chosen as 0xdadc, after
0xdada which is for Marvell and 0xdadb which is for VLANs in
VLAN-unaware mode on SJA1105 (and SJA1110 in fact too).
Because we use tag_8021q in concert with the native tagging protocol,
control packets will have 2 DSA tags.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11 19:01:29 +00:00
|
|
|
#define ETH_P_SJA1110 0xdadc
|
2019-05-02 20:23:34 +00:00
|
|
|
|
net: dsa: sja1105: drop untagged packets on the CPU and DSA ports
The sja1105 driver is a bit special in its use of VLAN headers as DSA
tags. This is because in VLAN-aware mode, the VLAN headers use an actual
TPID of 0x8100, which is understood even by the DSA master as an actual
VLAN header.
Furthermore, control packets such as PTP and STP are transmitted with no
VLAN header as a DSA tag, because, depending on switch generation, there
are ways to steer these control packets towards a precise egress port
other than VLAN tags. Transmitting control packets as untagged means
leaving a door open for traffic in general to be transmitted as untagged
from the DSA master, and for it to traverse the switch and exit a random
switch port according to the FDB lookup.
This behavior is a bit out of line with other DSA drivers which have
native support for DSA tagging. There, it is to be expected that the
switch only accepts DSA-tagged packets on its CPU port, dropping
everything that does not match this pattern.
We perhaps rely a bit too much on the switches' hardware dropping on the
CPU port, and place no other restrictions in the kernel data path to
avoid that. For example, sja1105 is also a bit special in that STP/PTP
packets are transmitted using "management routes"
(sja1105_port_deferred_xmit): when sending a link-local packet from the
CPU, we must first write a SPI message to the switch to tell it to
expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
it towards port 3 when it gets it. This entry expires as soon as it
matches a packet received by the switch, and it needs to be reinstalled
for the next packet etc. All in all quite a ghetto mechanism, but it is
all that the sja1105 switches offer for injecting a control packet.
The driver takes a mutex for serializing control packets and making the
pairs of SPI writes of a management route and its associated skb atomic,
but to be honest, a mutex is only relevant as long as all parties agree
to take it. With the DSA design, it is possible to open an AF_PACKET
socket on the DSA master net device, and blast packets towards
01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
it all goes kaput because management routes installed by the driver will
match skbs sent by the DSA master, and not skbs generated by the driver
itself. So they will end up being routed on the wrong port.
So through the lens of that, maybe it would make sense to avoid that
from happening by doing something in the network stack, like: introduce
a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
dev_hard_start_xmit(), introduce the following check:
if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
kfree_skb(skb);
Ok, maybe that is a bit drastic, but that would at least prevent a bunch
of problems. For example, right now, even though the majority of DSA
switches drop packets without DSA tags sent by the DSA master (and
therefore the majority of garbage that user space daemons like avahi and
udhcpcd and friends create), it is still conceivable that an aggressive
user space program can open an AF_PACKET socket and inject a spoofed DSA
tag directly on the DSA master. We have no protection against that; the
packet will be understood by the switch and be routed wherever user
space says. Furthermore: there are some DSA switches where we even have
register access over Ethernet, using DSA tags. So even user space
drivers are possible in this way. This is a huge hole.
However, the biggest thing that bothers me is that udhcpcd attempts to
ask for an IP address on all interfaces by default, and with sja1105, it
will attempt to get a valid IP address on both the DSA master as well as
on sja1105 switch ports themselves. So with IP addresses in the same
subnet on multiple interfaces, the routing table will be messed up and
the system will be unusable for traffic until it is configured manually
to not ask for an IP address on the DSA master itself.
It turns out that it is possible to avoid that in the sja1105 driver, at
least very superficially, by requesting the switch to drop VLAN-untagged
packets on the CPU port. With the exception of control packets, all
traffic originated from tag_sja1105.c is already VLAN-tagged, so only
STP and PTP packets need to be converted. For that, we need to uphold
the equivalence between an untagged and a pvid-tagged packet, and to
remember that the CPU port of sja1105 uses a pvid of 4095.
Now that we drop untagged traffic on the CPU port, non-aggressive user
space applications like udhcpcd stop bothering us, and sja1105 effectively
becomes just as vulnerable to the aggressive kind of user space programs
as other DSA switches are (ok, users can also create 8021q uppers on top
of the DSA master in the case of sja1105, but in future patches we can
easily deny that, but it still doesn't change the fact that VLAN-tagged
packets can still be injected over raw sockets).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24 17:15:01 +00:00
|
|
|
#define SJA1105_DEFAULT_VLAN (VLAN_N_VID - 1)
|
|
|
|
|
2019-05-05 10:19:27 +00:00
|
|
|
/* IEEE 802.3 Annex 57A: Slow Protocols PDUs (01:80:C2:xx:xx:xx) */
|
|
|
|
#define SJA1105_LINKLOCAL_FILTER_A 0x0180C2000000ull
|
|
|
|
#define SJA1105_LINKLOCAL_FILTER_A_MASK 0xFFFFFF000000ull
|
|
|
|
/* IEEE 1588 Annex F: Transport of PTP over Ethernet (01:1B:19:xx:xx:xx) */
|
|
|
|
#define SJA1105_LINKLOCAL_FILTER_B 0x011B19000000ull
|
|
|
|
#define SJA1105_LINKLOCAL_FILTER_B_MASK 0xFFFFFF000000ull
|
|
|
|
|
2019-06-08 12:04:36 +00:00
|
|
|
/* Source and Destination MAC of follow-up meta frames.
|
|
|
|
* Whereas the choice of SMAC only affects the unique identification of the
|
|
|
|
* switch as sender of meta frames, the DMAC must be an address that is present
|
2023-10-23 18:17:28 +00:00
|
|
|
* in the DSA conduit port's multicast MAC filter.
|
2019-06-08 12:04:36 +00:00
|
|
|
* 01-80-C2-00-00-0E is a good choice for this, as all profiles of IEEE 1588
|
|
|
|
* over L2 use this address for some purpose already.
|
|
|
|
*/
|
|
|
|
#define SJA1105_META_SMAC 0x222222222222ull
|
|
|
|
#define SJA1105_META_DMAC 0x0180C200000Eull
|
|
|
|
|
2021-12-09 23:34:45 +00:00
|
|
|
enum sja1110_meta_tstamp {
|
|
|
|
SJA1110_META_TSTAMP_TX = 0,
|
|
|
|
SJA1110_META_TSTAMP_RX = 1,
|
|
|
|
};
|
|
|
|
|
net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021q
When the ocelot-8021q driver was converted to deferred xmit as part of
commit 8d5f7954b7c8 ("net: dsa: felix: break at first CPU port during
init and teardown"), the deferred implementation was deliberately made
subtly different from what sja1105 has.
The implementation differences lied on the following observations:
- There might be a race between these two lines in tag_sja1105.c:
skb_queue_tail(&sp->xmit_queue, skb_get(skb));
kthread_queue_work(sp->xmit_worker, &sp->xmit_work);
and the skb dequeue logic in sja1105_port_deferred_xmit(). For
example, the xmit_work might be already queued, however the work item
has just finished walking through the skb queue. Because we don't
check the return code from kthread_queue_work, we don't do anything if
the work item is already queued.
However, nobody will take that skb and send it, at least until the
next timestampable skb is sent. This creates additional (and
avoidable) TX timestamping latency.
To close that race, what the ocelot-8021q driver does is it doesn't
keep a single work item per port, and a skb timestamping queue, but
rather dynamically allocates a work item per packet.
- It is also unnecessary to have more than one kthread that does the
work. So delete the per-port kthread allocations and replace them with
a single kthread which is global to the switch.
This change brings the two implementations in line by applying those
observations to the sja1105 driver as well.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-09 23:34:40 +00:00
|
|
|
struct sja1105_deferred_xmit_work {
|
|
|
|
struct dsa_port *dp;
|
|
|
|
struct sk_buff *skb;
|
|
|
|
struct kthread_work work;
|
|
|
|
};
|
|
|
|
|
2021-12-09 23:34:42 +00:00
|
|
|
/* Global tagger data */
|
2019-06-08 12:04:40 +00:00
|
|
|
struct sja1105_tagger_data {
|
net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021q
When the ocelot-8021q driver was converted to deferred xmit as part of
commit 8d5f7954b7c8 ("net: dsa: felix: break at first CPU port during
init and teardown"), the deferred implementation was deliberately made
subtly different from what sja1105 has.
The implementation differences lied on the following observations:
- There might be a race between these two lines in tag_sja1105.c:
skb_queue_tail(&sp->xmit_queue, skb_get(skb));
kthread_queue_work(sp->xmit_worker, &sp->xmit_work);
and the skb dequeue logic in sja1105_port_deferred_xmit(). For
example, the xmit_work might be already queued, however the work item
has just finished walking through the skb queue. Because we don't
check the return code from kthread_queue_work, we don't do anything if
the work item is already queued.
However, nobody will take that skb and send it, at least until the
next timestampable skb is sent. This creates additional (and
avoidable) TX timestamping latency.
To close that race, what the ocelot-8021q driver does is it doesn't
keep a single work item per port, and a skb timestamping queue, but
rather dynamically allocates a work item per packet.
- It is also unnecessary to have more than one kthread that does the
work. So delete the per-port kthread allocations and replace them with
a single kthread which is global to the switch.
This change brings the two implementations in line by applying those
observations to the sja1105 driver as well.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-09 23:34:40 +00:00
|
|
|
void (*xmit_work_fn)(struct kthread_work *work);
|
2021-12-09 23:34:45 +00:00
|
|
|
void (*meta_tstamp_handler)(struct dsa_switch *ds, int port, u8 ts_id,
|
|
|
|
enum sja1110_meta_tstamp dir, u64 tstamp);
|
2019-06-08 12:04:40 +00:00
|
|
|
};
|
|
|
|
|
2019-06-08 12:04:42 +00:00
|
|
|
struct sja1105_skb_cb {
|
2021-04-27 04:22:00 +00:00
|
|
|
struct sk_buff *clone;
|
2021-06-11 19:01:28 +00:00
|
|
|
u64 tstamp;
|
net: dsa: sja1105: implement TX timestamping for SJA1110
The TX timestamping procedure for SJA1105 is a bit unconventional
because the transmit procedure itself is unconventional.
Control packets (and therefore PTP as well) are transmitted to a
specific port in SJA1105 using "management routes" which must be written
over SPI to the switch. These are one-shot rules that match by
destination MAC address on traffic coming from the CPU port, and select
the precise destination port for that packet. So to transmit a packet
from NET_TX softirq context, we actually need to defer to a process
context so that we can perform that SPI write before we send the packet.
The DSA master dev_queue_xmit() runs in process context, and we poll
until the switch confirms it took the TX timestamp, then we annotate the
skb clone with that TX timestamp. This is why the sja1105 driver does
not need an skb queue for TX timestamping.
But the SJA1110 is a bit (not much!) more conventional, and you can
request 2-step TX timestamping through the DSA header, as well as give
the switch a cookie (timestamp ID) which it will give back to you when
it has the timestamp. So now we do need a queue for keeping the skb
clones until their TX timestamps become available.
The interesting part is that the metadata frames from SJA1105 haven't
disappeared completely. On SJA1105 they were used as follow-ups which
contained RX timestamps, but on SJA1110 they are actually TX completion
packets, which contain a variable (up to 32) array of timestamps.
Why an array? Because:
- not only is the TX timestamp on the egress port being communicated,
but also the RX timestamp on the CPU port. Nice, but we don't care
about that, so we ignore it.
- because a packet could be multicast to multiple egress ports, each
port takes its own timestamp, and the TX completion packet contains
the individual timestamps on each port.
This is unconventional because switches typically have a timestamping
FIFO and raise an interrupt, but this one doesn't. So the tagger needs
to detect and parse meta frames, and call into the main switch driver,
which pairs the timestamps with the skbs in the TX timestamping queue
which are waiting for one.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11 19:01:31 +00:00
|
|
|
/* Only valid for packets cloned for 2-step TX timestamping */
|
|
|
|
u8 ts_id;
|
2019-06-08 12:04:42 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
#define SJA1105_SKB_CB(skb) \
|
2021-04-27 04:22:00 +00:00
|
|
|
((struct sja1105_skb_cb *)((skb)->cb))
|
2019-06-08 12:04:42 +00:00
|
|
|
|
2021-12-09 23:34:44 +00:00
|
|
|
static inline struct sja1105_tagger_data *
|
|
|
|
sja1105_tagger_data(struct dsa_switch *ds)
|
2021-08-17 14:58:47 +00:00
|
|
|
{
|
2021-12-14 01:45:35 +00:00
|
|
|
BUG_ON(ds->dst->tag_ops->proto != DSA_TAG_PROTO_SJA1105 &&
|
|
|
|
ds->dst->tag_ops->proto != DSA_TAG_PROTO_SJA1110);
|
2021-12-09 23:34:44 +00:00
|
|
|
|
|
|
|
return ds->tagger_data;
|
2021-08-17 14:58:47 +00:00
|
|
|
}
|
|
|
|
|
2019-05-02 20:23:30 +00:00
|
|
|
#endif /* _NET_DSA_SJA1105_H */
|