mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2025-01-16 18:26:42 +00:00
Networking changes for 6.3.

Core
----

 - Add dedicated kmem_cache for typical/small skb->head, avoid having
   to access struct page at kfree time, and improve memory use.
 - Introduce sysctl to set default RPS configuration for new netdevs.
 - Define Netlink protocol specification format which can be used
   to describe messages used by each family and auto-generate parsers.
   Add tools for generating kernel data structures and uAPI headers.
 - Expose all net/core sysctls inside netns.
 - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
 - Add configurable limit of MDB entries per port, and port-vlan.
 - Continue populating drop reasons throughout the stack.
 - Retire a handful of legacy Qdiscs and classifiers.

Protocols
---------

 - Support IPv4 big TCP (TSO frames larger than 64kB).
 - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
   on socket by socket basis.
 - Track and report in procfs number of MPTCP sockets used.
 - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP
   path manager.
 - IPv6: don't check net.ipv6.route.max_size and rely on garbage
   collection to free memory (similarly to IPv4).
 - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
 - ICMP: add per-rate limit counters.
 - Add support for user scanning requests in ieee802154.
 - Remove static WEP support.
 - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
   reporting.
 - WiFi 7 EHT channel puncturing support (client & AP).

BPF
---

 - Add a rbtree data structure following the "next-gen data structure"
   precedent set by recently added linked list, that is, by using
   kfunc + kptr instead of adding a new BPF map type.
 - Expose XDP hints via kfuncs with initial support for RX hash
   and timestamp metadata.
 - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key
   to better support decap on GRE tunnel devices not operating
   in collect metadata.
 - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
 - Remove the need for trace_printk_lock for bpf_trace_printk
   and bpf_trace_vprintk helpers.
 - Extend libbpf's bpf_tracing.h support for tracing arguments
   of kprobes/uprobes and syscall as a special case.
 - Significantly reduce the search time for module symbols
   by livepatch and BPF.
 - Enable cpumasks to be used as kptrs, which is useful for tracing
   programs tracking which tasks end up running on which CPUs
   in different time intervals.
 - Add support for BPF trampoline on s390x and riscv64.
 - Add capability to export the XDP features supported by the NIC.
 - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
 - Add cgroup.memory=nobpf kernel parameter option to disable BPF
   memory accounting for container environments.

Netfilter
---------

 - Remove the CLUSTERIP target. It has been marked as obsolete
   for years, and we still have WARN splats wrt. races of
   the out-of-band /proc interface installed by this target.
 - Add 'destroy' commands to nf_tables. They are identical to
   the existing 'delete' commands, but do not return an error if
   the referenced object (set, chain, rule...) did not exist.

Driver API
----------

 - Improve cpumask_local_spread() locality to help NICs set the right
   IRQ affinity on AMD platforms.
 - Separate C22 and C45 MDIO bus transactions more clearly.
 - Introduce new DCB table to control DSCP rewrite on egress.
 - Support configuration of Physical Layer Collision Avoidance (PLCA)
   Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
   shared medium Ethernet.
 - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
   preemption of low priority frames by high priority frames.
 - Add support for controlling MACSec offload using netlink SET.
 - Rework devlink instance refcounts to allow registration and
   de-registration under the instance lock. Split the code into
   multiple files, drop some of the unnecessarily granular locks and
   factor out common parts of netlink operation handling.
 - Add TX frame aggregation parameters (for USB drivers).
 - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
   messages with notifications for debug.
 - Allow offloading of UDP NEW connections via act_ct.
 - Add support for per action HW stats in TC.
 - Support hardware miss to TC action (continue processing in SW from
   a specific point in the action chain).
 - Warn if old Wireless Extension user space interface is used with
   modern cfg80211/mac80211 drivers. Do not support Wireless Extensions
   for Wi-Fi 7 devices at all. Everyone should switch to using nl80211
   interface instead.
 - Improve the CAN bit timing configuration. Use extack to return error
   messages directly to user space, update the SJW handling, including
   the definition of a new default value that will benefit CAN-FD
   controllers, by increasing their oscillator tolerance.

New hardware / drivers
----------------------

 - Ethernet:
   - nVidia BlueField-3 support (control traffic driver)
   - Ethernet support for imx93 SoCs
   - Motorcomm yt8531 gigabit Ethernet PHY
   - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
   - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
   - Amlogic gxl MDIO mux
 - WiFi:
   - RealTek RTL8188EU (rtl8xxxu)
   - Qualcomm Wi-Fi 7 devices (ath12k)
 - CAN:
   - Renesas R-Car V4H

Drivers
-------

 - Bluetooth:
   - Set Per Platform Antenna Gain (PPAG) for Intel controllers.
 - Ethernet NICs:
   - Intel (1G, igc):
     - support TSN / Qbv / packet scheduling features of i226 model
   - Intel (100G, ice):
     - use GNSS subsystem instead of TTY
     - multi-buffer XDP support
     - extend support for GPIO pins to E823 devices
   - nVidia/Mellanox:
     - update the shared buffer configuration on PFC commands
     - implement PTP adjphase function for HW offset control
     - TC support for Geneve and GRE with VF tunnel offload
     - more efficient crypto key management method
     - multi-port eswitch support
   - Netronome/Corigine:
     - add DCB IEEE support
     - support IPsec offloading for NFP3800
   - Freescale/NXP (enetc):
     - enetc: support XDP_REDIRECT for XDP non-linear buffers
     - enetc: improve reconfig, avoid link flap and waiting for idle
     - enetc: support MAC Merge layer
   - Other NICs:
     - sfc/ef100: add basic devlink support for ef100
     - ionic: rx_push mode operation (writing descriptors via MMIO)
     - bnxt: use the auxiliary bus abstraction for RDMA
     - r8169: disable ASPM and reset bus in case of tx timeout
     - cpsw: support QSGMII mode for J721e CPSW9G
     - cpts: support pulse-per-second output
     - ngbe: add an mdio bus driver
     - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
     - r8152: handle devices with FW with NCM support
     - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
     - virtio-net: support multi buffer XDP
     - virtio/vsock: replace virtio_vsock_pkt with sk_buff
     - tsnep: XDP support
 - Ethernet high-speed switches:
   - nVidia/Mellanox (mlxsw):
     - add support for latency TLV (in FW control messages)
   - Microchip (sparx5):
     - separate explicit and implicit traffic forwarding rules, make
       the implicit rules always active
     - add support for egress DSCP rewrite
     - IS0 VCAP support (Ingress Classification)
     - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
     - ES2 VCAP support (Egress Access Control)
     - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
 - Ethernet embedded switches:
   - Marvell (mv88e6xxx):
     - add MAB (port auth) offload support
     - enable PTP receive for mv88e6390
   - NXP (ocelot):
     - support MAC Merge layer
     - support for the vsc7512 internal copper phys
   - Microchip:
     - lan9303: convert to PHYLINK
     - lan966x: support TC flower filter statistics
     - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
     - lan937x: support Credit Based Shaper configuration
     - ksz9477: support Energy Efficient Ethernet
   - other:
     - qca8k: convert to regmap read/write API, use bulk operations
     - rswitch: Improve TX timestamp accuracy
 - Intel WiFi (iwlwifi):
   - EHT (Wi-Fi 7) rate reporting
   - STEP equalizer support: transfer some STEP (connection to radio
     on platforms with integrated wifi) related parameters from the
     BIOS to the firmware.
 - Qualcomm 802.11ax WiFi (ath11k):
   - IPQ5018 support
   - Fine Timing Measurement (FTM) responder role support
   - channel 177 support
 - MediaTek WiFi (mt76):
   - per-PHY LED support
   - mt7996: EHT (Wi-Fi 7) support
   - Wireless Ethernet Dispatch (WED) reset support
   - switch to using page pool allocator
 - RealTek WiFi (rtw89):
   - support new version of Bluetooth co-existence
 - Mobile:
   - rmnet: support TX aggregation.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski.

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
commit 5b7c4cabbb
19
Documentation/ABI/testing/sysfs-class-net-peak_usb
Normal file
@@ -0,0 +1,19 @@
+
+What:		/sys/class/net/<iface>/peak_usb/can_channel_id
+Date:		November 2022
+KernelVersion:	6.2
+Contact:	Stephane Grosjean <s.grosjean@peak-system.com>
+Description:
+		PEAK PCAN-USB devices support user-configurable CAN channel
+		identifiers. Contrary to a USB serial number, these identifiers
+		are writable and can be set per CAN interface. This means that
+		if a USB device exports multiple CAN interfaces, each of them
+		can be assigned a unique channel ID.
+		This attribute provides read-only access to the currently
+		configured value of the channel identifier. Depending on the
+		device type, the identifier has a length of 8 or 32 bit. The
+		value read from this attribute is always an 8 digit 32 bit
+		hexadecimal value in big endian format. If the device only
+		supports an 8 bit identifier, the upper 24 bit of the value are
+		set to zero.
+
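As a rough illustration of the value format described in this ABI entry, a user-space reader could validate and convert the attribute text as sketched below. The helper name `parse_can_channel_id()` is hypothetical, and the sketch assumes the attribute is read as text holding exactly 8 hexadecimal digits, optionally followed by a newline.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel or any library): parse the
 * text read from can_channel_id.
 * Returns 0 and stores the value in *id on success, -1 on malformed input.
 */
static int parse_can_channel_id(const char *text, uint32_t *id)
{
	size_t len = strlen(text);
	size_t i;

	/* Tolerate one trailing newline (assumption about the read text). */
	if (len > 0 && text[len - 1] == '\n')
		len--;

	/* The description promises exactly 8 hex digits (32 bits). */
	if (len != 8)
		return -1;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)text[i]))
			return -1;

	/* Most significant digit comes first, so plain base-16 parsing works. */
	*id = (uint32_t)strtoul(text, NULL, 16);
	return 0;
}
```

For a device with an 8 bit identifier, the upper 24 bits of the parsed value should be zero according to the description above, so a caller could check `(id >> 8) == 0`.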
@@ -557,6 +557,7 @@
 			Format: <string>
 			nosocket -- Disable socket memory accounting.
 			nokmem -- Disable kernel memory accounting.
+			nobpf -- Disable BPF memory accounting.
 
 	checkreqprot=	[SELINUX] Set initial checkreqprot flag value.
 			Format: { "0" | "1" }
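To make the effect of the new `nobpf` token concrete, here is a small user-space model of how such an option string could be split into flags. It is a sketch only: the struct and function names are made up, and it assumes the options are passed as a comma-separated list (for example `cgroup.memory=nokmem,nobpf`). It is not the kernel's parser.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative flags; the names are invented for this sketch. */
struct cgroup_memory_opts {
	bool nosocket;
	bool nokmem;
	bool nobpf;
};

/*
 * Parse a comma-separated value such as "nokmem,nobpf".
 * Unknown or empty tokens are silently ignored.
 */
static struct cgroup_memory_opts parse_cgroup_memory(const char *value)
{
	struct cgroup_memory_opts opts = { false, false, false };

	while (*value) {
		/* Length of the current token, up to the next comma or the end. */
		size_t len = strcspn(value, ",");

		if (len == 8 && !strncmp(value, "nosocket", len))
			opts.nosocket = true;
		else if (len == 6 && !strncmp(value, "nokmem", len))
			opts.nokmem = true;
		else if (len == 5 && !strncmp(value, "nobpf", len))
			opts.nobpf = true;

		value += len;
		if (*value == ',')
			value++;
	}
	return opts;
}
```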
@@ -215,6 +215,12 @@ rmem_max
 
 The maximum receive socket buffer size in bytes.
 
+rps_default_mask
+----------------
+
+The default RPS CPU mask used on newly created network devices. An empty
+mask means RPS disabled by default.
+
 tstamp_allow_data
 -----------------
 Allow processes to receive tx timestamps looped together with the original
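The following sketch shows one way a tool might build the text for such a CPU mask from a list of CPU numbers. It assumes the sysctl accepts a hexadecimal bitmap in the same style as the per-queue `rps_cpus` files; `format_rps_mask()` is a hypothetical helper, and the sketch only handles CPUs 0 to 63.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a hex CPU mask for the CPUs listed in cpus[].
 * Only CPUs 0..63 are supported in this sketch.
 * Returns the number of characters written, or -1 on error.
 */
static int format_rps_mask(const unsigned int *cpus, size_t n,
			   char *buf, size_t buflen)
{
	uint64_t mask = 0;
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		if (cpus[i] >= 64)
			return -1;
		mask |= UINT64_C(1) << cpus[i];	/* bit index == CPU number */
	}

	ret = snprintf(buf, buflen, "%" PRIx64, mask);
	if (ret < 0 || (size_t)ret >= buflen)
		return -1;
	return ret;
}
```

With an empty CPU list the helper produces `0`, which corresponds to the "RPS disabled by default" case described above.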
@@ -208,6 +208,10 @@ data structures and compile with kernel internal headers. Both of these
 kernel internals are subject to change and can break with newer kernels
 such that the program needs to be adapted accordingly.
 
+New BPF functionality is generally added through the use of kfuncs instead of
+new helpers. Kfuncs are not considered part of the stable API, and have their own
+lifecycle expectations as described in :ref:`BPF_kfunc_lifecycle_expectations`.
+
 Q: Are tracepoints part of the stable ABI?
 ------------------------------------------
 A: NO. Tracepoints are tied to internal implementation details hence they are
@@ -236,8 +240,8 @@ A: NO. Classic BPF programs are converted into extend BPF instructions.
 
 Q: Can BPF call arbitrary kernel functions?
 -------------------------------------------
-A: NO. BPF programs can only call a set of helper functions which
-is defined for every program type.
+A: NO. BPF programs can only call specific functions exposed as BPF helpers or
+kfuncs. The set of available functions is defined for every program type.
 
 Q: Can BPF overwrite arbitrary kernel memory?
 ---------------------------------------------
@@ -263,7 +267,12 @@ Q: New functionality via kernel modules?
 Q: Can BPF functionality such as new program or map types, new
 helpers, etc be added out of kernel module code?
 
-A: NO.
+A: Yes, through kfuncs and kptrs
 
+The core BPF functionality such as program types, maps and helpers cannot be
+added to by modules. However, modules can expose functionality to BPF programs
+by exporting kfuncs (which may return pointers to module-internal data
+structures as kptrs).
+
 Q: Directly calling kernel function is an ABI?
 ----------------------------------------------
@@ -278,7 +287,8 @@ kernel functions have already been used by other kernel tcp
 cc (congestion-control) implementations. If any of these kernel
 functions has changed, both the in-tree and out-of-tree kernel tcp cc
 implementations have to be changed. The same goes for the bpf
-programs and they have to be adjusted accordingly.
+programs and they have to be adjusted accordingly. See
+:ref:`BPF_kfunc_lifecycle_expectations` for details.
 
 Q: Attaching to arbitrary kernel functions is an ABI?
 -----------------------------------------------------
@@ -340,6 +350,7 @@ compatibility for these features?
 
 A: NO.
 
-Unlike map value types, there are no stability guarantees for this case. The
-whole API to work with allocated objects and any support for special fields
-inside them is unstable (since it is exposed through kfuncs).
+Unlike map value types, the API to work with allocated objects and any support
+for special fields inside them is exposed through kfuncs, and thus has the same
+lifecycle expectations as the kfuncs themselves. See
+:ref:`BPF_kfunc_lifecycle_expectations` for details.
393
Documentation/bpf/cpumasks.rst
Normal file
@@ -0,0 +1,393 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _cpumasks-header-label:
+
+==================
+BPF cpumask kfuncs
+==================
+
+1. Introduction
+===============
+
+``struct cpumask`` is a bitmap data structure in the kernel whose indices
+reflect the CPUs on the system. Commonly, cpumasks are used to track which CPUs
+a task is affinitized to, but they can also be used to e.g. track which cores
+are associated with a scheduling domain, which cores on a machine are idle,
+etc.
+
+BPF provides programs with a set of :ref:`kfuncs-header-label` that can be
+used to allocate, mutate, query, and free cpumasks.
+
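The "bitmap whose indices reflect CPUs" idea from the introduction can be pictured with a plain user-space model. The sketch below is not the kernel's `struct cpumask`; every `model_` name and the mask size of 128 CPUs are assumptions chosen for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NR_CPUS	128	/* arbitrary size chosen for this sketch */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS	((MODEL_NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per CPU: bit N is set when CPU N is a member of the mask. */
struct model_cpumask {
	unsigned long bits[MASK_WORDS];
};

static void model_cpumask_clear_all(struct model_cpumask *m)
{
	memset(m->bits, 0, sizeof(m->bits));
}

static void model_cpumask_set_cpu(unsigned int cpu, struct model_cpumask *m)
{
	m->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static bool model_cpumask_test_cpu(unsigned int cpu,
				   const struct model_cpumask *m)
{
	return (m->bits[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL;
}

/* A mask is "full" when every CPU in the modelled range is set. */
static bool model_cpumask_full(const struct model_cpumask *m)
{
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (!model_cpumask_test_cpu(cpu, m))
			return false;
	return true;
}
```

In this model, a task restricted to CPUs 0 and 1 would have only those two bits set, and a "full" check failing is what signals that some affinity restriction is in place.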
+2. BPF cpumask objects
+======================
+
+There are two different types of cpumasks that can be used by BPF programs.
+
+2.1 ``struct bpf_cpumask *``
+----------------------------
+
+``struct bpf_cpumask *`` is a cpumask that is allocated by BPF, on behalf of a
+BPF program, and whose lifecycle is entirely controlled by BPF. These cpumasks
+are RCU-protected, can be mutated, can be used as kptrs, and can be safely cast
+to a ``struct cpumask *``.
+
+2.1.1 ``struct bpf_cpumask *`` lifecycle
+----------------------------------------
+
+A ``struct bpf_cpumask *`` is allocated, acquired, and released, using the
+following functions:
+
+.. kernel-doc:: kernel/bpf/cpumask.c
+  :identifiers: bpf_cpumask_create
+
+.. kernel-doc:: kernel/bpf/cpumask.c
+  :identifiers: bpf_cpumask_acquire
+
+.. kernel-doc:: kernel/bpf/cpumask.c
+  :identifiers: bpf_cpumask_release
+
+For example:
+
+.. code-block:: c
+
+	struct cpumask_map_value {
+		struct bpf_cpumask __kptr_ref * cpumask;
+	};
+
+	struct array_map {
+		__uint(type, BPF_MAP_TYPE_ARRAY);
+		__type(key, int);
+		__type(value, struct cpumask_map_value);
+		__uint(max_entries, 65536);
+	} cpumask_map SEC(".maps");
+
+	static int cpumask_map_insert(struct bpf_cpumask *mask, u32 pid)
+	{
+		struct cpumask_map_value local, *v;
+		long status;
+		struct bpf_cpumask *old;
+		u32 key = pid;
+
+		local.cpumask = NULL;
+		status = bpf_map_update_elem(&cpumask_map, &key, &local, 0);
+		if (status) {
+			bpf_cpumask_release(mask);
+			return status;
+		}
+
+		v = bpf_map_lookup_elem(&cpumask_map, &key);
+		if (!v) {
+			bpf_cpumask_release(mask);
+			return -ENOENT;
+		}
+
+		old = bpf_kptr_xchg(&v->cpumask, mask);
+		if (old)
+			bpf_cpumask_release(old);
+
+		return 0;
+	}
+
+	/**
+	 * A sample tracepoint showing how a task's cpumask can be queried and
+	 * recorded as a kptr.
+	 */
+	SEC("tp_btf/task_newtask")
+	int BPF_PROG(record_task_cpumask, struct task_struct *task, u64 clone_flags)
+	{
+		struct bpf_cpumask *cpumask;
+		int ret;
+
+		cpumask = bpf_cpumask_create();
+		if (!cpumask)
+			return -ENOMEM;
+
+		if (!bpf_cpumask_full(task->cpus_ptr))
+			bpf_printk("task %s has CPU affinity", task->comm);
+
+		bpf_cpumask_copy(cpumask, task->cpus_ptr);
+		return cpumask_map_insert(cpumask, task->pid);
+	}
+
+----
+
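The create/acquire/release lifecycle used in the example above follows the usual reference-counting pattern. The user-space model below sketches that pattern only; it is deliberately simplified (a plain `int` counter, no atomics, no RCU), and all `model_` names are hypothetical rather than kernel API.

```c
#include <stdlib.h>

/* Simplified stand-in for a reference-counted mask object. */
struct model_mask {
	unsigned long bits;	/* placeholder for the real bitmap */
	int usage;		/* number of outstanding references */
};

/* Counts objects that have been created but not yet freed. */
static int model_live_objects;

/* Allocate a new object; the caller owns the single initial reference. */
static struct model_mask *model_create(void)
{
	struct model_mask *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->usage = 1;
	model_live_objects++;
	return m;
}

/* Take an additional reference on an object the caller already holds. */
static struct model_mask *model_acquire(struct model_mask *m)
{
	m->usage++;
	return m;
}

/* Drop one reference; the object is freed when the last one goes away. */
static void model_release(struct model_mask *m)
{
	if (--m->usage == 0) {
		model_live_objects--;
		free(m);
	}
}
```

This is why every error path in `cpumask_map_insert()` above releases `mask`: the function was handed a reference, and a reference that is neither stored nor released would keep the object alive indefinitely.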
+2.1.1 ``struct bpf_cpumask *`` as kptrs
+---------------------------------------
+
+As mentioned and illustrated above, these ``struct bpf_cpumask *`` objects can
+also be stored in a map and used as kptrs. If a ``struct bpf_cpumask *`` is in
+a map, the reference can be removed from the map with bpf_kptr_xchg(), or
+opportunistically acquired with bpf_cpumask_kptr_get():
+
+.. kernel-doc:: kernel/bpf/cpumask.c
+  :identifiers: bpf_cpumask_kptr_get
+
+Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
+
+.. code-block:: c
+
+	/* struct containing the struct bpf_cpumask kptr which is stored in the map. */
+	struct cpumasks_kfunc_map_value {
+		struct bpf_cpumask __kptr_ref * bpf_cpumask;
+	};
+
+	/* The map containing struct cpumasks_kfunc_map_value entries. */
+	struct {
+		__uint(type, BPF_MAP_TYPE_ARRAY);
+		__type(key, int);
+		__type(value, struct cpumasks_kfunc_map_value);
+		__uint(max_entries, 1);
+	} cpumasks_kfunc_map SEC(".maps");
+
+	/* ... */
+
+	/**
+	 * A simple example tracepoint program showing how a
+	 * struct bpf_cpumask * kptr that is stored in a map can
+	 * be acquired using the bpf_cpumask_kptr_get() kfunc.
+	 */
+	SEC("tp_btf/cgroup_mkdir")
+	int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
+	{
+		struct bpf_cpumask *kptr;
+		struct cpumasks_kfunc_map_value *v;
+		u32 key = 0;
+
+		/* Assume a bpf_cpumask * kptr was previously stored in the map. */
+		v = bpf_map_lookup_elem(&cpumasks_kfunc_map, &key);
+		if (!v)
+			return -ENOENT;
+
+		/* Acquire a reference to the bpf_cpumask * kptr that's already stored in the map. */
+		kptr = bpf_cpumask_kptr_get(&v->bpf_cpumask);
+		if (!kptr)
+			/* If no bpf_cpumask was present in the map, it's because
+			 * we're racing with another CPU that removed it with
+			 * bpf_kptr_xchg() between the bpf_map_lookup_elem()
+			 * above, and our call to bpf_cpumask_kptr_get().
+			 * bpf_cpumask_kptr_get() internally safely handles this
+			 * race, and will return NULL if the cpumask is no longer
+			 * present in the map by the time we invoke the kfunc.
+			 */
+			return -EBUSY;
+
+		/* Free the reference we just took above. Note that the
+		 * original struct bpf_cpumask * kptr is still in the map. It will
+		 * be freed either at a later time if another context deletes
+		 * it from the map, or automatically by the BPF subsystem if
+		 * it's still present when the map is destroyed.
+		 */
+		bpf_cpumask_release(kptr);
+
+		return 0;
+	}
+
+----
+
|
2.2 ``struct cpumask``
|
||||||
|
----------------------
|
||||||
|
|
||||||
|
``struct cpumask`` is the object that actually contains the cpumask bitmap
|
||||||
|
being queried, mutated, etc. A ``struct bpf_cpumask`` wraps a ``struct
|
||||||
|
cpumask``, which is why it's safe to cast it as such (note however that it is
|
||||||
|
**not** safe to cast a ``struct cpumask *`` to a ``struct bpf_cpumask *``, and
|
||||||
|
the verifier will reject any program that tries to do so).
|
||||||
|
|
||||||
|
As we'll see below, any kfunc that mutates its cpumask argument will take a
|
||||||
|
``struct bpf_cpumask *`` as that argument. Any kfunc that simply queries the
|
||||||
|
cpumask will instead take a ``struct cpumask *``.
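The split between mutating and querying arguments can be sketched in plain userspace C. This is only a model under stated assumptions: ``model_cpumask``, ``model_bpf_cpumask``, and the ``model_*`` functions are hypothetical names, the bitmap is fixed at 64 CPUs, and none of this is the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: a 64-CPU bitmap. */
struct model_cpumask {
    uint64_t bits;
};

/* Hypothetical stand-in for struct bpf_cpumask: the bitmap plus extra
 * bookkeeping that a plain struct model_cpumask does not carry. */
struct model_bpf_cpumask {
    struct model_cpumask cpumask;
    int usage;
};

/* Mutating operation: only accepts the wrapper type. */
static void model_set_cpu(unsigned int cpu, struct model_bpf_cpumask *mask)
{
    assert(cpu < 64);
    mask->cpumask.bits |= UINT64_C(1) << cpu;
}

/* Read-only operation: accepts any const bitmap. */
static bool model_test_cpu(unsigned int cpu, const struct model_cpumask *mask)
{
    assert(cpu < 64);
    return (mask->bits >> cpu) & 1;
}

/* Wrapper -> bitmap is always valid because the wrapper contains a bitmap.
 * The reverse is not: an arbitrary bitmap has no 'usage' field behind it. */
static const struct model_cpumask *
model_cast(const struct model_bpf_cpumask *mask)
{
    return &mask->cpumask;
}
```

In this model a caller mutates through ``struct model_bpf_cpumask *`` and queries through ``model_cast()``, mirroring the one-way cast described above.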
|
||||||
|
|
||||||
|
3. cpumask kfuncs
|
||||||
|
=================
|
||||||
|
|
||||||
|
Above, we described the kfuncs that can be used to allocate, acquire, release,
|
||||||
|
etc. a ``struct bpf_cpumask *``. This section of the document will describe the
|
||||||
|
kfuncs for mutating and querying cpumasks.
|
||||||
|
|
||||||
|
3.1 Mutating cpumasks
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
Some cpumask kfuncs are "read-only" in that they don't mutate any of their
|
||||||
|
arguments, whereas others mutate at least one argument (which means that the
|
||||||
|
argument must be a ``struct bpf_cpumask *``, as described above).
|
||||||
|
|
||||||
|
This section will describe all of the cpumask kfuncs which mutate at least one
|
||||||
|
argument. :ref:`cpumasks-querying-label` below describes the read-only kfuncs.
|
||||||
|
|
||||||
|
3.1.1 Setting and clearing CPUs
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
bpf_cpumask_set_cpu() and bpf_cpumask_clear_cpu() can be used to set and clear
|
||||||
|
a CPU in a ``struct bpf_cpumask`` respectively:
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/cpumask.c
|
||||||
|
:identifiers: bpf_cpumask_set_cpu bpf_cpumask_clear_cpu
|
||||||
|
|
||||||
|
These kfuncs are pretty straightforward, and can be used, for example, as
|
||||||
|
follows:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A sample tracepoint showing how CPUs in a cpumask can be set, cleared, and queried.
|
||||||
|
*/
|
||||||
|
SEC("tp_btf/task_newtask")
|
||||||
|
int BPF_PROG(test_set_clear_cpu, struct task_struct *task, u64 clone_flags)
|
||||||
|
{
|
||||||
|
struct bpf_cpumask *cpumask;
|
||||||
|
|
||||||
|
cpumask = bpf_cpumask_create();
|
||||||
|
if (!cpumask)
|
||||||
|
return -ENOMEM;
|
||||||
|
|
||||||
|
bpf_cpumask_set_cpu(0, cpumask);
|
||||||
|
if (!bpf_cpumask_test_cpu(0, (const struct cpumask *)cpumask))
|
||||||
|
/* Should never happen. */
|
||||||
|
goto release_exit;
|
||||||
|
|
||||||
|
bpf_cpumask_clear_cpu(0, cpumask);
|
||||||
|
if (bpf_cpumask_test_cpu(0, (const struct cpumask *)cpumask))
|
||||||
|
/* Should never happen. */
|
||||||
|
goto release_exit;
|
||||||
|
|
||||||
|
/* struct cpumask * pointers such as task->cpus_ptr can also be queried. */
|
||||||
|
if (bpf_cpumask_test_cpu(0, task->cpus_ptr))
|
||||||
|
bpf_printk("task %s can use CPU %d", task->comm, 0);
|
||||||
|
|
||||||
|
release_exit:
|
||||||
|
bpf_cpumask_release(cpumask);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
bpf_cpumask_test_and_set_cpu() and bpf_cpumask_test_and_clear_cpu() are
|
||||||
|
complementary kfuncs that allow callers to atomically test and set (or clear)
|
||||||
|
CPUs:
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/cpumask.c
|
||||||
|
:identifiers: bpf_cpumask_test_and_set_cpu bpf_cpumask_test_and_clear_cpu
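The "test and set" pattern can be illustrated in userspace with C11 atomics. This is a sketch of the general technique, not the kfunc implementation: the ``model_*`` names are hypothetical, the mask is a single 64-bit word, and the return value is assumed to report whether the CPU was set *before* the operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically set 'cpu' and report whether it was already set. */
static bool model_test_and_set_cpu(unsigned int cpu, _Atomic uint64_t *bits)
{
    uint64_t bit;

    assert(cpu < 64);
    bit = UINT64_C(1) << cpu;
    /* atomic_fetch_or() returns the value held before the OR. */
    return (atomic_fetch_or(bits, bit) & bit) != 0;
}

/* Atomically clear 'cpu' and report whether it was previously set. */
static bool model_test_and_clear_cpu(unsigned int cpu, _Atomic uint64_t *bits)
{
    uint64_t bit;

    assert(cpu < 64);
    bit = UINT64_C(1) << cpu;
    /* atomic_fetch_and() returns the value held before the AND. */
    return (atomic_fetch_and(bits, ~bit) & bit) != 0;
}
```

Because the test and the update happen in one atomic read-modify-write, two concurrent callers cannot both observe "was clear" for the same CPU.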
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
We can also set and clear entire ``struct bpf_cpumask *`` objects in one
|
||||||
|
operation using bpf_cpumask_setall() and bpf_cpumask_clear():
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/cpumask.c
|
||||||
|
:identifiers: bpf_cpumask_setall bpf_cpumask_clear
|
||||||
|
|
||||||
|
3.1.2 Operations between cpumasks
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
In addition to setting and clearing individual CPUs in a single cpumask,
|
||||||
|
callers can also perform bitwise operations between multiple cpumasks using
|
||||||
|
bpf_cpumask_and(), bpf_cpumask_or(), and bpf_cpumask_xor():
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/cpumask.c
|
||||||
|
:identifiers: bpf_cpumask_and bpf_cpumask_or bpf_cpumask_xor
|
||||||
|
|
||||||
|
The following is an example of how they may be used. Note that some of the
|
||||||
|
kfuncs shown in this example will be covered in more detail below.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A sample tracepoint showing how a cpumask can be mutated using
|
||||||
|
* bitwise operators (and queried).
|
||||||
|
*/
|
||||||
|
SEC("tp_btf/task_newtask")
|
||||||
|
int BPF_PROG(test_and_or_xor, struct task_struct *task, u64 clone_flags)
|
||||||
|
{
|
||||||
|
struct bpf_cpumask *mask1, *mask2, *dst1, *dst2;
|
||||||
|
|
||||||
|
mask1 = bpf_cpumask_create();
|
||||||
|
if (!mask1)
|
||||||
|
return -ENOMEM;
|
||||||
|
|
||||||
|
mask2 = bpf_cpumask_create();
|
||||||
|
if (!mask2) {
|
||||||
|
bpf_cpumask_release(mask1);
|
||||||
|
return -ENOMEM;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ...Safely create the other two masks... */
|
||||||
|
|
||||||
|
bpf_cpumask_set_cpu(0, mask1);
|
||||||
|
bpf_cpumask_set_cpu(1, mask2);
|
||||||
|
bpf_cpumask_and(dst1, (const struct cpumask *)mask1, (const struct cpumask *)mask2);
|
||||||
|
if (!bpf_cpumask_empty((const struct cpumask *)dst1))
|
||||||
|
/* Should never happen. */
|
||||||
|
goto release_exit;
|
||||||
|
|
||||||
|
bpf_cpumask_or(dst1, (const struct cpumask *)mask1, (const struct cpumask *)mask2);
|
||||||
|
if (!bpf_cpumask_test_cpu(0, (const struct cpumask *)dst1))
|
||||||
|
/* Should never happen. */
|
||||||
|
goto release_exit;
|
||||||
|
|
||||||
|
if (!bpf_cpumask_test_cpu(1, (const struct cpumask *)dst1))
|
||||||
|
/* Should never happen. */
|
||||||
|
goto release_exit;
|
||||||
|
|
||||||
|
bpf_cpumask_xor(dst2, (const struct cpumask *)mask1, (const struct cpumask *)mask2);
|
||||||
|
if (!bpf_cpumask_equal((const struct cpumask *)dst1,
|
||||||
|
(const struct cpumask *)dst2))
|
||||||
|
/* Should never happen. */
|
||||||
|
goto release_exit;
|
||||||
|
|
||||||
|
release_exit:
|
||||||
|
bpf_cpumask_release(mask1);
|
||||||
|
bpf_cpumask_release(mask2);
|
||||||
|
bpf_cpumask_release(dst1);
|
||||||
|
bpf_cpumask_release(dst2);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
The contents of an entire cpumask may be copied to another using
|
||||||
|
bpf_cpumask_copy():
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/cpumask.c
|
||||||
|
:identifiers: bpf_cpumask_copy
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
.. _cpumasks-querying-label:
|
||||||
|
|
||||||
|
3.2 Querying cpumasks
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
In addition to the above kfuncs, there is also a set of read-only kfuncs that
|
||||||
|
can be used to query the contents of cpumasks.
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/cpumask.c
|
||||||
|
:identifiers: bpf_cpumask_first bpf_cpumask_first_zero bpf_cpumask_test_cpu
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/cpumask.c
|
||||||
|
:identifiers: bpf_cpumask_equal bpf_cpumask_intersects bpf_cpumask_subset
|
||||||
|
bpf_cpumask_empty bpf_cpumask_full
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/cpumask.c
|
||||||
|
:identifiers: bpf_cpumask_any bpf_cpumask_any_and
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
Some example usages of these querying kfuncs were shown above. We will not
|
||||||
|
replicate those examples here. Note, however, that all of the aforementioned
|
||||||
|
kfuncs are tested in `tools/testing/selftests/bpf/progs/cpumask_success.c`_, so
|
||||||
|
please take a look there if you're looking for more examples of how they can be
|
||||||
|
used.
|
||||||
|
|
||||||
|
.. _tools/testing/selftests/bpf/progs/cpumask_success.c:
|
||||||
|
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/progs/cpumask_success.c
|
||||||
|
|
||||||
|
|
||||||
|
4. Adding BPF cpumask kfuncs
|
||||||
|
============================
|
||||||
|
|
||||||
|
The set of supported BPF cpumask kfuncs is not (yet) a 1-1 match with the
|
||||||
|
cpumask operations in include/linux/cpumask.h. Any of those cpumask operations
|
||||||
|
could easily be encapsulated in a new kfunc if and when required. If you'd like
|
||||||
|
to support a new cpumask operation, please feel free to submit a patch. If you
|
||||||
|
do add a new cpumask kfunc, please document it here, and add any relevant
|
||||||
|
selftest testcases to the cpumask selftest suite.
|
267
Documentation/bpf/graph_ds_impl.rst
Normal file
@ -0,0 +1,267 @@
|
|||||||
|
=========================
|
||||||
|
BPF Graph Data Structures
|
||||||
|
=========================
|
||||||
|
|
||||||
|
This document describes implementation details of new-style "graph" data
|
||||||
|
structures (linked_list, rbtree), with particular focus on the verifier's
|
||||||
|
implementation of semantics specific to those data structures.
|
||||||
|
|
||||||
|
Although no specific verifier code is referred to in this document, the document
|
||||||
|
assumes that the reader has general knowledge of BPF verifier internals, BPF
|
||||||
|
maps, and BPF program writing.
|
||||||
|
|
||||||
|
Note that the intent of this document is to describe the current state of
|
||||||
|
these graph data structures. **No guarantees** of stability for either
|
||||||
|
semantics or APIs are made or implied here.
|
||||||
|
|
||||||
|
.. contents::
|
||||||
|
:local:
|
||||||
|
:depth: 2
|
||||||
|
|
||||||
|
Introduction
|
||||||
|
------------
|
||||||
|
|
||||||
|
The BPF map API has historically been the main way to expose data structures
|
||||||
|
of various types for use within BPF programs. Some data structures fit naturally
|
||||||
|
with the map API (HASH, ARRAY), others less so. Consequently, programs
|
||||||
|
interacting with the latter group of data structures can be hard to parse
|
||||||
|
for kernel programmers without previous BPF experience.
|
||||||
|
|
||||||
|
Luckily, some restrictions which necessitated the use of BPF map semantics are
|
||||||
|
no longer relevant. With the introduction of kfuncs, kptrs, and the any-context
|
||||||
|
BPF allocator, it is now possible to implement BPF data structures whose API
|
||||||
|
and semantics more closely match those exposed to the rest of the kernel.
|
||||||
|
|
||||||
|
Two such data structures - linked_list and rbtree - have many verification
|
||||||
|
details in common. Because both have "root"s ("head" for linked_list) and
|
||||||
|
"node"s, the verifier code and this document refer to common functionality
|
||||||
|
as "graph_api", "graph_root", "graph_node", etc.
|
||||||
|
|
||||||
|
Unless otherwise stated, examples and semantics below apply to both graph data
|
||||||
|
structures.
|
||||||
|
|
||||||
|
Unstable API
|
||||||
|
------------
|
||||||
|
|
||||||
|
Data structures implemented using the BPF map API have historically used BPF
|
||||||
|
helper functions - either standard map API helpers like ``bpf_map_update_elem``
|
||||||
|
or map-specific helpers. The new-style graph data structures instead use kfuncs
|
||||||
|
to define their manipulation helpers. Because there are no stability guarantees
|
||||||
|
for kfuncs, the API and semantics for these data structures can be evolved in
|
||||||
|
a way that breaks backwards compatibility if necessary.
|
||||||
|
|
||||||
|
Root and node types for the new data structures are opaquely defined in the
|
||||||
|
``uapi/linux/bpf.h`` header.
|
||||||
|
|
||||||
|
Locking
|
||||||
|
-------
|
||||||
|
|
||||||
|
The new-style data structures are intrusive and are defined similarly to their
|
||||||
|
vanilla kernel counterparts:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct node_data {
|
||||||
|
long key;
|
||||||
|
long data;
|
||||||
|
struct bpf_rb_node node;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct bpf_spin_lock glock;
|
||||||
|
struct bpf_rb_root groot __contains(node_data, node);
|
||||||
|
|
||||||
|
The "root" type for both linked_list and rbtree expects to be in a map_value
|
||||||
|
which also contains a ``bpf_spin_lock`` - in the above example both global
|
||||||
|
variables are placed in a single-value arraymap. The verifier considers this
|
||||||
|
spin_lock to be associated with the ``bpf_rb_root`` by virtue of both being in
|
||||||
|
the same map_value and will enforce that the correct lock is held when
|
||||||
|
verifying BPF programs that manipulate the tree. Since this lock checking
|
||||||
|
happens at verification time, there is no runtime penalty.
|
||||||
|
|
||||||
|
Non-owning references
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
**Motivation**
|
||||||
|
|
||||||
|
Consider the following BPF code:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct node_data *n = bpf_obj_new(typeof(*n)); /* ACQUIRED */
|
||||||
|
|
||||||
|
bpf_spin_lock(&lock);
|
||||||
|
|
||||||
|
bpf_rbtree_add(&tree, n); /* PASSED */
|
||||||
|
|
||||||
|
bpf_spin_unlock(&lock);
|
||||||
|
|
||||||
|
From the verifier's perspective, the pointer ``n`` returned from ``bpf_obj_new``
|
||||||
|
has type ``PTR_TO_BTF_ID | MEM_ALLOC``, with a ``btf_id`` of
|
||||||
|
``struct node_data`` and a nonzero ``ref_obj_id``. Because it holds ``n``, the
|
||||||
|
program has ownership of the pointee's (object pointed to by ``n``) lifetime.
|
||||||
|
The BPF program must pass off ownership before exiting - either via
|
||||||
|
``bpf_obj_drop``, which ``free``'s the object, or by adding it to ``tree`` with
|
||||||
|
``bpf_rbtree_add``.
|
||||||
|
|
||||||
|
(``ACQUIRED`` and ``PASSED`` comments in the example denote statements where
|
||||||
|
"ownership is acquired" and "ownership is passed", respectively)
|
||||||
|
|
||||||
|
What should the verifier do with ``n`` after ownership is passed off? If the
|
||||||
|
object was ``free``'d with ``bpf_obj_drop`` the answer is obvious: the verifier
|
||||||
|
should reject programs which attempt to access ``n`` after ``bpf_obj_drop`` as
|
||||||
|
the object is no longer valid. The underlying memory may have been reused for
|
||||||
|
some other allocation, unmapped, etc.
|
||||||
|
|
||||||
|
When ownership is passed to ``tree`` via ``bpf_rbtree_add`` the answer is less
|
||||||
|
obvious. The verifier could enforce the same semantics as for ``bpf_obj_drop``,
|
||||||
|
but that would result in programs with useful, common coding patterns being
|
||||||
|
rejected, e.g.:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int x;
|
||||||
|
struct node_data *n = bpf_obj_new(typeof(*n)); /* ACQUIRED */
|
||||||
|
|
||||||
|
bpf_spin_lock(&lock);
|
||||||
|
|
||||||
|
bpf_rbtree_add(&tree, n); /* PASSED */
|
||||||
|
x = n->data;
|
||||||
|
n->data = 42;
|
||||||
|
|
||||||
|
bpf_spin_unlock(&lock);
|
||||||
|
|
||||||
|
Both the read from and write to ``n->data`` would be rejected. The verifier
|
||||||
|
can do better, though, by taking advantage of two details:
|
||||||
|
|
||||||
|
* Graph data structure APIs can only be used when the ``bpf_spin_lock``
|
||||||
|
associated with the graph root is held
|
||||||
|
|
||||||
|
* Both graph data structures have pointer stability
|
||||||
|
|
||||||
|
* Because graph nodes are allocated with ``bpf_obj_new`` and
|
||||||
|
adding / removing from the root involves fiddling with the
|
||||||
|
``bpf_{list,rb}_node`` field of the node struct, a graph node will
|
||||||
|
remain at the same address after either operation.
|
||||||
|
|
||||||
|
Because the associated ``bpf_spin_lock`` must be held by any program adding
|
||||||
|
or removing, if we're in the critical section bounded by that lock, we know
|
||||||
|
that no other program can add or remove until the end of the critical section.
|
||||||
|
This combined with pointer stability means that, until the critical section
|
||||||
|
ends, we can safely access the graph node through ``n`` even after it was used
|
||||||
|
to pass ownership.
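Pointer stability is a general property of intrusive containers and can be demonstrated outside of BPF. The sketch below is a plain C doubly linked list, not the BPF graph API: ``model_list_node``, ``model_node_data``, and the ``model_*`` helpers are hypothetical names chosen for illustration, and no locking is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Link field embedded in the payload, in the spirit of bpf_list_node. */
struct model_list_node {
    struct model_list_node *prev, *next;
};

struct model_node_data {
    long key;
    long data;
    struct model_list_node node;
};

static void model_list_init(struct model_list_node *head)
{
    head->prev = head->next = head;
}

/* Adding only rewrites link pointers; the payload is never copied or moved. */
static void model_list_add(struct model_list_node *head,
                           struct model_list_node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Removing likewise only rewrites link pointers. */
static void model_list_del(struct model_list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* Recover the containing struct from its embedded link field. */
static struct model_node_data *model_entry(struct model_list_node *n)
{
    return (struct model_node_data *)
        ((char *)n - offsetof(struct model_node_data, node));
}
```

A pointer taken to a ``struct model_node_data`` before ``model_list_add()`` still refers to the same object afterwards, which is the property that makes non-owning references usable inside the critical section.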
|
||||||
|
|
||||||
|
The verifier considers such a reference a *non-owning reference*. The ref
|
||||||
|
returned by ``bpf_obj_new`` is accordingly considered an *owning reference*.
|
||||||
|
Both terms currently only have meaning in the context of graph nodes and API.
|
||||||
|
|
||||||
|
**Details**
|
||||||
|
|
||||||
|
Let's enumerate the properties of both types of references.
|
||||||
|
|
||||||
|
*owning reference*
|
||||||
|
|
||||||
|
* This reference controls the lifetime of the pointee
|
||||||
|
|
||||||
|
* Ownership of pointee must be 'released' by passing it to some graph API
|
||||||
|
kfunc, or via ``bpf_obj_drop``, which ``free``'s the pointee
|
||||||
|
|
||||||
|
* If not released before program ends, verifier considers program invalid
|
||||||
|
|
||||||
|
* Access to the pointee's memory will not page fault
|
||||||
|
|
||||||
|
*non-owning reference*
|
||||||
|
|
||||||
|
* This reference does not own the pointee
|
||||||
|
|
||||||
|
* It cannot be used to add the graph node to a graph root, nor ``free``'d via
|
||||||
|
``bpf_obj_drop``
|
||||||
|
|
||||||
|
* No explicit control of lifetime, but can infer valid lifetime based on
|
||||||
|
non-owning ref existence (see explanation below)
|
||||||
|
|
||||||
|
* Access to the pointee's memory will not page fault
|
||||||
|
|
||||||
|
From the verifier's perspective, non-owning references can only exist
|
||||||
|
between spin_lock and spin_unlock. Why? After spin_unlock another program
|
||||||
|
can do arbitrary operations on the data structure like removing and ``free``-ing
|
||||||
|
via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd,
|
||||||
|
``free``'d, and reused via bpf_obj_new would point to an entirely different thing.
|
||||||
|
Or the memory could go away.
|
||||||
|
|
||||||
|
To prevent this logic violation all non-owning references are invalidated by the
|
||||||
|
verifier after a critical section ends. This is necessary to ensure the "will
|
||||||
|
not page fault" property of non-owning references. So if the verifier hasn't
|
||||||
|
invalidated a non-owning ref, accessing it will not page fault.
|
||||||
|
|
||||||
|
Currently ``bpf_obj_drop`` is not allowed in the critical section, so
|
||||||
|
if there's a valid non-owning ref, we must be in a critical section, and can
|
||||||
|
conclude that the ref's memory hasn't been dropped-and- ``free``'d or
|
||||||
|
dropped-and-reused.
|
||||||
|
|
||||||
|
Any reference to a node that is in an rbtree *must* be non-owning, since
|
||||||
|
the tree has control of the pointee's lifetime. Similarly, any ref to a node
|
||||||
|
that isn't in an rbtree *must* be owning. This results in a nice property:
|
||||||
|
graph API add / remove implementations don't need to check if a node
|
||||||
|
has already been added (or already removed), as the ownership model
|
||||||
|
allows the verifier to prevent such a state from being valid by simply checking
|
||||||
|
types.
|
||||||
|
|
||||||
|
However, pointer aliasing poses an issue for the above "nice property".
|
||||||
|
Consider the following example:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct node_data *n, *m, *o, *p;
|
||||||
|
n = bpf_obj_new(typeof(*n)); /* 1 */
|
||||||
|
|
||||||
|
bpf_spin_lock(&lock);
|
||||||
|
|
||||||
|
bpf_rbtree_add(&tree, n); /* 2 */
|
||||||
|
m = bpf_rbtree_first(&tree); /* 3 */
|
||||||
|
|
||||||
|
o = bpf_rbtree_remove(&tree, n); /* 4 */
|
||||||
|
p = bpf_rbtree_remove(&tree, m); /* 5 */
|
||||||
|
|
||||||
|
bpf_spin_unlock(&lock);
|
||||||
|
|
||||||
|
bpf_obj_drop(o);
|
||||||
|
bpf_obj_drop(p); /* 6 */
|
||||||
|
|
||||||
|
Assume the tree is empty before this program runs. If we track verifier state
|
||||||
|
changes here using numbers in above comments:
|
||||||
|
|
||||||
|
1) n is an owning reference
|
||||||
|
|
||||||
|
2) n is a non-owning reference, it's been added to the tree
|
||||||
|
|
||||||
|
3) n and m are non-owning references, they both point to the same node
|
||||||
|
|
||||||
|
4) o is an owning reference, n and m non-owning, all point to same node
|
||||||
|
|
||||||
|
5) o and p are owning, n and m non-owning, all point to the same node
|
||||||
|
|
||||||
|
6) a double-free has occurred, since o and p point to same node and o was
|
||||||
|
``free``'d in previous statement
|
||||||
|
|
||||||
|
States 4 and 5 violate our "nice property", as there are non-owning refs to
|
||||||
|
a node which is not in an rbtree. Statement 5 will try to remove a node which
|
||||||
|
has already been removed as a result of this violation. State 6 is a dangerous
|
||||||
|
double-free.
|
||||||
|
|
||||||
|
At a minimum we should prevent state 6 from being possible. If we can't also
|
||||||
|
prevent state 5 then we must abandon our "nice property" and check whether a
|
||||||
|
node has already been removed at runtime.
|
||||||
|
|
||||||
|
We prevent both by generalizing the "invalidate non-owning references" behavior
|
||||||
|
of ``bpf_spin_unlock`` and doing similar invalidation after
|
||||||
|
``bpf_rbtree_remove``. The logic here being that any graph API kfunc which:
|
||||||
|
|
||||||
|
* takes an arbitrary node argument
|
||||||
|
|
||||||
|
* removes it from the data structure
|
||||||
|
|
||||||
|
* returns an owning reference to the removed node
|
||||||
|
|
||||||
|
may result in a state where some other non-owning reference points to the same
|
||||||
|
node. So ``remove``-type kfuncs must be considered a non-owning reference
|
||||||
|
invalidation point as well.
|
@ -20,6 +20,7 @@ that goes into great technical depth about the BPF Architecture.
|
|||||||
syscall_api
|
syscall_api
|
||||||
helpers
|
helpers
|
||||||
kfuncs
|
kfuncs
|
||||||
|
cpumasks
|
||||||
programs
|
programs
|
||||||
maps
|
maps
|
||||||
bpf_prog_run
|
bpf_prog_run
|
||||||
|
@ -7,6 +7,11 @@ eBPF Instruction Set Specification, v1.0
|
|||||||
|
|
||||||
This document specifies version 1.0 of the eBPF instruction set.
|
This document specifies version 1.0 of the eBPF instruction set.
|
||||||
|
|
||||||
|
Documentation conventions
|
||||||
|
=========================
|
||||||
|
|
||||||
|
For brevity, this document uses the type notation "u64", "u32", etc.
|
||||||
|
to mean an unsigned integer whose width is the specified number of bits.
|
||||||
|
|
||||||
Registers and calling convention
|
Registers and calling convention
|
||||||
================================
|
================================
|
||||||
@ -30,20 +35,56 @@ Instruction encoding
|
|||||||
eBPF has two instruction encodings:
|
eBPF has two instruction encodings:
|
||||||
|
|
||||||
* the basic instruction encoding, which uses 64 bits to encode an instruction
|
* the basic instruction encoding, which uses 64 bits to encode an instruction
|
||||||
* the wide instruction encoding, which appends a second 64-bit immediate value
|
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
|
||||||
(imm64) after the basic instruction for a total of 128 bits.
|
constant) value after the basic instruction for a total of 128 bits.
|
||||||
|
|
||||||
The basic instruction encoding looks as follows:
|
The basic instruction encoding is as follows, where MSB and LSB mean the most significant
|
||||||
|
bits and least significant bits, respectively:
|
||||||
|
|
||||||
============= ======= =============== ==================== ============
|
============= ======= ======= ======= ============
|
||||||
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
|
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
|
||||||
============= ======= =============== ==================== ============
|
============= ======= ======= ======= ============
|
||||||
immediate offset source register destination register opcode
|
imm offset src_reg dst_reg opcode
|
||||||
============= ======= =============== ==================== ============
|
============= ======= ======= ======= ============
|
||||||
|
|
||||||
|
**imm**
|
||||||
|
signed integer immediate value
|
||||||
|
|
||||||
|
**offset**
|
||||||
|
signed integer offset used with pointer arithmetic
|
||||||
|
|
||||||
|
**src_reg**
|
||||||
|
the source register number (0-10), except where otherwise specified
|
||||||
|
(`64-bit immediate instructions`_ reuse this field for other purposes)
|
||||||
|
|
||||||
|
**dst_reg**
|
||||||
|
destination register number (0-10)
|
||||||
|
|
||||||
|
**opcode**
|
||||||
|
operation to perform
|
||||||
|
|
||||||
Note that most instructions do not use all of the fields.
|
Note that most instructions do not use all of the fields.
|
||||||
Unused fields shall be cleared to zero.
|
Unused fields shall be cleared to zero.
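As an illustration of the field layout in the table above, the following C sketch splits a basic instruction, held as a single 64-bit value with ``opcode`` in the least significant byte, into its fields. ``model_insn`` and ``model_decode`` are hypothetical names, and the narrowing casts assume the usual two's complement behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of one 64-bit basic instruction. */
struct model_insn {
    uint8_t opcode;   /* bits 0-7   (LSB) */
    uint8_t dst_reg;  /* bits 8-11        */
    uint8_t src_reg;  /* bits 12-15       */
    int16_t offset;   /* bits 16-31, signed */
    int32_t imm;      /* bits 32-63 (MSB), signed */
};

static struct model_insn model_decode(uint64_t raw)
{
    struct model_insn insn;

    insn.opcode  = (uint8_t)(raw & 0xff);
    insn.dst_reg = (uint8_t)((raw >> 8) & 0xf);
    insn.src_reg = (uint8_t)((raw >> 12) & 0xf);
    insn.offset  = (int16_t)((raw >> 16) & 0xffff);
    insn.imm     = (int32_t)(raw >> 32);
    return insn;
}
```

For example, the value ``0x210f`` decodes to opcode ``0x0f`` with ``dst_reg`` 1 and ``src_reg`` 2, and all other fields zero.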
|
||||||
|
|
||||||
|
As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
|
||||||
|
instruction uses a 64-bit immediate value that is constructed as follows.
|
||||||
|
The 64 bits following the basic instruction contain a pseudo instruction
|
||||||
|
using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
|
||||||
|
and imm containing the high 32 bits of the immediate value.
|
||||||
|
|
||||||
|
================= ==================
|
||||||
|
64 bits (MSB) 64 bits (LSB)
|
||||||
|
================= ==================
|
||||||
|
basic instruction pseudo instruction
|
||||||
|
================= ==================
|
||||||
|
|
||||||
|
Thus the 64-bit immediate value is constructed as follows:
|
||||||
|
|
||||||
|
imm64 = (next_imm << 32) | imm
|
||||||
|
|
||||||
|
where 'next_imm' refers to the imm value of the pseudo instruction
|
||||||
|
following the basic instruction.
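The construction can be written out in C as a minimal sketch. ``model_imm64`` is a hypothetical helper name, and the sketch assumes that both 32-bit ``imm`` fields are treated as raw bit patterns (zero-extended) before being combined, so that a negative low half does not overwrite the high half.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the imm of the basic instruction (low 32 bits) with the imm of
 * the following pseudo instruction (high 32 bits). */
static uint64_t model_imm64(int32_t imm, int32_t next_imm)
{
    return ((uint64_t)(uint32_t)next_imm << 32) | (uint32_t)imm;
}
```

With ``imm = 0x55667788`` and ``next_imm = 0x11223344`` this yields ``0x1122334455667788``.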
|
||||||
|
|
||||||
Instruction classes
|
Instruction classes
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
@ -71,27 +112,32 @@ For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` an
|
|||||||
============== ====== =================
|
============== ====== =================
|
||||||
4 bits (MSB) 1 bit 3 bits (LSB)
|
4 bits (MSB) 1 bit 3 bits (LSB)
|
||||||
============== ====== =================
|
============== ====== =================
|
||||||
operation code source instruction class
|
code source instruction class
|
||||||
============== ====== =================
|
============== ====== =================
|
||||||
|
|
||||||
The 4th bit encodes the source operand:
|
**code**
|
||||||
|
the operation code, whose meaning varies by instruction class
|
||||||
|
|
||||||
====== ===== ========================================
|
**source**
|
||||||
|
the source operand location, which unless otherwise specified is one of:
|
||||||
|
|
||||||
|
====== ===== ==============================================
|
||||||
source value description
|
source value description
|
||||||
====== ===== ========================================
|
====== ===== ==============================================
|
||||||
BPF_K 0x00 use 32-bit immediate as source operand
|
BPF_K 0x00 use 32-bit 'imm' value as source operand
|
||||||
BPF_X 0x08 use 'src_reg' register as source operand
|
BPF_X 0x08 use 'src_reg' register value as source operand
|
||||||
====== ===== ========================================
|
====== ===== ==============================================
|
||||||
|
|
||||||
The four MSB bits store the operation code.
|
|
||||||
|
|
||||||
|
**instruction class**
|
||||||
|
the instruction class (see `Instruction classes`_)
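The three opcode subfields for arithmetic and jump instructions can be extracted with simple masks, as sketched below. The helper names are hypothetical; the masks follow directly from the 4/1/3 bit split in the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 4 bits: operation code (e.g. BPF_ADD = 0x00, BPF_MOV = 0xb0). */
static uint8_t model_code(uint8_t opcode)
{
    return opcode & 0xf0;
}

/* Bit 3: source operand location (BPF_K = 0x00, BPF_X = 0x08). */
static uint8_t model_source(uint8_t opcode)
{
    return opcode & 0x08;
}

/* Lower 3 bits: instruction class. */
static uint8_t model_class(uint8_t opcode)
{
    return opcode & 0x07;
}
```

For instance, opcode ``0x0f`` splits into code ``0x00`` (``BPF_ADD``), source ``0x08`` (``BPF_X``), and class ``0x07``, which the instruction class table elsewhere in the specification lists as ``BPF_ALU64``.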
|
||||||
|
|
||||||
Arithmetic instructions
|
Arithmetic instructions
|
||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
|
``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
|
||||||
otherwise identical operations.
|
otherwise identical operations.
|
||||||
The 'code' field encodes the operation as below:
|
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
|
||||||
|
to the values of the source and destination registers, respectively.
|
||||||
|
|
||||||
======== ===== ==========================================================
|
======== ===== ==========================================================
|
||||||
code value description
|
code value description
|
||||||
@ -99,35 +145,49 @@ code value description
|
|||||||
BPF_ADD 0x00 dst += src
|
BPF_ADD 0x00 dst += src
|
||||||
BPF_SUB 0x10 dst -= src
|
BPF_SUB 0x10 dst -= src
|
||||||
BPF_MUL 0x20 dst \*= src
|
BPF_MUL 0x20 dst \*= src
|
||||||
BPF_DIV 0x30 dst /= src
|
BPF_DIV 0x30 dst = (src != 0) ? (dst / src) : 0
|
||||||
BPF_OR 0x40 dst \|= src
|
BPF_OR 0x40 dst \|= src
|
||||||
BPF_AND 0x50 dst &= src
|
BPF_AND 0x50 dst &= src
|
||||||
BPF_LSH 0x60 dst <<= src
|
BPF_LSH 0x60 dst <<= src
|
||||||
BPF_RSH 0x70 dst >>= src
|
BPF_RSH 0x70 dst >>= src
|
||||||
BPF_NEG 0x80 dst = ~src
|
BPF_NEG 0x80 dst = ~src
|
||||||
BPF_MOD 0x90 dst %= src
|
BPF_MOD 0x90 dst = (src != 0) ? (dst % src) : dst
|
||||||
BPF_XOR 0xa0 dst ^= src
|
BPF_XOR 0xa0 dst ^= src
|
||||||
BPF_MOV 0xb0 dst = src
|
BPF_MOV 0xb0 dst = src
|
||||||
BPF_ARSH 0xc0 sign extending shift right
|
BPF_ARSH 0xc0 sign extending shift right
|
||||||
BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
|
BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
|
||||||
======== ===== ==========================================================
|
======== ===== ==========================================================
|
||||||
|
|
||||||
|
Underflow and overflow are allowed during arithmetic operations, meaning
|
||||||
|
the 64-bit or 32-bit value will wrap. If eBPF program execution would
|
||||||
|
result in division by zero, the destination register is instead set to zero.
|
||||||
|
If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
|
||||||
|
the destination register is unchanged whereas for ``BPF_ALU`` the upper
|
||||||
|
32 bits of the destination register are zeroed.
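The division and modulo rules above can be expressed as a small C model. The ``model_*`` function names are hypothetical; registers are modeled as ``uint64_t``, and the 32-bit variants return a value whose upper 32 bits are zero.

```c
#include <assert.h>
#include <stdint.h>

/* BPF_ALU64: division by zero yields zero. */
static uint64_t model_alu64_div(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst / src : 0;
}

/* BPF_ALU64: modulo by zero leaves dst unchanged. */
static uint64_t model_alu64_mod(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst % src : dst;
}

/* BPF_ALU: operate on the low 32 bits; the result is zero-extended. */
static uint64_t model_alu32_div(uint64_t dst, uint64_t src)
{
    uint32_t d = (uint32_t)dst, s = (uint32_t)src;

    return s != 0 ? d / s : 0;
}

/* BPF_ALU: modulo by zero keeps the low 32 bits of dst and zeroes the
 * upper 32 bits. */
static uint64_t model_alu32_mod(uint64_t dst, uint64_t src)
{
    uint32_t d = (uint32_t)dst, s = (uint32_t)src;

    return s != 0 ? d % s : d;
}
```

Note the asymmetry the text describes: a 32-bit modulo by zero still truncates the destination, whereas the 64-bit form leaves it untouched.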
|
||||||
|
|
||||||
``BPF_ADD | BPF_X | BPF_ALU`` means::
|
``BPF_ADD | BPF_X | BPF_ALU`` means::
|
||||||
|
|
||||||
dst_reg = (u32) dst_reg + (u32) src_reg;
|
dst = (u32) ((u32) dst + (u32) src)
|
||||||
|
|
||||||
|
where '(u32)' indicates that the upper 32 bits are zeroed.
|
||||||
|
|
||||||
``BPF_ADD | BPF_X | BPF_ALU64`` means::
|
``BPF_ADD | BPF_X | BPF_ALU64`` means::
|
||||||
|
|
||||||
dst_reg = dst_reg + src_reg
|
dst = dst + src
|
||||||
|
|
||||||
``BPF_XOR | BPF_K | BPF_ALU`` means::
|
``BPF_XOR | BPF_K | BPF_ALU`` means::
|
||||||
|
|
||||||
dst_reg = (u32) dst_reg ^ (u32) imm32
|
dst = (u32) dst ^ (u32) imm32
|
||||||
|
|
||||||
``BPF_XOR | BPF_K | BPF_ALU64`` means::
|
``BPF_XOR | BPF_K | BPF_ALU64`` means::
|
||||||
|
|
||||||
dst_reg = dst_reg ^ imm32
|
dst = dst ^ imm32
|
||||||
|
|
||||||
|
Also note that the division and modulo operations are unsigned. Thus, for
|
||||||
|
``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
|
||||||
|
for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
|
||||||
|
interpreted as an unsigned 64-bit value. There are no instructions for
|
||||||
|
signed division or modulo.
|
||||||
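To make the difference in how 'imm' is interpreted concrete, here is a small user-space C model of unsigned division by an immediate. The function names are hypothetical, and the zero-divisor branch simply follows the division-by-zero rule stated earlier in this document.

```c
#include <assert.h>
#include <stdint.h>

/* BPF_DIV | BPF_K | BPF_ALU: imm is reinterpreted as an unsigned 32-bit value. */
uint64_t alu32_div_imm(uint64_t dst, int32_t imm)
{
	uint32_t divisor = (uint32_t)imm;

	return divisor ? (uint32_t)dst / divisor : 0;
}

/* BPF_DIV | BPF_K | BPF_ALU64: imm is sign extended to 64 bits first, and
 * the result is then treated as an unsigned 64-bit value. */
uint64_t alu64_div_imm(uint64_t dst, int32_t imm)
{
	uint64_t divisor = (uint64_t)(int64_t)imm;

	return divisor ? dst / divisor : 0;
}
```

With ``imm = -1`` the 32-bit form divides by ``0xffffffff`` whereas the 64-bit form divides by ``0xffffffffffffffff``, so the same encoded immediate yields different divisors in the two classes.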
|
|
||||||
Byte swap instructions
|
Byte swap instructions
|
||||||
~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~
|
||||||
@ -155,11 +215,11 @@ Examples:
|
|||||||
|
|
||||||
``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
|
``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
|
||||||
|
|
||||||
dst_reg = htole16(dst_reg)
|
dst = htole16(dst)
|
||||||
|
|
||||||
``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
|
``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
|
||||||
|
|
||||||
dst_reg = htobe64(dst_reg)
|
dst = htobe64(dst)
|
||||||
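The two byte swap examples can be illustrated without relying on ``<endian.h>``. The sketch below is a portable user-space model with hypothetical function names. Truncating the 16-bit case to the low 16 bits follows from the ``htole16()`` prototype and is an assumption of this sketch, not a statement taken from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of BPF_TO_LE with imm = 16: the low 16 bits, laid out little-endian. */
uint64_t bpf_to_le16(uint64_t dst)
{
	uint8_t bytes[2] = { (uint8_t)dst, (uint8_t)(dst >> 8) };
	uint16_t out;

	memcpy(&out, bytes, sizeof(out));
	return out;
}

/* Model of BPF_TO_BE with imm = 64: most significant byte first in memory. */
uint64_t bpf_to_be64(uint64_t dst)
{
	uint8_t bytes[8];
	uint64_t out;

	for (int i = 0; i < 8; i++)
		bytes[i] = (uint8_t)(dst >> (56 - 8 * i));
	memcpy(&out, bytes, sizeof(out));
	return out;
}

/* Inspection helper: the byte stored at memory offset idx of a 64-bit value. */
uint8_t byte_at(uint64_t v, int idx)
{
	uint8_t bytes[8];

	memcpy(bytes, &v, sizeof(bytes));
	return bytes[idx];
}
```

On a little-endian host ``bpf_to_le16()`` leaves the low 16 bits as they are and ``bpf_to_be64()`` reverses all eight bytes; on a big-endian host the roles are reversed.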
|
|
||||||
Jump instructions
|
Jump instructions
|
||||||
-----------------
|
-----------------
|
||||||
@ -234,15 +294,15 @@ instructions that transfer data between a register and memory.
|
|||||||
|
|
||||||
``BPF_MEM | <size> | BPF_STX`` means::
|
``BPF_MEM | <size> | BPF_STX`` means::
|
||||||
|
|
||||||
*(size *) (dst_reg + off) = src_reg
|
*(size *) (dst + offset) = src
|
||||||
|
|
||||||
``BPF_MEM | <size> | BPF_ST`` means::
|
``BPF_MEM | <size> | BPF_ST`` means::
|
||||||
|
|
||||||
*(size *) (dst_reg + off) = imm32
|
*(size *) (dst + offset) = imm32
|
||||||
|
|
||||||
``BPF_MEM | <size> | BPF_LDX`` means::
|
``BPF_MEM | <size> | BPF_LDX`` means::
|
||||||
|
|
||||||
dst_reg = *(size *) (src_reg + off)
|
dst = *(size *) (src + offset)
|
||||||
|
|
||||||
Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
|
Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
|
||||||
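As an illustration of the store and load forms above, the following user-space C sketch models ``BPF_STX`` and ``BPF_LDX`` on a byte buffer. The helper names are hypothetical, sizes are given in bytes (1, 2, 4, 8 standing in for ``BPF_B``, ``BPF_H``, ``BPF_W``, ``BPF_DW``), and zero-extending narrow loads is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* BPF_MEM | <size> | BPF_STX: *(size *) (dst + offset) = src */
void stx(uint8_t *dst, int16_t offset, int size, uint64_t src)
{
	uint8_t b = (uint8_t)src;
	uint16_t h = (uint16_t)src;
	uint32_t w = (uint32_t)src;

	switch (size) {
	case 1: memcpy(dst + offset, &b, 1); break;
	case 2: memcpy(dst + offset, &h, 2); break;
	case 4: memcpy(dst + offset, &w, 4); break;
	case 8: memcpy(dst + offset, &src, 8); break;
	}
}

/* BPF_MEM | <size> | BPF_LDX: dst = *(size *) (src + offset) */
uint64_t ldx(const uint8_t *src, int16_t offset, int size)
{
	uint8_t b;
	uint16_t h;
	uint32_t w;
	uint64_t dw;

	switch (size) {
	case 1: memcpy(&b, src + offset, 1); return b;
	case 2: memcpy(&h, src + offset, 2); return h;
	case 4: memcpy(&w, src + offset, 4); return w;
	case 8: memcpy(&dw, src + offset, 8); return dw;
	}
	return 0; /* unknown size: nothing loaded in this model */
}
```

Storing a 64-bit register with a 4-byte size keeps only its low 32 bits, so a subsequent 4-byte load at the same offset returns just that part.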
|
|
||||||
@ -276,11 +336,11 @@ BPF_XOR 0xa0 atomic xor
|
|||||||
|
|
||||||
``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::
|
``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::
|
||||||
|
|
||||||
*(u32 *)(dst_reg + off16) += src_reg
|
*(u32 *)(dst + offset) += src
|
||||||
|
|
||||||
``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::
|
``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::
|
||||||
|
|
||||||
*(u64 *)(dst_reg + off16) += src_reg
|
*(u64 *)(dst + offset) += src
|
||||||
|
|
||||||
In addition to the simple atomic operations, there also is a modifier and
|
In addition to the simple atomic operations, there also is a modifier and
|
||||||
two complex atomic operations:
|
two complex atomic operations:
|
||||||
@ -295,16 +355,16 @@ BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange
|
|||||||
|
|
||||||
The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
|
The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
|
||||||
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
|
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
|
||||||
is set, then the operation also overwrites ``src_reg`` with the value that
|
is set, then the operation also overwrites ``src`` with the value that
|
||||||
was in memory before it was modified.
|
was in memory before it was modified.
|
||||||
|
|
||||||
The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
|
The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
|
||||||
addressed by ``dst_reg + off``.
|
addressed by ``dst + offset``.
|
||||||
|
|
||||||
The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
|
The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
|
||||||
``dst_reg + off`` with ``R0``. If they match, the value addressed by
|
``dst + offset`` with ``R0``. If they match, the value addressed by
|
||||||
``dst_reg + off`` is replaced with ``src_reg``. In either case, the
|
``dst + offset`` is replaced with ``src``. In either case, the
|
||||||
value that was at ``dst_reg + off`` before the operation is zero-extended
|
value that was at ``dst + offset`` before the operation is zero-extended
|
||||||
and loaded back to ``R0``.
|
and loaded back to ``R0``.
|
||||||
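The register and memory effects of ``BPF_FETCH``, ``BPF_XCHG`` and ``BPF_CMPXCHG`` map closely onto C11 atomics. The sketch below uses ``<stdatomic.h>`` with hypothetical wrapper names; it models only the 64-bit case and is meant as an illustration of the semantics described above, not of how the kernel implements them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* BPF_ADD | BPF_FETCH: memory += src, then src receives the old memory value. */
void model_fetch_add(_Atomic uint64_t *addr, uint64_t *src)
{
	*src = atomic_fetch_add(addr, *src);
}

/* BPF_XCHG: atomically exchange src with the value in memory. */
void model_xchg(_Atomic uint64_t *addr, uint64_t *src)
{
	*src = atomic_exchange(addr, *src);
}

/* BPF_CMPXCHG: if memory equals R0, memory is replaced with src.
 * In either case R0 ends up holding the value that was in memory. */
void model_cmpxchg(_Atomic uint64_t *addr, uint64_t *r0, uint64_t src)
{
	atomic_compare_exchange_strong(addr, r0, src);
}
```

Note that ``atomic_compare_exchange_strong()`` writes the observed value back into its ``expected`` argument on failure and leaves it untouched on success, which matches the "old value ends up in ``R0``" behaviour in both cases.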
|
|
||||||
64-bit immediate instructions
|
64-bit immediate instructions
|
||||||
@ -317,7 +377,7 @@ There is currently only one such instruction.
|
|||||||
|
|
||||||
``BPF_LD | BPF_DW | BPF_IMM`` means::
|
``BPF_LD | BPF_DW | BPF_IMM`` means::
|
||||||
|
|
||||||
dst_reg = imm64
|
dst = imm64
|
||||||
|
|
||||||
|
|
||||||
Legacy BPF Packet access instructions
|
Legacy BPF Packet access instructions
|
||||||
|
@ -1,3 +1,7 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
.. _kfuncs-header-label:
|
||||||
|
|
||||||
=============================
|
=============================
|
||||||
BPF Kernel Functions (kfuncs)
|
BPF Kernel Functions (kfuncs)
|
||||||
=============================
|
=============================
|
||||||
@ -9,7 +13,7 @@ BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
|
|||||||
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
|
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
|
||||||
kfuncs do not have a stable interface and can change from one kernel release to
|
kfuncs do not have a stable interface and can change from one kernel release to
|
||||||
another. Hence, BPF programs need to be updated in response to changes in the
|
another. Hence, BPF programs need to be updated in response to changes in the
|
||||||
kernel.
|
kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.
|
||||||
|
|
||||||
2. Defining a kfunc
|
2. Defining a kfunc
|
||||||
===================
|
===================
|
||||||
@ -37,7 +41,7 @@ An example is given below::
|
|||||||
__diag_ignore_all("-Wmissing-prototypes",
|
__diag_ignore_all("-Wmissing-prototypes",
|
||||||
"Global kfuncs as their definitions will be in BTF");
|
"Global kfuncs as their definitions will be in BTF");
|
||||||
|
|
||||||
struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
|
__bpf_kfunc struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
|
||||||
{
|
{
|
||||||
return find_get_task_by_vpid(nr);
|
return find_get_task_by_vpid(nr);
|
||||||
}
|
}
|
||||||
@ -62,7 +66,7 @@ kfunc with a __tag, where tag may be one of the supported annotations.
|
|||||||
This annotation is used to indicate a memory and size pair in the argument list.
|
This annotation is used to indicate a memory and size pair in the argument list.
|
||||||
An example is given below::
|
An example is given below::
|
||||||
|
|
||||||
void bpf_memzero(void *mem, int mem__sz)
|
__bpf_kfunc void bpf_memzero(void *mem, int mem__sz)
|
||||||
{
|
{
|
||||||
...
|
...
|
||||||
}
|
}
|
||||||
@ -82,7 +86,7 @@ safety of the program.
|
|||||||
|
|
||||||
An example is given below::
|
An example is given below::
|
||||||
|
|
||||||
void *bpf_obj_new(u32 local_type_id__k, ...)
|
__bpf_kfunc void *bpf_obj_new(u32 local_type_id__k, ...)
|
||||||
{
|
{
|
||||||
...
|
...
|
||||||
}
|
}
|
||||||
@ -121,6 +125,20 @@ flags on a set of kfuncs as follows::
|
|||||||
This set encodes the BTF ID of each kfunc listed above, and encodes the flags
|
This set encodes the BTF ID of each kfunc listed above, and encodes the flags
|
||||||
along with it. Of course, it is also allowed to specify no flags.
|
along with it. Of course, it is also allowed to specify no flags.
|
||||||
|
|
||||||
|
kfunc definitions should also always be annotated with the ``__bpf_kfunc``
|
||||||
|
macro. This prevents issues such as the compiler inlining the kfunc if it's a
|
||||||
|
static kernel function, or the function being elided in an LTO build as it's
|
||||||
|
not used in the rest of the kernel. Developers should not manually add
|
||||||
|
annotations to their kfunc to prevent these issues. If an annotation is
|
||||||
|
required to prevent such an issue with your kfunc, it is a bug and should be
|
||||||
|
added to the definition of the macro so that other kfuncs are similarly
|
||||||
|
protected. An example is given below::
|
||||||
|
|
||||||
|
__bpf_kfunc struct task_struct *bpf_get_task_pid(s32 pid)
|
||||||
|
{
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
2.4.1 KF_ACQUIRE flag
|
2.4.1 KF_ACQUIRE flag
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
@ -163,7 +181,8 @@ KF_ACQUIRE and KF_RET_NULL flags.
|
|||||||
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
|
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
|
||||||
indicates that all pointer arguments are valid, and that all pointers to
|
indicates that all pointer arguments are valid, and that all pointers to
|
||||||
BTF objects have been passed in their unmodified form (that is, at a zero
|
BTF objects have been passed in their unmodified form (that is, at a zero
|
||||||
offset, and without having been obtained from walking another pointer).
|
offset, and without having been obtained from walking another pointer, with one
|
||||||
|
exception described below).
|
||||||
|
|
||||||
There are two types of pointers to kernel objects which are considered "valid":
|
There are two types of pointers to kernel objects which are considered "valid":
|
||||||
|
|
||||||
@ -176,6 +195,25 @@ KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
|
|||||||
The definition of "valid" pointers is subject to change at any time, and has
|
The definition of "valid" pointers is subject to change at any time, and has
|
||||||
absolutely no ABI stability guarantees.
|
absolutely no ABI stability guarantees.
|
||||||
|
|
||||||
|
As mentioned above, a nested pointer obtained from walking a trusted pointer is
|
||||||
|
no longer trusted, with one exception. If a struct type has a field that is
|
||||||
|
guaranteed to be valid as long as its parent pointer is trusted, the
|
||||||
|
``BTF_TYPE_SAFE_NESTED`` macro can be used to express that to the verifier as
|
||||||
|
follows:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
BTF_TYPE_SAFE_NESTED(struct task_struct) {
|
||||||
|
const cpumask_t *cpus_ptr;
|
||||||
|
};
|
||||||
|
|
||||||
|
In other words, you must:
|
||||||
|
|
||||||
|
1. Wrap the trusted pointer type in the ``BTF_TYPE_SAFE_NESTED`` macro.
|
||||||
|
|
||||||
|
2. Specify the type and name of the trusted nested field. This field must match
|
||||||
|
the field in the original type definition exactly.
|
||||||
|
|
||||||
2.4.6 KF_SLEEPABLE flag
|
2.4.6 KF_SLEEPABLE flag
|
||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
@ -200,6 +238,28 @@ single argument which must be a trusted argument or a MEM_RCU pointer.
|
|||||||
The argument may have reference count of 0 and the kfunc must take this
|
The argument may have reference count of 0 and the kfunc must take this
|
||||||
into consideration.
|
into consideration.
|
||||||
|
|
||||||
|
.. _KF_deprecated_flag:
|
||||||
|
|
||||||
|
2.4.9 KF_DEPRECATED flag
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
|
||||||
|
changed or removed in a subsequent kernel release. A kfunc that is
|
||||||
|
marked with KF_DEPRECATED should also have any relevant information
|
||||||
|
captured in its kernel doc. Such information typically includes the
|
||||||
|
kfunc's expected remaining lifespan, a recommendation for new
|
||||||
|
functionality that can replace it if any is available, and possibly a
|
||||||
|
rationale for why it is being removed.
|
||||||
|
|
||||||
|
Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
|
||||||
|
supported and have its KF_DEPRECATED flag removed, it is likely to be far more
|
||||||
|
difficult to remove a KF_DEPRECATED flag after it's been added than it is to
|
||||||
|
prevent it from being added in the first place. As described in
|
||||||
|
:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
|
||||||
|
encouraged to make their use-cases known as early as possible, and participate
|
||||||
|
in upstream discussions regarding whether to keep, change, deprecate, or remove
|
||||||
|
those kfuncs if and when such discussions occur.
|
||||||
|
|
||||||
2.5 Registering the kfuncs
|
2.5 Registering the kfuncs
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
@ -223,14 +283,150 @@ type. An example is shown below::
|
|||||||
}
|
}
|
||||||
late_initcall(init_subsystem);
|
late_initcall(init_subsystem);
|
||||||
|
|
||||||
3. Core kfuncs
|
2.6 Specifying no-cast aliases with ___init
|
||||||
|
--------------------------------------------
|
||||||
|
|
||||||
|
The verifier will always enforce that the BTF type of a pointer passed to a
|
||||||
|
kfunc by a BPF program matches the type of pointer specified in the kfunc
|
||||||
|
definition. The verifier does, however, allow types that are equivalent
|
||||||
|
according to the C standard to be passed to the same kfunc arg, even if their
|
||||||
|
BTF_IDs differ.
|
||||||
|
|
||||||
|
For example, for the following type definition:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct bpf_cpumask {
|
||||||
|
cpumask_t cpumask;
|
||||||
|
refcount_t usage;
|
||||||
|
};
|
||||||
|
|
||||||
|
The verifier would allow a ``struct bpf_cpumask *`` to be passed to a kfunc
|
||||||
|
taking a ``cpumask_t *`` (which is a typedef of ``struct cpumask *``). For
|
||||||
|
instance, both ``struct cpumask *`` and ``struct bpf_cpumask *`` can be passed
|
||||||
|
to bpf_cpumask_test_cpu().
|
||||||
|
|
||||||
|
In some cases, this type-aliasing behavior is not desired. ``struct
|
||||||
|
nf_conn___init`` is one such example:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct nf_conn___init {
|
||||||
|
struct nf_conn ct;
|
||||||
|
};
|
||||||
|
|
||||||
|
The C standard would consider these types to be equivalent, but it would not
|
||||||
|
always be safe to pass either type to a trusted kfunc. ``struct
|
||||||
|
nf_conn___init`` represents an allocated ``struct nf_conn`` object that has
|
||||||
|
*not yet been initialized*, so it would therefore be unsafe to pass a ``struct
|
||||||
|
nf_conn___init *`` to a kfunc that's expecting a fully initialized ``struct
|
||||||
|
nf_conn *`` (e.g. ``bpf_ct_change_timeout()``).
|
||||||
|
|
||||||
|
In order to accommodate such requirements, the verifier will enforce strict
|
||||||
|
PTR_TO_BTF_ID type matching if two types have the exact same name, with one
|
||||||
|
being suffixed with ``___init``.
|
||||||
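The naming rule described above (two types with the same name, one of them carrying the ``___init`` suffix) can be pictured with a short string comparison. This is a user-space sketch with hypothetical helper names; it is not the verifier's actual code and ignores everything about BTF other than the type name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define INIT_SUFFIX "___init"

/* Length of the name with a trailing "___init" (if any) stripped. */
static size_t base_len(const char *name)
{
	size_t len = strlen(name);
	size_t slen = strlen(INIT_SUFFIX);

	if (len > slen && strcmp(name + len - slen, INIT_SUFFIX) == 0)
		return len - slen;
	return len;
}

/* True when exactly one of the two names has the suffix and the names are
 * otherwise identical, i.e. the case where strict matching would apply. */
bool needs_strict_match(const char *a, const char *b)
{
	size_t la = base_len(a);
	size_t lb = base_len(b);
	bool a_suffixed = la != strlen(a);
	bool b_suffixed = lb != strlen(b);

	return a_suffixed != b_suffixed && la == lb && strncmp(a, b, la) == 0;
}
```

Under this model ``nf_conn`` and ``nf_conn___init`` form a strict pair, whereas ``cpumask`` and ``bpf_cpumask`` do not and would remain subject to the ordinary aliasing rules.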
|
|
||||||
|
.. _BPF_kfunc_lifecycle_expectations:
|
||||||
|
|
||||||
|
3. kfunc lifecycle expectations
|
||||||
|
===============================
|
||||||
|
|
||||||
|
kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
|
||||||
|
strict stability restrictions associated with kernel <-> user UAPIs. This means
|
||||||
|
they can be thought of as similar to EXPORT_SYMBOL_GPL, and can therefore be
|
||||||
|
modified or removed by a maintainer of the subsystem they're defined in when
|
||||||
|
it's deemed necessary.
|
||||||
|
|
||||||
|
Like any other change to the kernel, maintainers will not change or remove a
|
||||||
|
kfunc without having a reasonable justification. Whether or not they'll choose
|
||||||
|
to change a kfunc will ultimately depend on a variety of factors, such as how
|
||||||
|
widely used the kfunc is, how long the kfunc has been in the kernel, whether an
|
||||||
|
alternative kfunc exists, what the norm is in terms of stability for the
|
||||||
|
subsystem in question, and of course what the technical cost is of continuing
|
||||||
|
to support the kfunc.
|
||||||
|
|
||||||
|
There are several implications of this:
|
||||||
|
|
||||||
|
a) kfuncs that are widely used or have been in the kernel for a long time will
|
||||||
|
be more difficult to justify being changed or removed by a maintainer. In
|
||||||
|
other words, kfuncs that are known to have a lot of users and provide
|
||||||
|
significant value provide stronger incentives for maintainers to invest the
|
||||||
|
time and complexity in supporting them. It is therefore important for
|
||||||
|
developers that are using kfuncs in their BPF programs to communicate and
|
||||||
|
explain how and why those kfuncs are being used, and to participate in
|
||||||
|
discussions regarding those kfuncs when they occur upstream.
|
||||||
|
|
||||||
|
b) Unlike regular kernel symbols marked with EXPORT_SYMBOL_GPL, BPF programs
|
||||||
|
that call kfuncs are generally not part of the kernel tree. This means that
|
||||||
|
refactoring cannot typically change callers in-place when a kfunc changes,
|
||||||
|
as is done for e.g. an upstreamed driver being updated in place when a
|
||||||
|
kernel symbol is changed.
|
||||||
|
|
||||||
|
Unlike with regular kernel symbols, this is expected behavior for BPF
|
||||||
|
symbols, and out-of-tree BPF programs that use kfuncs should be considered
|
||||||
|
relevant to discussions and decisions around modifying and removing those
|
||||||
|
kfuncs. The BPF community will take an active role in participating in
|
||||||
|
upstream discussions when necessary to ensure that the perspectives of such
|
||||||
|
users are taken into account.
|
||||||
|
|
||||||
|
c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
|
||||||
|
will not ever hard-block a change in the kernel purely for stability
|
||||||
|
reasons. That being said, kfuncs are features that are meant to solve
|
||||||
|
problems and provide value to users. The decision of whether to change or
|
||||||
|
remove a kfunc is a multivariate technical decision that is made on a
|
||||||
|
case-by-case basis, and which is informed by data points such as those
|
||||||
|
mentioned above. It is expected that a kfunc being removed or changed with
|
||||||
|
no warning will not be a common occurrence or take place without sound
|
||||||
|
justification, but it is a possibility that must be accepted if one is to
|
||||||
|
use kfuncs.
|
||||||
|
|
||||||
|
3.1 kfunc deprecation
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
As described above, while sometimes a maintainer may find that a kfunc must be
|
||||||
|
changed or removed immediately to accommodate some changes in their subsystem,
|
||||||
|
usually kfuncs will be able to accommodate a longer and more measured
|
||||||
|
deprecation process. For example, if a new kfunc comes along which provides
|
||||||
|
superior functionality to an existing kfunc, the existing kfunc may be
|
||||||
|
deprecated for some period of time to allow users to migrate their BPF programs
|
||||||
|
to use the new one. Or, if a kfunc has no known users, a decision may be made
|
||||||
|
to remove the kfunc (without providing an alternative API) after some
|
||||||
|
deprecation period so as to provide users with a window to notify the kfunc
|
||||||
|
maintainer if it turns out that the kfunc is actually being used.
|
||||||
|
|
||||||
|
It's expected that the common case will be that kfuncs will go through a
|
||||||
|
deprecation period rather than being changed or removed without warning. As
|
||||||
|
described in :ref:`KF_deprecated_flag`, the kfunc framework provides the
|
||||||
|
KF_DEPRECATED flag to kfunc developers to signal to users that a kfunc has been
|
||||||
|
deprecated. Once a kfunc has been marked with KF_DEPRECATED, the following
|
||||||
|
procedure is followed for removal:
|
||||||
|
|
||||||
|
1. Any relevant information for deprecated kfuncs is documented in the kfunc's
|
||||||
|
kernel docs. This documentation will typically include the kfunc's expected
|
||||||
|
remaining lifespan, a recommendation for new functionality that can replace
|
||||||
|
the usage of the deprecated function (or an explanation as to why no such
|
||||||
|
replacement exists), etc.
|
||||||
|
|
||||||
|
2. The deprecated kfunc is kept in the kernel for some period of time after it
|
||||||
|
was first marked as deprecated. This time period will be chosen on a
|
||||||
|
case-by-case basis, and will typically depend on how widespread the use of
|
||||||
|
the kfunc is, how long it has been in the kernel, and how hard it is to move
|
||||||
|
to alternatives. This deprecation time period is "best effort", and as
|
||||||
|
described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
|
||||||
|
sometimes dictate that the kfunc be removed before the full intended
|
||||||
|
deprecation period has elapsed.
|
||||||
|
|
||||||
|
3. After the deprecation period the kfunc will be removed. At this point, BPF
|
||||||
|
programs calling the kfunc will be rejected by the verifier.
|
||||||
|
|
||||||
|
4. Core kfuncs
|
||||||
==============
|
==============
|
||||||
|
|
||||||
The BPF subsystem provides a number of "core" kfuncs that are potentially
|
The BPF subsystem provides a number of "core" kfuncs that are potentially
|
||||||
applicable to a wide variety of different possible use cases and programs.
|
applicable to a wide variety of different possible use cases and programs.
|
||||||
Those kfuncs are documented here.
|
Those kfuncs are documented here.
|
||||||
|
|
||||||
3.1 struct task_struct * kfuncs
|
4.1 struct task_struct * kfuncs
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
||||||
There are a number of kfuncs that allow ``struct task_struct *`` objects to be
|
There are a number of kfuncs that allow ``struct task_struct *`` objects to be
|
||||||
@ -306,7 +502,7 @@ Here is an example of it being used:
|
|||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
3.2 struct cgroup * kfuncs
|
4.2 struct cgroup * kfuncs
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
``struct cgroup *`` objects also have acquire and release functions:
|
``struct cgroup *`` objects also have acquire and release functions:
|
||||||
@ -420,3 +616,10 @@ the verifier. bpf_cgroup_ancestor() can be used as follows:
|
|||||||
bpf_cgroup_release(parent);
|
bpf_cgroup_release(parent);
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
4.3 struct cpumask * kfuncs
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
BPF provides a set of kfuncs that can be used to query, allocate, mutate, and
|
||||||
|
destroy ``struct cpumask *`` objects. Please refer to :ref:`cpumasks-header-label`
|
||||||
|
for more details.
|
||||||
|
@ -83,8 +83,8 @@ This prevents from accidentally exporting a symbol, that is not supposed
|
|||||||
to be a part of the ABI, which in turn improves both libbpf developer- and
|
to be a part of the ABI, which in turn improves both libbpf developer- and
|
||||||
user-experiences.
|
user-experiences.
|
||||||
|
|
||||||
ABI versionning
|
ABI versioning
|
||||||
---------------
|
--------------
|
||||||
|
|
||||||
To make future ABI extensions possible, the libbpf ABI is versioned.
|
To make future ABI extensions possible, the libbpf ABI is versioned.
|
||||||
Versioning is implemented by ``libbpf.map`` version script that is
|
Versioning is implemented by ``libbpf.map`` version script that is
|
||||||
@ -148,7 +148,7 @@ API documentation convention
|
|||||||
The libbpf API is documented via comments above definitions in
|
The libbpf API is documented via comments above definitions in
|
||||||
header files. These comments can be rendered by doxygen and sphinx
|
header files. These comments can be rendered by doxygen and sphinx
|
||||||
for well organized html output. This section describes the
|
for well organized html output. This section describes the
|
||||||
convention in which these comments should be formated.
|
convention in which these comments should be formatted.
|
||||||
|
|
||||||
Here is an example from btf.h:
|
Here is an example from btf.h:
|
||||||
|
|
||||||
|
498
Documentation/bpf/map_sockmap.rst
Normal file
@ -0,0 +1,498 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright Red Hat
|
||||||
|
|
||||||
|
==============================================
|
||||||
|
BPF_MAP_TYPE_SOCKMAP and BPF_MAP_TYPE_SOCKHASH
|
||||||
|
==============================================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_SOCKMAP`` was introduced in kernel version 4.14
|
||||||
|
- ``BPF_MAP_TYPE_SOCKHASH`` was introduced in kernel version 4.18
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_SOCKMAP`` and ``BPF_MAP_TYPE_SOCKHASH`` maps can be used to
|
||||||
|
redirect skbs between sockets or to apply policy at the socket level based on
|
||||||
|
the result of a BPF (verdict) program with the help of the BPF helpers
|
||||||
|
``bpf_sk_redirect_map()``, ``bpf_sk_redirect_hash()``,
|
||||||
|
``bpf_msg_redirect_map()`` and ``bpf_msg_redirect_hash()``.
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_SOCKMAP`` is backed by an array that uses an integer key as the
|
||||||
|
index to look up a reference to a ``struct sock``. The map values are socket
|
||||||
|
descriptors. Similarly, ``BPF_MAP_TYPE_SOCKHASH`` is a hash backed BPF map that
|
||||||
|
holds references to sockets via their socket descriptors.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
The value type is either __u32 or __u64; the latter (__u64) is to support
|
||||||
|
returning socket cookies to userspace. Returning the ``struct sock *`` that
|
||||||
|
the map holds to user-space is neither safe nor useful.
|
||||||
|
|
||||||
|
These maps may have BPF programs attached to them, specifically a parser program
|
||||||
|
and a verdict program. The parser program determines how much data has been
|
||||||
|
parsed and therefore how much data needs to be queued to come to a verdict. The
|
||||||
|
verdict program is essentially the redirect program and can return a verdict
|
||||||
|
of ``__SK_DROP``, ``__SK_PASS``, or ``__SK_REDIRECT``.
|
||||||
|
|
||||||
|
When a socket is inserted into one of these maps, its socket callbacks are
|
||||||
|
replaced and a ``struct sk_psock`` is attached to it. Additionally, this
|
||||||
|
``sk_psock`` inherits the programs that are attached to the map.
|
||||||
|
|
||||||
|
A sock object may be in multiple maps, but can only inherit a single
|
||||||
|
parser or verdict program. If adding a sock object to a map would result
|
||||||
|
in having multiple parser programs, the update will return an EBUSY error.
|
||||||
|
|
||||||
|
The supported programs to attach to these maps are:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct sk_psock_progs {
|
||||||
|
struct bpf_prog *msg_parser;
|
||||||
|
struct bpf_prog *stream_parser;
|
||||||
|
struct bpf_prog *stream_verdict;
|
||||||
|
struct bpf_prog *skb_verdict;
|
||||||
|
};
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
Users are not allowed to attach ``stream_verdict`` and ``skb_verdict``
|
||||||
|
programs to the same map.
|
||||||
|
|
||||||
|
The attach types for the map programs are:
|
||||||
|
|
||||||
|
- ``msg_parser`` program - ``BPF_SK_MSG_VERDICT``.
|
||||||
|
- ``stream_parser`` program - ``BPF_SK_SKB_STREAM_PARSER``.
|
||||||
|
- ``stream_verdict`` program - ``BPF_SK_SKB_STREAM_VERDICT``.
|
||||||
|
- ``skb_verdict`` program - ``BPF_SK_SKB_VERDICT``.
|
||||||
|
|
||||||
|
There are additional helpers available to use with the parser and verdict
|
||||||
|
programs: ``bpf_msg_apply_bytes()`` and ``bpf_msg_cork_bytes()``. With
|
||||||
|
``bpf_msg_apply_bytes()`` BPF programs can tell the infrastructure how many
|
||||||
|
bytes the given verdict should apply to. The helper ``bpf_msg_cork_bytes()``
|
||||||
|
handles a different case where a BPF program cannot reach a verdict on a msg
|
||||||
|
until it receives more bytes AND the program doesn't want to forward the packet
|
||||||
|
until it is known to be good.
|
||||||
|
|
||||||
|
Finally, the helpers ``bpf_msg_pull_data()`` and ``bpf_msg_push_data()`` are
|
||||||
|
available to ``BPF_PROG_TYPE_SK_MSG`` BPF programs to pull in data and set the
|
||||||
|
start and end pointers to given values or to add metadata to the ``struct
|
||||||
|
sk_msg_buff *msg``.
|
||||||
|
|
||||||
|
All these helpers will be described in more detail below.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
bpf_msg_redirect_map()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map, u32 key, u64 flags)
|
||||||
|
|
||||||
|
This helper is used in programs implementing policies at the socket level. If
|
||||||
|
the message ``msg`` is allowed to pass (i.e., if the verdict BPF program
|
||||||
|
returns ``SK_PASS``), redirect it to the socket referenced by ``map`` (of type
|
||||||
|
``BPF_MAP_TYPE_SOCKMAP``) at index ``key``. Both ingress and egress interfaces
|
||||||
|
can be used for redirection. The ``BPF_F_INGRESS`` value in ``flags`` is used
|
||||||
|
to select the ingress path; otherwise the egress path is selected. This is the
|
||||||
|
only flag supported for now.
|
||||||
|
|
||||||
|
Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.
|
||||||
|
|
||||||
|
bpf_sk_redirect_map()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_sk_redirect_map(struct sk_buff *skb, struct bpf_map *map, u32 key, u64 flags)
|
||||||
|
|
||||||
|
Redirect the packet to the socket referenced by ``map`` (of type
|
||||||
|
``BPF_MAP_TYPE_SOCKMAP``) at index ``key``. Both ingress and egress interfaces
|
||||||
|
can be used for redirection. The ``BPF_F_INGRESS`` value in ``flags`` is used
|
||||||
|
to select the ingress path; otherwise the egress path is selected. This is the
|
||||||
|
only flag supported for now.
|
||||||
|
|
||||||
|
Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
|
Socket entries of type ``struct sock *`` can be retrieved using the
|
||||||
|
``bpf_map_lookup_elem()`` helper.
|
||||||
|
|
||||||
|
bpf_sock_map_update()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
|
||||||
|
|
||||||
|
Add an entry to, or update a ``map`` referencing sockets. The ``skops`` is used
|
||||||
|
as a new value for the entry associated to ``key``. The ``flags`` argument can
|
||||||
|
be one of the following:
|
||||||
|
|
||||||
|
- ``BPF_ANY``: Create a new element or update an existing element.
|
||||||
|
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
|
||||||
|
- ``BPF_EXIST``: Update an existing element.
|
||||||
|
|
||||||
|
If the ``map`` has BPF programs (parser and verdict), those will be inherited
|
||||||
|
by the socket being added. If the socket is already attached to BPF programs,
|
||||||
|
this results in an error.
|
||||||
|
|
||||||
|
Returns 0 on success, or a negative error in case of failure.
|
||||||
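The effect of the three ``flags`` values listed above can be sketched with a small user-space model of an array-backed map. Everything here is illustrative: the ``model_`` names are hypothetical, a 64-bit cookie stands in for the socket, and the specific errno values returned are assumptions of this sketch, since the text above only promises "a negative error".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_BPF_ANY     0 /* create a new element or update an existing one */
#define MODEL_BPF_NOEXIST 1 /* create a new element only if it did not exist */
#define MODEL_BPF_EXIST   2 /* update an existing element */
#define MODEL_MAX_ENTRIES 4

struct model_sockmap {
	uint64_t cookie[MODEL_MAX_ENTRIES]; /* 0 means "slot is empty" */
};

/* Insert or update the entry at index key according to flags. */
int model_update(struct model_sockmap *m, uint32_t key, uint64_t cookie,
		 uint64_t flags)
{
	if (flags > MODEL_BPF_EXIST || cookie == 0)
		return -EINVAL;
	if (key >= MODEL_MAX_ENTRIES)
		return -E2BIG;
	if (flags == MODEL_BPF_NOEXIST && m->cookie[key])
		return -EEXIST;
	if (flags == MODEL_BPF_EXIST && !m->cookie[key])
		return -ENOENT;

	m->cookie[key] = cookie;
	return 0;
}
```

In this model an update with ``MODEL_BPF_EXIST`` on an empty slot fails, a first insert with ``MODEL_BPF_NOEXIST`` succeeds, and a second one on the same key is refused.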
|
|
||||||
|
bpf_sock_hash_update()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
|
||||||
|
|
||||||
|
Add an entry to, or update a sockhash ``map`` referencing sockets. The ``skops``
|
||||||
|
is used as a new value for the entry associated to ``key``.
|
||||||
|
|
||||||
|
The ``flags`` argument can be one of the following:
|
||||||
|
|
||||||
|
- ``BPF_ANY``: Create a new element or update an existing element.
|
||||||
|
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
|
||||||
|
- ``BPF_EXIST``: Update an existing element.
|
||||||
|
|
||||||
|
If the ``map`` has BPF programs (parser and verdict), those will be inherited
|
||||||
|
by the socket being added. If the socket is already attached to BPF programs,
|
||||||
|
this results in an error.
|
||||||
|
|
||||||
|
Returns 0 on success, or a negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_msg_redirect_hash()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map, void *key, u64 flags)
|
||||||
|
|
||||||
|
This helper is used in programs implementing policies at the socket level. If
|
||||||
|
the message ``msg`` is allowed to pass (i.e., if the verdict BPF program returns
|
||||||
|
``SK_PASS``), redirect it to the socket referenced by ``map`` (of type
|
||||||
|
``BPF_MAP_TYPE_SOCKHASH``) using hash ``key``. Both ingress and egress
|
||||||
|
interfaces can be used for redirection. The ``BPF_F_INGRESS`` value in
|
||||||
|
``flags`` is used to select the ingress path; otherwise the egress path is
|
||||||
|
selected. This is the only flag supported for now.
|
||||||
|
|
||||||
|
Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.
|
||||||
|
|
||||||
|
bpf_sk_redirect_hash()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void *key, u64 flags)
|
||||||
|
|
||||||
|
This helper is used in programs implementing policies at the skb socket level.
|
||||||
|
If the sk_buff ``skb`` is allowed to pass (i.e., if the verdict BPF program
|
||||||
|
returns ``SK_PASS``), redirect it to the socket referenced by ``map`` (of type
|
||||||
|
``BPF_MAP_TYPE_SOCKHASH``) using hash ``key``. Both ingress and egress
|
||||||
|
interfaces can be used for redirection. The ``BPF_F_INGRESS`` value in
|
||||||
|
``flags`` is used to select the ingress path; otherwise the egress path is
|
||||||
|
selected. This is the only flag supported for now.
|
||||||
|
|
||||||
|
Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.
|
||||||
|
|
||||||
|
bpf_msg_apply_bytes()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
|
||||||
|
|
||||||
|
For socket policies, apply the verdict of the BPF program to the next (number
|
||||||
|
of ``bytes``) of message ``msg``. For example, this helper can be used in the
|
||||||
|
following cases:
|
||||||
|
|
||||||
|
- A single ``sendmsg()`` or ``sendfile()`` system call contains multiple
|
||||||
|
logical messages that the BPF program is supposed to read and for which it
|
||||||
|
should apply a verdict.
|
||||||
|
- A BPF program only cares to read the first ``bytes`` of a ``msg``. If the
|
||||||
|
message has a large payload, then setting up and calling the BPF program
|
||||||
|
repeatedly for all bytes, even though the verdict is already known, would
|
||||||
|
create unnecessary overhead.
|
||||||
|
|
||||||
|
Returns 0.
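The effect of applying one verdict to a run of bytes can be illustrated with a small user-space model. It is a sketch under assumptions, not kernel code: ``struct model_apply`` and ``model_send()`` are hypothetical, and the model only counts how often the verdict program would be invoked when each invocation asks for its verdict to cover ``chunk`` bytes.

```c
#include <stdint.h>

/* Hypothetical per-socket bookkeeping for this sketch. */
struct model_apply {
	uint32_t apply_bytes;   /* bytes still covered by the last verdict */
	unsigned int prog_runs; /* how often the verdict program was invoked */
};

/*
 * Feed one send of `len` bytes through the model. Whenever no verdict is in
 * force, the program "runs" and asks for its verdict to cover `chunk` bytes
 * (a chunk of 0 means: cover the rest of this send).
 */
static void model_send(struct model_apply *s, uint32_t len, uint32_t chunk)
{
	while (len) {
		uint32_t n;

		if (!s->apply_bytes) {
			s->prog_runs++;
			s->apply_bytes = chunk ? chunk : len;
		}
		/* Consume as much as the current verdict still covers. */
		n = len < s->apply_bytes ? len : s->apply_bytes;
		len -= n;
		s->apply_bytes -= n;
	}
}
```

In this model a single 300-byte send carrying three 100-byte logical messages triggers three program runs, while a verdict covering 100 bytes spans two consecutive 50-byte sends with only one run.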
|
||||||
|
|
||||||
|
bpf_msg_cork_bytes()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)
|
||||||
|
|
||||||
|
For socket policies, prevent the execution of the verdict BPF program for
|
||||||
|
message ``msg`` until the number of ``bytes`` have been accumulated.
|
||||||
|
|
||||||
|
This can be used when one needs a specific number of bytes before a verdict can
|
||||||
|
be assigned, even if the data spans multiple ``sendmsg()`` or ``sendfile()``
|
||||||
|
calls.
|
||||||
|
|
||||||
|
Returns 0.
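Corking can be pictured as a byte counter that gates the verdict. The following user-space sketch assumes a hypothetical ``struct model_cork`` and ``model_cork_sendmsg()``; it ignores that the real program must first run to request corking, and it resets the request once it is satisfied, which is a choice made for this sketch only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket state for this sketch; not a kernel structure. */
struct model_cork {
	uint32_t cork_bytes; /* bytes requested before the next verdict */
	uint32_t queued;     /* bytes accumulated so far */
};

/*
 * Account for one sendmsg() of `len` bytes. Returns true when enough data
 * has accumulated for the verdict program to run, false while still corked.
 */
static bool model_cork_sendmsg(struct model_cork *c, uint32_t len)
{
	c->queued += len;
	if (c->queued < c->cork_bytes)
		return false; /* keep accumulating, no verdict yet */
	c->queued = 0;
	c->cork_bytes = 0; /* in this sketch, corking must be requested again */
	return true;
}
```

With ``cork_bytes`` set to 100, two 40-byte sends stay corked and the third one releases the accumulated data for a verdict.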
|
||||||
|
|
||||||
|
bpf_msg_pull_data()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64 flags)
|
||||||
|
|
||||||
|
For socket policies, pull in non-linear data from user space for ``msg`` and set
|
||||||
|
pointers ``msg->data`` and ``msg->data_end`` to ``start`` and ``end`` bytes
|
||||||
|
offsets into ``msg``, respectively.
|
||||||
|
|
||||||
|
If a program of type ``BPF_PROG_TYPE_SK_MSG`` is run on a ``msg`` it can only
|
||||||
|
parse data that the (``data``, ``data_end``) pointers have already consumed.
|
||||||
|
For ``sendmsg()`` hooks this is likely the first scatterlist element. But for
|
||||||
|
calls relying on the ``sendpage`` handler (e.g., ``sendfile()``) this will be
|
||||||
|
the range (**0**, **0**) because the data is shared with user space and by
|
||||||
|
default the objective is to avoid allowing user space to modify data while (or
|
||||||
|
after) BPF verdict is being decided. This helper can be used to pull in data
|
||||||
|
and to set the start and end pointers to given values. Data will be copied if
|
||||||
|
necessary (i.e., if data was not linear and if start and end pointers do not
|
||||||
|
point to the same chunk).
|
||||||
|
|
||||||
|
A call to this helper may change the underlying packet buffer.
|
||||||
|
Therefore, at load time, all checks on pointers previously done by the verifier
|
||||||
|
are invalidated and must be performed again, if the helper is used in
|
||||||
|
combination with direct packet access.
|
||||||
|
|
||||||
|
All values for ``flags`` are reserved for future usage, and must be left at
|
||||||
|
zero.
|
||||||
|
|
||||||
|
Returns 0 on success, or a negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
|
Look up a socket entry in the sockmap or sockhash map.
|
||||||
|
|
||||||
|
Returns the socket entry associated to ``key``, or NULL if no entry was found.
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
|
||||||
|
|
||||||
|
Add or update a socket entry in a sockmap or sockhash.
|
||||||
|
|
||||||
|
The ``flags`` argument can be one of the following:
|
||||||
|
|
||||||
|
- ``BPF_ANY``: Create a new element or update an existing element.
|
||||||
|
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
|
||||||
|
- ``BPF_EXIST``: Update an existing element.
|
||||||
|
|
||||||
|
Returns 0 on success, or a negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_map_delete_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_delete_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
|
Delete a socket entry from a sockmap or a sockhash.
|
||||||
|
|
||||||
|
Returns 0 on success, or a negative error in case of failure.
|
||||||
|
|
||||||
|
User space
|
||||||
|
----------
|
||||||
|
bpf_map_update_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)
|
||||||
|
|
||||||
|
Sockmap entries can be added or updated using the ``bpf_map_update_elem()``
|
||||||
|
function. The ``key`` parameter is the index value of the sockmap array, and the
|
||||||
|
``value`` parameter is the FD value of that socket.
|
||||||
|
|
||||||
|
Under the hood, the sockmap update function uses the socket FD value to
|
||||||
|
retrieve the associated socket and its attached psock.
|
||||||
|
|
||||||
|
The ``flags`` argument can be one of the following:
|
||||||
|
|
||||||
|
- ``BPF_ANY``: Create a new element or update an existing element.
|
||||||
|
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
|
||||||
|
- ``BPF_EXIST``: Update an existing element.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_lookup_elem(int fd, const void *key, void *value)
|
||||||
|
|
||||||
|
Sockmap entries can be retrieved using the ``bpf_map_lookup_elem()`` function.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
The entry returned is a socket cookie rather than a socket itself.
|
||||||
|
|
||||||
|
bpf_map_delete_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_delete_elem(int fd, const void *key)
|
||||||
|
|
||||||
|
Sockmap entries can be deleted using the ``bpf_map_delete_elem()``
|
||||||
|
function.
|
||||||
|
|
||||||
|
Returns 0 on success, or a negative error in case of failure.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
Several examples of the use of sockmap APIs can be found in:
|
||||||
|
|
||||||
|
- `tools/testing/selftests/bpf/progs/test_sockmap_kern.h`_
|
||||||
|
- `tools/testing/selftests/bpf/progs/sockmap_parse_prog.c`_
|
||||||
|
- `tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c`_
|
||||||
|
- `tools/testing/selftests/bpf/progs/test_sockmap_listen.c`_
|
||||||
|
- `tools/testing/selftests/bpf/progs/test_sockmap_update.c`_
|
||||||
|
|
||||||
|
The following code snippet shows how to declare a sockmap.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_SOCKMAP);
|
||||||
|
__uint(max_entries, 1);
|
||||||
|
__type(key, __u32);
|
||||||
|
__type(value, __u64);
|
||||||
|
} sock_map_rx SEC(".maps");
|
||||||
|
|
||||||
|
The following code snippet shows a sample parser program.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
SEC("sk_skb/stream_parser")
|
||||||
|
int bpf_prog_parser(struct __sk_buff *skb)
|
||||||
|
{
|
||||||
|
return skb->len;
|
||||||
|
}
|
||||||
|
|
||||||
|
The following code snippet shows a simple verdict program that interacts with a
|
||||||
|
sockmap to redirect traffic to another socket based on the local port.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
SEC("sk_skb/stream_verdict")
|
||||||
|
int bpf_prog_verdict(struct __sk_buff *skb)
|
||||||
|
{
|
||||||
|
__u32 lport = skb->local_port;
|
||||||
|
__u32 idx = 0;
|
||||||
|
|
||||||
|
if (lport == 10000)
|
||||||
|
return bpf_sk_redirect_map(skb, &sock_map_rx, idx, 0);
|
||||||
|
|
||||||
|
return SK_PASS;
|
||||||
|
}
|
||||||
|
|
||||||
|
The following code snippet shows how to declare a sockhash map.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct socket_key {
|
||||||
|
__u32 src_ip;
|
||||||
|
__u32 dst_ip;
|
||||||
|
__u32 src_port;
|
||||||
|
__u32 dst_port;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_SOCKHASH);
|
||||||
|
__uint(max_entries, 1);
|
||||||
|
__type(key, struct socket_key);
|
||||||
|
__type(value, __u64);
|
||||||
|
} sock_hash_rx SEC(".maps");
|
||||||
|
|
||||||
|
The following code snippet shows a simple verdict program that interacts with a
|
||||||
|
sockhash to redirect traffic to another socket based on a hash of some of the
|
||||||
|
skb parameters.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
static inline
|
||||||
|
void extract_socket_key(struct __sk_buff *skb, struct socket_key *key)
|
||||||
|
{
|
||||||
|
key->src_ip = skb->remote_ip4;
|
||||||
|
key->dst_ip = skb->local_ip4;
|
||||||
|
key->src_port = skb->remote_port >> 16;
|
||||||
|
key->dst_port = (bpf_htonl(skb->local_port)) >> 16;
|
||||||
|
}
|
||||||
|
|
||||||
|
SEC("sk_skb/stream_verdict")
|
||||||
|
int bpf_prog_verdict(struct __sk_buff *skb)
|
||||||
|
{
|
||||||
|
struct socket_key key;
|
||||||
|
|
||||||
|
extract_socket_key(skb, &key);
|
||||||
|
|
||||||
|
return bpf_sk_redirect_hash(skb, &sock_hash_rx, &key, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
User space
|
||||||
|
----------
|
||||||
|
Several examples of the use of sockmap APIs can be found in:
|
||||||
|
|
||||||
|
- `tools/testing/selftests/bpf/prog_tests/sockmap_basic.c`_
|
||||||
|
- `tools/testing/selftests/bpf/test_sockmap.c`_
|
||||||
|
- `tools/testing/selftests/bpf/test_maps.c`_
|
||||||
|
|
||||||
|
The following code sample shows how to create a sockmap, attach a parser and
|
||||||
|
verdict program, as well as add a socket entry.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int create_sample_sockmap(int sock, int parse_prog_fd, int verdict_prog_fd)
|
||||||
|
{
|
||||||
|
int index = 0;
|
||||||
|
int map, err;
|
||||||
|
|
||||||
|
map = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(int), sizeof(int), 1, NULL);
|
||||||
|
if (map < 0) {
|
||||||
|
fprintf(stderr, "Failed to create sockmap: %s\n", strerror(errno));
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
err = bpf_prog_attach(parse_prog_fd, map, BPF_SK_SKB_STREAM_PARSER, 0);
|
||||||
|
if (err) {
|
||||||
|
fprintf(stderr, "Failed to attach_parser_prog_to_map: %s\n", strerror(errno));
|
||||||
|
goto out;
|
||||||
|
}
|
||||||
|
|
||||||
|
err = bpf_prog_attach(verdict_prog_fd, map, BPF_SK_SKB_STREAM_VERDICT, 0);
|
||||||
|
if (err) {
|
||||||
|
fprintf(stderr, "Failed to attach_verdict_prog_to_map: %s\n", strerror(errno));
|
||||||
|
goto out;
|
||||||
|
}
|
||||||
|
|
||||||
|
err = bpf_map_update_elem(map, &index, &sock, BPF_NOEXIST);
|
||||||
|
if (err) {
|
||||||
|
fprintf(stderr, "Failed to update sockmap: %s\n", strerror(errno));
|
||||||
|
goto out;
|
||||||
|
}
|
||||||
|
|
||||||
|
out:
|
||||||
|
close(map);
|
||||||
|
return err;
|
||||||
|
}
|
||||||
|
|
||||||
|
References
|
||||||
|
===========
|
||||||
|
|
||||||
|
- https://github.com/jrfastab/linux-kernel-xdp/commit/c89fd73cb9d2d7f3c716c3e00836f07b1aeb261f
|
||||||
|
- https://lwn.net/Articles/731133/
|
||||||
|
- http://vger.kernel.org/lpc_net2018_talks/ktls_bpf_paper.pdf
|
||||||
|
- https://lwn.net/Articles/748628/
|
||||||
|
- https://lore.kernel.org/bpf/20200218171023.844439-7-jakub@cloudflare.com/
|
||||||
|
|
||||||
|
.. _`tools/testing/selftests/bpf/progs/test_sockmap_kern.h`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
|
||||||
|
.. _`tools/testing/selftests/bpf/progs/sockmap_parse_prog.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/sockmap_parse_prog.c
|
||||||
|
.. _`tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c
|
||||||
|
.. _`tools/testing/selftests/bpf/prog_tests/sockmap_basic.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
|
||||||
|
.. _`tools/testing/selftests/bpf/test_sockmap.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/test_sockmap.c
|
||||||
|
.. _`tools/testing/selftests/bpf/test_maps.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/test_maps.c
|
||||||
|
.. _`tools/testing/selftests/bpf/progs/test_sockmap_listen.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
|
||||||
|
.. _`tools/testing/selftests/bpf/progs/test_sockmap_update.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/test_sockmap_update.c
|
@ -178,7 +178,7 @@ The following code snippet shows how to update an XSKMAP with an XSK entry.
|
|||||||
|
|
||||||
For an example on how create AF_XDP sockets, please see the AF_XDP-example and
|
For an example on how create AF_XDP sockets, please see the AF_XDP-example and
|
||||||
AF_XDP-forwarding programs in the `bpf-examples`_ directory in the `libxdp`_ repository.
|
AF_XDP-forwarding programs in the `bpf-examples`_ directory in the `libxdp`_ repository.
|
||||||
For a detailed explaination of the AF_XDP interface please see:
|
For a detailed explanation of the AF_XDP interface please see:
|
||||||
|
|
||||||
- `libxdp-readme`_.
|
- `libxdp-readme`_.
|
||||||
- `AF_XDP`_ kernel documentation.
|
- `AF_XDP`_ kernel documentation.
|
||||||
|
@ -6,4 +6,5 @@ Other
|
|||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
ringbuf
|
ringbuf
|
||||||
llvm_reloc
|
llvm_reloc
|
||||||
|
graph_ds_impl
|
||||||
|
@ -124,7 +124,7 @@ buffer. Currently 4 are supported:
|
|||||||
|
|
||||||
- ``BPF_RB_AVAIL_DATA`` returns amount of unconsumed data in ring buffer;
|
- ``BPF_RB_AVAIL_DATA`` returns amount of unconsumed data in ring buffer;
|
||||||
- ``BPF_RB_RING_SIZE`` returns the size of ring buffer;
|
- ``BPF_RB_RING_SIZE`` returns the size of ring buffer;
|
||||||
- ``BPF_RB_CONS_POS``/``BPF_RB_PROD_POS`` returns current logical possition
|
- ``BPF_RB_CONS_POS``/``BPF_RB_PROD_POS`` returns current logical position
|
||||||
of consumer/producer, respectively.
|
of consumer/producer, respectively.
|
||||||
|
|
||||||
Returned values are momentarily snapshots of ring buffer state and could be
|
Returned values are momentarily snapshots of ring buffer state and could be
|
||||||
@ -146,7 +146,7 @@ Design and Implementation
|
|||||||
This reserve/commit schema allows a natural way for multiple producers, either
|
This reserve/commit schema allows a natural way for multiple producers, either
|
||||||
on different CPUs or even on the same CPU/in the same BPF program, to reserve
|
on different CPUs or even on the same CPU/in the same BPF program, to reserve
|
||||||
independent records and work with them without blocking other producers. This
|
independent records and work with them without blocking other producers. This
|
||||||
means that if BPF program was interruped by another BPF program sharing the
|
means that if BPF program was interrupted by another BPF program sharing the
|
||||||
same ring buffer, they will both get a record reserved (provided there is
|
same ring buffer, they will both get a record reserved (provided there is
|
||||||
enough space left) and can work with it and submit it independently. This
|
enough space left) and can work with it and submit it independently. This
|
||||||
applies to NMI context as well, except that due to using a spinlock during
|
applies to NMI context as well, except that due to using a spinlock during
|
||||||
|
@ -192,7 +192,7 @@ checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.
|
|||||||
As well as range-checking, the tracked information is also used for enforcing
|
As well as range-checking, the tracked information is also used for enforcing
|
||||||
alignment of pointer accesses. For instance, on most systems the packet pointer
|
alignment of pointer accesses. For instance, on most systems the packet pointer
|
||||||
is 2 bytes after a 4-byte alignment. If a program adds 14 bytes to that to jump
|
is 2 bytes after a 4-byte alignment. If a program adds 14 bytes to that to jump
|
||||||
over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting
|
over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
|
||||||
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
|
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
|
||||||
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
|
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
|
||||||
that pointer are safe.
|
that pointer are safe.
|
||||||
@ -316,6 +316,301 @@ Pruning considers not only the registers but also the stack (and any spilled
|
|||||||
registers it may hold). They must all be safe for the branch to be pruned.
|
registers it may hold). They must all be safe for the branch to be pruned.
|
||||||
This is implemented in states_equal().
|
This is implemented in states_equal().
|
||||||
|
|
||||||
|
Some technical details about state pruning implementation could be found below.
|
||||||
|
|
||||||
|
Register liveness tracking
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
In order to make state pruning effective, liveness state is tracked for each
|
||||||
|
register and stack slot. The basic idea is to track which registers and stack
|
||||||
|
slots are actually used during subsequent execution of the program, until
|
||||||
|
program exit is reached. Registers and stack slots that were never used could be
|
||||||
|
removed from the cached state thus making more states equivalent to a cached
|
||||||
|
state. This could be illustrated by the following program::
|
||||||
|
|
||||||
|
0: call bpf_get_prandom_u32()
|
||||||
|
1: r1 = 0
|
||||||
|
2: if r0 == 0 goto +1
|
||||||
|
3: r0 = 1
|
||||||
|
--- checkpoint ---
|
||||||
|
4: r0 = r1
|
||||||
|
5: exit
|
||||||
|
|
||||||
|
Suppose that a state cache entry is created at instruction #4 (such entries are
|
||||||
|
also called "checkpoints" in the text below). The verifier could reach the
|
||||||
|
instruction with one of two possible register states:
|
||||||
|
|
||||||
|
* r0 = 1, r1 = 0
|
||||||
|
* r0 = 0, r1 = 0
|
||||||
|
|
||||||
|
However, only the value of register ``r1`` is important to successfully finish
|
||||||
|
verification. The goal of the liveness tracking algorithm is to spot this fact
|
||||||
|
and figure out that both states are actually equivalent.
|
||||||
|
|
||||||
|
Data structures
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Liveness is tracked using the following data structures::
|
||||||
|
|
||||||
|
enum bpf_reg_liveness {
|
||||||
|
REG_LIVE_NONE = 0,
|
||||||
|
REG_LIVE_READ32 = 0x1,
|
||||||
|
REG_LIVE_READ64 = 0x2,
|
||||||
|
REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
|
||||||
|
REG_LIVE_WRITTEN = 0x4,
|
||||||
|
REG_LIVE_DONE = 0x8,
|
||||||
|
};
|
||||||
|
|
||||||
|
struct bpf_reg_state {
|
||||||
|
...
|
||||||
|
struct bpf_reg_state *parent;
|
||||||
|
...
|
||||||
|
enum bpf_reg_liveness live;
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
struct bpf_stack_state {
|
||||||
|
struct bpf_reg_state spilled_ptr;
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
struct bpf_func_state {
|
||||||
|
struct bpf_reg_state regs[MAX_BPF_REG];
|
||||||
|
...
|
||||||
|
struct bpf_stack_state *stack;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct bpf_verifier_state {
|
||||||
|
struct bpf_func_state *frame[MAX_CALL_FRAMES];
|
||||||
|
struct bpf_verifier_state *parent;
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
* ``REG_LIVE_NONE`` is an initial value assigned to ``->live`` fields upon new
|
||||||
|
verifier state creation;
|
||||||
|
|
||||||
|
* ``REG_LIVE_WRITTEN`` means that the value of the register (or stack slot) is
|
||||||
|
defined by some instruction verified between this verifier state's parent and
|
||||||
|
verifier state itself;
|
||||||
|
|
||||||
|
* ``REG_LIVE_READ{32,64}`` means that the value of the register (or stack slot)
|
||||||
|
is read by some child state of this verifier state;
|
||||||
|
|
||||||
|
* ``REG_LIVE_DONE`` is a marker used by ``clean_verifier_state()`` to avoid
|
||||||
|
processing same verifier state multiple times and for some sanity checks;
|
||||||
|
|
||||||
|
* ``->live`` field values are formed by combining ``enum bpf_reg_liveness``
|
||||||
|
values using bitwise or.
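As a small illustration of how the ``->live`` bits combine, the sketch below repeats the enum values shown above and adds two helper predicates. The helper names ``live_is_read()`` and ``live_is_written()`` are hypothetical and exist only for this example.

```c
/* Same values as the enum shown above. */
enum bpf_reg_liveness {
	REG_LIVE_NONE    = 0,
	REG_LIVE_READ32  = 0x1,
	REG_LIVE_READ64  = 0x2,
	REG_LIVE_READ    = REG_LIVE_READ32 | REG_LIVE_READ64,
	REG_LIVE_WRITTEN = 0x4,
	REG_LIVE_DONE    = 0x8,
};

/* True if either the 32-bit or the 64-bit read mark is set. */
static int live_is_read(unsigned int live)
{
	return (live & REG_LIVE_READ) != 0;
}

/* True if the write mark is set, regardless of any read marks. */
static int live_is_written(unsigned int live)
{
	return (live & REG_LIVE_WRITTEN) != 0;
}
```

A register that was written and later read as a 32-bit value would carry ``REG_LIVE_WRITTEN | REG_LIVE_READ32``, and both predicates would hold for it.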
|
||||||
|
|
||||||
|
Register parentage chains
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
In order to propagate information between parent and child states, a *register
|
||||||
|
parentage chain* is established. Each register or stack slot is linked to a
|
||||||
|
corresponding register or stack slot in its parent state via a ``->parent``
|
||||||
|
pointer. This link is established upon state creation in ``is_state_visited()``
|
||||||
|
and might be modified by ``set_callee_state()`` called from
|
||||||
|
``__check_func_call()``.
|
||||||
|
|
||||||
|
The rules for correspondence between registers / stack slots are as follows:
|
||||||
|
|
||||||
|
* For the current stack frame, registers and stack slots of the new state are
|
||||||
|
linked to the registers and stack slots of the parent state with the same
|
||||||
|
indices.
|
||||||
|
|
||||||
|
* For the outer stack frames, only callee saved registers (r6-r9) and stack
|
||||||
|
slots are linked to the registers and stack slots of the parent state with the
|
||||||
|
same indices.
|
||||||
|
|
||||||
|
* When a function call is processed, a new ``struct bpf_func_state`` instance is
|
||||||
|
allocated; it encapsulates a new set of registers and stack slots. For this
|
||||||
|
new frame, parent links for r6-r9 and stack slots are set to nil, parent links
|
||||||
|
for r1-r5 are set to match caller r1-r5 parent links.
|
||||||
|
|
||||||
|
This could be illustrated by the following diagram (arrows stand for
|
||||||
|
``->parent`` pointers)::
|
||||||
|
|
||||||
|
... ; Frame #0, some instructions
|
||||||
|
--- checkpoint #0 ---
|
||||||
|
1 : r6 = 42 ; Frame #0
|
||||||
|
--- checkpoint #1 ---
|
||||||
|
2 : call foo() ; Frame #0
|
||||||
|
... ; Frame #1, instructions from foo()
|
||||||
|
--- checkpoint #2 ---
|
||||||
|
... ; Frame #1, instructions from foo()
|
||||||
|
--- checkpoint #3 ---
|
||||||
|
exit ; Frame #1, return from foo()
|
||||||
|
3 : r1 = r6 ; Frame #0 <- current state
|
||||||
|
|
||||||
|
+-------------------------------+-------------------------------+
|
||||||
|
| Frame #0 | Frame #1 |
|
||||||
|
Checkpoint +-------------------------------+-------------------------------+
|
||||||
|
#0 | r0 | r1-r5 | r6-r9 | fp-8 ... |
|
||||||
|
+-------------------------------+
|
||||||
|
^ ^ ^ ^
|
||||||
|
| | | |
|
||||||
|
Checkpoint +-------------------------------+
|
||||||
|
#1 | r0 | r1-r5 | r6-r9 | fp-8 ... |
|
||||||
|
+-------------------------------+
|
||||||
|
^ ^ ^
|
||||||
|
|_______|_______|_______________
|
||||||
|
| | |
|
||||||
|
nil nil | | | nil nil
|
||||||
|
| | | | | | |
|
||||||
|
Checkpoint +-------------------------------+-------------------------------+
|
||||||
|
#2 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
|
||||||
|
+-------------------------------+-------------------------------+
|
||||||
|
^ ^ ^ ^ ^
|
||||||
|
nil nil | | | | |
|
||||||
|
| | | | | | |
|
||||||
|
Checkpoint +-------------------------------+-------------------------------+
|
||||||
|
#3 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
|
||||||
|
+-------------------------------+-------------------------------+
|
||||||
|
^ ^
|
||||||
|
nil nil | |
|
||||||
|
| | | |
|
||||||
|
Current +-------------------------------+
|
||||||
|
state | r0 | r1-r5 | r6-r9 | fp-8 ... |
|
||||||
|
+-------------------------------+
|
||||||
|
\
|
||||||
|
r6 read mark is propagated via these links
|
||||||
|
all the way up to checkpoint #1.
|
||||||
|
The checkpoint #1 contains a write mark for r6
|
||||||
|
because of instruction (1), thus read propagation
|
||||||
|
does not reach checkpoint #0 (see section below).
|
||||||
|
|
||||||
|
Liveness marks tracking
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
For each processed instruction, the verifier tracks read and written registers
|
||||||
|
and stack slots. The main idea of the algorithm is that read marks propagate
|
||||||
|
back along the state parentage chain until they hit a write mark, which 'screens
|
||||||
|
off' earlier states from the read. The information about reads is propagated by
|
||||||
|
function ``mark_reg_read()`` which could be summarized as follows::
|
||||||
|
|
||||||
|
mark_reg_read(struct bpf_reg_state *state, ...):
|
||||||
|
parent = state->parent
|
||||||
|
while parent:
|
||||||
|
if state->live & REG_LIVE_WRITTEN:
|
||||||
|
break
|
||||||
|
if parent->live & REG_LIVE_READ64:
|
||||||
|
break
|
||||||
|
parent->live |= REG_LIVE_READ64
|
||||||
|
state = parent
|
||||||
|
parent = state->parent
|
||||||
|
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
* The read marks are applied to the **parent** state while write marks are
|
||||||
|
applied to the **current** state. The write mark on a register or stack slot
|
||||||
|
means that it is updated by some instruction in the straight-line code leading
|
||||||
|
from the parent state to the current state.
|
||||||
|
|
||||||
|
* Details about REG_LIVE_READ32 are omitted.
|
||||||
|
|
||||||
|
* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`)
|
||||||
|
might override the first parent link. Please refer to the comments in the
|
||||||
|
``propagate_liveness()`` and ``mark_reg_read()`` source code for further
|
||||||
|
details.
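The summarized ``mark_reg_read()`` loop can be exercised on a reduced model. The sketch below assumes a hypothetical ``struct model_reg`` that keeps only the ``parent`` and ``live`` fields; like the summary above, it ignores ``REG_LIVE_READ32`` and the parent-link override done by ``propagate_liveness()``.

```c
#include <stddef.h>

enum {
	REG_LIVE_NONE    = 0,
	REG_LIVE_READ64  = 0x2,
	REG_LIVE_WRITTEN = 0x4,
};

/* Reduced, hypothetical version of struct bpf_reg_state. */
struct model_reg {
	struct model_reg *parent;
	unsigned int live;
};

/* Propagate a read of `state` up the parentage chain. */
static void model_mark_reg_read(struct model_reg *state)
{
	struct model_reg *parent = state->parent;

	while (parent) {
		if (state->live & REG_LIVE_WRITTEN)
			break; /* value defined here: earlier states are screened off */
		if (parent->live & REG_LIVE_READ64)
			break; /* already marked by an earlier read */
		parent->live |= REG_LIVE_READ64;
		state = parent;
		parent = state->parent;
	}
}
```

Building a chain in which one ancestor carries a write mark, as checkpoint #1 does for r6 in the diagram above, shows the read mark stopping at that ancestor and never reaching the states before it.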
|
||||||
|
|
||||||
|
Because stack writes could have different sizes, ``REG_LIVE_WRITTEN`` marks are
|
||||||
|
applied conservatively: stack slots are marked as written only if the write size
|
||||||
|
corresponds to the size of the register, e.g. see function ``save_register_state()``.
|
||||||
|
|
||||||
|
Consider the following example::
|
||||||
|
|
||||||
|
0: (*u64)(r10 - 8) = 0 ; define 8 bytes of fp-8
|
||||||
|
--- checkpoint #0 ---
|
||||||
|
1: (*u32)(r10 - 8) = 1 ; redefine lower 4 bytes
|
||||||
|
2: r1 = (*u32)(r10 - 8) ; read lower 4 bytes defined at (1)
|
||||||
|
3: r2 = (*u32)(r10 - 4) ; read upper 4 bytes defined at (0)
|
||||||
|
|
||||||
|
As stated above, the write at (1) does not count as ``REG_LIVE_WRITTEN``. Should
|
||||||
|
it be otherwise, the algorithm above wouldn't be able to propagate the read mark
|
||||||
|
from (3) to checkpoint #0.
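The conservative treatment of partial stack writes can be replayed in a reduced model. This sketch assumes a hypothetical ``struct model_slot`` and a one-level read propagation; ``BPF_REG_SIZE`` is taken to be 8 bytes, the size of a BPF register.

```c
#include <stddef.h>

#define BPF_REG_SIZE 8 /* size of a BPF register in bytes */

enum {
	REG_LIVE_READ64  = 0x2,
	REG_LIVE_WRITTEN = 0x4,
};

/* Hypothetical stack slot with a link to the same slot in the parent state. */
struct model_slot {
	struct model_slot *parent;
	unsigned int live;
};

/* Only a full, register-sized write counts as a write mark. */
static void model_stack_write(struct model_slot *slot, int size)
{
	if (size == BPF_REG_SIZE)
		slot->live |= REG_LIVE_WRITTEN;
}

/* One-level read propagation: blocked only by a write mark on `slot`. */
static void model_stack_read(struct model_slot *slot)
{
	if (slot->parent && !(slot->live & REG_LIVE_WRITTEN))
		slot->parent->live |= REG_LIVE_READ64;
}
```

After a 4-byte write followed by a read, the parent slot receives a read mark, as required by the example above. After an 8-byte write the read is screened off and the parent stays unmarked.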
|
||||||
|
|
||||||
|
Once the ``BPF_EXIT`` instruction is reached, ``update_branch_counts()`` is
|
||||||
|
called to update the ``->branches`` counter for each verifier state in a chain
|
||||||
|
of parent verifier states. When the ``->branches`` counter reaches zero, the
|
||||||
|
verifier state becomes a valid entry in a set of cached verifier states.
|
||||||
|
|
||||||
|
Each entry of the verifier states cache is post-processed by a function
|
||||||
|
``clean_live_states()``. This function marks all registers and stack slots
|
||||||
|
without ``REG_LIVE_READ{32,64}`` marks as ``NOT_INIT`` or ``STACK_INVALID``.
|
||||||
|
Registers/stack slots marked in this way are ignored in function ``stacksafe()``
|
||||||
|
called from ``states_equal()`` when a state cache entry is considered for
|
||||||
|
equivalence with a current state.
|
||||||
|
|
||||||
|
Now it is possible to explain how the example from the beginning of the section
|
||||||
|
works::
|
||||||
|
|
||||||
|
0: call bpf_get_prandom_u32()
|
||||||
|
1: r1 = 0
|
||||||
|
2: if r0 == 0 goto +1
|
||||||
|
3: r0 = 1
|
||||||
|
--- checkpoint[0] ---
|
||||||
|
4: r0 = r1
|
||||||
|
5: exit
|
||||||
|
|
||||||
|
* At instruction #2 a branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 }``
|
||||||
|
is pushed to the states processing queue (pc stands for program counter).
|
||||||
|
|
||||||
|
* At instruction #4:
|
||||||
|
|
||||||
|
* ``checkpoint[0]`` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 }``;
|
||||||
|
* ``checkpoint[0].r0`` is marked as written;
|
||||||
|
* ``checkpoint[0].r1`` is marked as read;
|
||||||
|
|
||||||
|
* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed
|
||||||
|
by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a
|
||||||
|
read mark and all other registers and stack slots are marked as ``NOT_INIT``
|
||||||
|
or ``STACK_INVALID``.
|
||||||
|
|
||||||
|
* The state ``{ r0 == 0, r1 == 0, pc == 4 }`` is popped from the states queue
|
||||||
|
and is compared against a cached state ``{ r1 == 0, pc == 4 }``; the states
|
||||||
|
are considered equivalent.
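The walkthrough above can be replayed with a two-register model. This is a sketch under assumptions: ``struct model_state``, ``model_clean()`` and ``model_states_equal()`` are hypothetical, registers hold exact integer values instead of the ranges the verifier tracks, and stack slots are left out.

```c
#include <stdbool.h>

#define MODEL_NREGS 2 /* only r0 and r1, as in the example */

enum { REG_LIVE_READ64 = 0x2 };

struct model_reg {
	bool init;         /* false models NOT_INIT */
	int value;
	unsigned int live;
};

struct model_state {
	struct model_reg regs[MODEL_NREGS];
};

/* Drop registers that no child state ever read (cf. clean_live_states()). */
static void model_clean(struct model_state *cached)
{
	for (int i = 0; i < MODEL_NREGS; i++)
		if (!(cached->regs[i].live & REG_LIVE_READ64))
			cached->regs[i].init = false;
}

/* A current state matches if it agrees on every register the cache kept. */
static bool model_states_equal(const struct model_state *cached,
			       const struct model_state *cur)
{
	for (int i = 0; i < MODEL_NREGS; i++) {
		if (!cached->regs[i].init)
			continue; /* ignored: never read after the checkpoint */
		if (!cur->regs[i].init ||
		    cur->regs[i].value != cached->regs[i].value)
			return false;
	}
	return true;
}
```

Before cleaning, the cached state with ``r0 == 1`` does not match the queued state with ``r0 == 0``. After cleaning, only ``r1`` is compared and the two states are treated as equivalent.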
|
||||||
|
|
||||||
|
.. _read_marks_for_cache_hits:
|
||||||
|
|
||||||
|
Read marks propagation for cache hits
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Another point is the handling of read marks when a previously verified state is
|
||||||
|
found in the states cache. Upon a cache hit the verifier must behave in the same way
|
||||||
|
as if the current state was verified to the program exit. This means that all
|
||||||
|
read marks, present on registers and stack slots of the cached state, must be
|
||||||
|
propagated over the parentage chain of the current state. The example below shows
|
||||||
|
why this is important. Function ``propagate_liveness()`` handles this case.
|
||||||
|
|
||||||
|
Consider the following state parentage chain (S is a starting state, A-E are
|
||||||
|
derived states, -> arrows show which state is derived from which)::
|
||||||
|
|
||||||
|
r1 read
|
||||||
|
<------------- A[r1] == 0
|
||||||
|
C[r1] == 0
|
||||||
|
S ---> A ---> B ---> exit E[r1] == 1
|
||||||
|
|
|
||||||
|
` ---> C ---> D
|
||||||
|
|
|
||||||
|
` ---> E ^
|
||||||
|
|___ suppose all these
|
||||||
|
^ states are at insn #Y
|
||||||
|
|
|
||||||
|
suppose all these
|
||||||
|
states are at insn #X
|
||||||
|
|
||||||
|
* Chain of states ``S -> A -> B -> exit`` is verified first.
|
||||||
|
|
||||||
|
* While ``B -> exit`` is verified, register ``r1`` is read and this read mark is
|
||||||
|
propagated up to state ``A``.
|
||||||
|
|
||||||
|
* When chain of states ``C -> D`` is verified the state ``D`` turns out to be
|
||||||
|
equivalent to state ``B``.
|
||||||
|
|
||||||
|
* The read mark for ``r1`` has to be propagated to state ``C``, otherwise state
|
||||||
|
``C`` might get mistakenly marked as equivalent to state ``E`` even though
|
||||||
|
values for register ``r1`` differ between ``C`` and ``E``.
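
The scenario can be replayed with a small toy model. Everything below is an illustrative assumption rather than kernel code: the ``State`` class, the ``written`` set used to stop propagation at ``A``, ``C`` and ``E``, and the helpers ``mark_read()``, ``toy_propagate_liveness()`` and ``looks_equivalent()``. It is only a sketch of why the marks of a cached state must be copied onto the current chain on a cache hit.

```python
class State:
    """Toy verifier state with a parent link and per-register marks."""

    def __init__(self, name, parent=None, regs=None, written=()):
        self.name = name
        self.parent = parent
        self.regs = dict(regs or {})
        self.written = set(written)  # registers written in this state
        self.read = set()            # registers carrying a read mark


def mark_read(state, reg):
    """Mark reg as read, walking up parents until a state that wrote it."""
    while state is not None:
        state.read.add(reg)
        if reg in state.written:
            break
        state = state.parent


def toy_propagate_liveness(cached, current):
    """On a cache hit, replay the cached state's read marks on current."""
    for reg in sorted(cached.read):
        mark_read(current, reg)


def looks_equivalent(cached, current):
    """Compare only the registers that carry a read mark in cached."""
    return all(cached.regs.get(r) == current.regs.get(r) for r in cached.read)


def build_chain():
    """Build the S/A-E states from the diagram above."""
    s = State("S")
    a = State("A", s, {"r1": 0}, written={"r1"})
    b = State("B", a, {"r1": 0})
    c = State("C", s, {"r1": 0}, written={"r1"})
    d = State("D", c, {"r1": 0})
    e = State("E", s, {"r1": 1}, written={"r1"})
    return s, a, b, c, d, e


s, a, b, c, d, e = build_chain()
mark_read(b, "r1")                            # B -> exit reads r1
missed_difference = looks_equivalent(c, e)    # no marks on C yet
toy_propagate_liveness(b, d)                  # D hits the cache entry for B
caught_difference = not looks_equivalent(c, e)
```

Before propagation ``C`` carries no read mark, so the toy comparison has nothing to check and wrongly accepts ``E``; after propagation the differing ``r1`` values are compared and the states are told apart.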

Understanding eBPF verifier messages
====================================

@ -116,6 +116,9 @@ if major >= 3:
|
|||||||
|
|
||||||
# include/linux/linkage.h:
|
# include/linux/linkage.h:
|
||||||
"asmlinkage",
|
"asmlinkage",
|
||||||
|
|
||||||
|
# include/linux/btf.h
|
||||||
|
"__bpf_kfunc",
|
||||||
]
|
]
|
||||||
|
|
||||||
else:
|
else:
|
||||||
|
@ -127,6 +127,7 @@ Documents that don't fit elsewhere or which have yet to be categorized.
|
|||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
librs
|
librs
|
||||||
|
netlink
|
||||||
|
|
||||||
.. only:: subproject and html
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
101
Documentation/core-api/netlink.rst
Normal file
@ -0,0 +1,101 @@
.. SPDX-License-Identifier: BSD-3-Clause

.. _kernel_netlink:

===================================
Netlink notes for kernel developers
===================================

General guidance
================

Attribute enums
---------------

Older families often define "null" attributes and commands with a value
of ``0``, named ``unspec``. This is supported (``type: unused``)
but should be avoided in new families. The ``unspec`` enum values are
not used in practice, so just set the value of the first attribute to ``1``.

Message enums
-------------

Use the same command IDs for requests and replies. This makes it easier
to match them up, and we have plenty of ID space.

Use separate command IDs for notifications. This makes it easier to
sort the notifications from replies (and present them to the user
application via a different API than replies).

Answer requests
---------------

Older families do not reply to all of the commands, especially NEW / ADD
commands. The user only learns whether the operation succeeded via the
ACK. Try to find useful data to return. Once a command is added, whether
it replies with a full message or only an ACK is uAPI and cannot be
changed. It's better to err on the side of replying.

Specifically, NEW and ADD commands should reply with information identifying
the created object, such as the allocated object's ID (without having to
resort to using ``NLM_F_ECHO``).

NLM_F_ECHO
----------

Make sure to pass the request info to genl_notify() to allow ``NLM_F_ECHO``
to take effect. This is useful for programs that need precise feedback
from the kernel (for example for logging purposes).

Support dump consistency
------------------------

If iterating over objects during a dump may skip or repeat objects,
make sure to report the dump inconsistency with ``NLM_F_DUMP_INTR``.
This is usually implemented by maintaining a generation id for the
structure and recording it in the ``seq`` member of struct netlink_callback.
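
The generation-id idea can be sketched in a few lines. This is a toy model under stated assumptions, not kernel code: ``ToyTable``, ``ToyDumpCallback``, its ``prev_seq`` and ``pos`` fields and ``dump_chunk()`` are hypothetical names, and the returned boolean stands in for setting ``NLM_F_DUMP_INTR`` on the message.

```python
class ToyTable:
    """Object collection with a generation id bumped on every change."""

    def __init__(self, items):
        self.items = list(items)
        self.gen = 1

    def add(self, item):
        self.items.append(item)
        self.gen += 1


class ToyDumpCallback:
    """Per-dump context; ``seq`` mirrors the member named in the text."""

    def __init__(self):
        self.seq = 0       # generation recorded during the current pass
        self.prev_seq = 0  # generation recorded during the previous pass
        self.pos = 0       # resume position within the table


def dump_chunk(table, cb, max_items):
    """Emit up to max_items objects and report whether the dump was interrupted."""
    cb.prev_seq, cb.seq = cb.seq, table.gen
    # A changed generation between passes means objects may have been
    # skipped or repeated, so the dump is flagged as inconsistent.
    interrupted = cb.prev_seq != 0 and cb.prev_seq != cb.seq
    chunk = table.items[cb.pos:cb.pos + max_items]
    cb.pos += len(chunk)
    return chunk, interrupted


table = ToyTable([1, 2, 3, 4])
cb = ToyDumpCallback()
first = dump_chunk(table, cb, 2)
table.add(5)                      # table changes between two dump passes
second = dump_chunk(table, cb, 2)
```

User space that sees the inconsistency flag would typically restart the dump rather than trust the partial result.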

Netlink specification
=====================

Documentation of the parts of the Netlink specification which are only
relevant to kernel space.

Globals
-------

kernel-policy
~~~~~~~~~~~~~

Defines whether the kernel validation policy is per operation (``per-op``)
or for the entire family (``global``). New families should use ``per-op``
(default) to be able to narrow down the attributes accepted by a specific
command.

checks
------

Documentation for the ``checks`` sub-sections of attribute specs.

unterminated-ok
~~~~~~~~~~~~~~~

Accept strings without null-termination (for legacy families only).
Switches from the ``NLA_NUL_STRING`` to ``NLA_STRING`` policy type.

max-len
~~~~~~~

Defines max length for a binary or string attribute (corresponding
to the ``len`` member of struct nla_policy). For string attributes the
terminating null character is not counted towards ``max-len``.

The field may either be a literal integer value or a name of a defined
constant. String types may reduce the constant by one
(i.e. specify ``max-len: CONST - 1``) to reserve space for the terminating
character, so implementations should recognize this pattern.
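
A spec consumer might resolve ``max-len`` along the following lines. This is a minimal sketch, assuming constant names look like C identifiers and that the defined constants are available as a plain dictionary; ``resolve_max_len()`` and ``TOY_NAME_SIZE`` are hypothetical names, not part of any existing tool.

```python
import re

# Matches "CONST" or "CONST - 1" (assumption: C-style constant names).
_CONST_MINUS_ONE = re.compile(r"([A-Za-z_]\w*)(\s*-\s*1)?")


def resolve_max_len(value, constants):
    """Turn a max-len field into an integer length.

    Accepts a literal integer, the name of a defined constant, or the
    "CONST - 1" form used to reserve room for the terminating character.
    """
    if isinstance(value, int):
        return value
    text = str(value).strip()
    if text.isdigit():
        return int(text)
    match = _CONST_MINUS_ONE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported max-len expression: {value!r}")
    base = constants[match.group(1)]
    return base - 1 if match.group(2) else base


consts = {"TOY_NAME_SIZE": 16}
literal = resolve_max_len(32, consts)
named = resolve_max_len("TOY_NAME_SIZE", consts)
reduced = resolve_max_len("TOY_NAME_SIZE - 1", consts)
```

Any other arithmetic is rejected on purpose, since the text only describes the ``CONST - 1`` pattern.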

min-len
~~~~~~~

Similar to ``max-len`` but defines minimum length.
@ -161,6 +161,6 @@ xxx_packing() that calls it using the proper QUIRK_* one-hot bits set.
|
|||||||
|
|
||||||
The packing() function returns an int-encoded error code, which protects the
|
The packing() function returns an int-encoded error code, which protects the
|
||||||
programmer against incorrect API use. The errors are not expected to occur
|
programmer against incorrect API use. The errors are not expected to occur
|
||||||
durring runtime, therefore it is reasonable for xxx_packing() to return void
|
during runtime, therefore it is reasonable for xxx_packing() to return void
|
||||||
and simply swallow those errors. Optionally it can dump stack or print the
|
and simply swallow those errors. Optionally it can dump stack or print the
|
||||||
error description.
|
error description.
|
||||||
|
@ -57,6 +57,15 @@ patternProperties:
|
|||||||
enum:
|
enum:
|
||||||
- mscc,ocelot-miim
|
- mscc,ocelot-miim
|
||||||
|
|
||||||
|
"^ethernet-switch@[0-9a-f]+$":
|
||||||
|
type: object
|
||||||
|
$ref: /schemas/net/mscc,vsc7514-switch.yaml
|
||||||
|
unevaluatedProperties: false
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
enum:
|
||||||
|
- mscc,vsc7512-switch
|
||||||
|
|
||||||
required:
|
required:
|
||||||
- compatible
|
- compatible
|
||||||
- reg
|
- reg
|
||||||
|
@ -0,0 +1,80 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/amlogic,g12a-mdio-mux.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: MDIO bus multiplexer/glue of Amlogic G12a SoC family

description:
  This is a special case of an MDIO bus multiplexer. It allows choosing between
  the internal mdio bus leading to the embedded 10/100 PHY or the external
  MDIO bus.

maintainers:
  - Neil Armstrong <neil.armstrong@linaro.org>

allOf:
  - $ref: mdio-mux.yaml#

properties:
  compatible:
    const: amlogic,g12a-mdio-mux

  reg:
    maxItems: 1

  clocks:
    items:
      - description: peripheral clock
      - description: platform crystal
      - description: SoC 50MHz MPLL

  clock-names:
    items:
      - const: pclk
      - const: clkin0
      - const: clkin1

required:
  - compatible
  - reg
  - clocks
  - clock-names

unevaluatedProperties: false

examples:
  - |
    #include <dt-bindings/interrupt-controller/irq.h>
    #include <dt-bindings/interrupt-controller/arm-gic.h>
    mdio-multiplexer@4c000 {
        compatible = "amlogic,g12a-mdio-mux";
        reg = <0x4c000 0xa4>;
        clocks = <&clkc_eth_phy>, <&xtal>, <&clkc_mpll>;
        clock-names = "pclk", "clkin0", "clkin1";
        mdio-parent-bus = <&mdio0>;
        #address-cells = <1>;
        #size-cells = <0>;

        mdio@0 {
            reg = <0>;
            #address-cells = <1>;
            #size-cells = <0>;
        };

        mdio@1 {
            reg = <1>;
            #address-cells = <1>;
            #size-cells = <0>;

            ethernet-phy@8 {
                compatible = "ethernet-phy-id0180.3301",
                             "ethernet-phy-ieee802.3-c22";
                interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
                reg = <8>;
                max-speed = <100>;
            };
        };
    };
...
@ -0,0 +1,64 @@
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/amlogic,gxl-mdio-mux.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: Amlogic GXL MDIO bus multiplexer

maintainers:
  - Jerome Brunet <jbrunet@baylibre.com>

description:
  This is a special case of an MDIO bus multiplexer. It allows choosing between
  the internal mdio bus leading to the embedded 10/100 PHY or the external
  MDIO bus on the Amlogic GXL SoC family.

allOf:
  - $ref: mdio-mux.yaml#

properties:
  compatible:
    const: amlogic,gxl-mdio-mux

  reg:
    maxItems: 1

  clocks:
    maxItems: 1

  clock-names:
    items:
      - const: ref

required:
  - compatible
  - reg
  - clocks
  - clock-names

unevaluatedProperties: false

examples:
  - |
    eth_phy_mux: mdio@558 {
        compatible = "amlogic,gxl-mdio-mux";
        reg = <0x558 0xc>;
        #address-cells = <1>;
        #size-cells = <0>;
        clocks = <&refclk>;
        clock-names = "ref";
        mdio-parent-bus = <&mdio0>;

        external_mdio: mdio@0 {
            reg = <0x0>;
            #address-cells = <1>;
            #size-cells = <0>;
        };

        internal_mdio: mdio@1 {
            reg = <0x1>;
            #address-cells = <1>;
            #size-cells = <0>;
        };
    };
@ -19,6 +19,7 @@ description: |
|
|||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: ethernet-controller.yaml#
|
- $ref: ethernet-controller.yaml#
|
||||||
|
- $ref: /schemas/spi/spi-peripheral-props.yaml
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
@ -39,8 +40,8 @@ properties:
|
|||||||
it should be marked GPIO_ACTIVE_LOW.
|
it should be marked GPIO_ACTIVE_LOW.
|
||||||
maxItems: 1
|
maxItems: 1
|
||||||
|
|
||||||
|
controller-data: true
|
||||||
local-mac-address: true
|
local-mac-address: true
|
||||||
|
|
||||||
mac-address: true
|
mac-address: true
|
||||||
|
|
||||||
required:
|
required:
|
||||||
|
@ -28,6 +28,12 @@ properties:
|
|||||||
- renesas,r8a77995-canfd # R-Car D3
|
- renesas,r8a77995-canfd # R-Car D3
|
||||||
- const: renesas,rcar-gen3-canfd # R-Car Gen3 and RZ/G2
|
- const: renesas,rcar-gen3-canfd # R-Car Gen3 and RZ/G2
|
||||||
|
|
||||||
|
- items:
|
||||||
|
- enum:
|
||||||
|
- renesas,r8a779a0-canfd # R-Car V3U
|
||||||
|
- renesas,r8a779g0-canfd # R-Car V4H
|
||||||
|
- const: renesas,rcar-gen4-canfd # R-Car Gen4
|
||||||
|
|
||||||
- items:
|
- items:
|
||||||
- enum:
|
- enum:
|
||||||
- renesas,r9a07g043-canfd # RZ/G2UL and RZ/Five
|
- renesas,r9a07g043-canfd # RZ/G2UL and RZ/Five
|
||||||
@ -35,8 +41,6 @@ properties:
|
|||||||
- renesas,r9a07g054-canfd # RZ/V2L
|
- renesas,r9a07g054-canfd # RZ/V2L
|
||||||
- const: renesas,rzg2l-canfd # RZ/G2L family
|
- const: renesas,rzg2l-canfd # RZ/G2L family
|
||||||
|
|
||||||
- const: renesas,r8a779a0-canfd # R-Car V3U
|
|
||||||
|
|
||||||
reg:
|
reg:
|
||||||
maxItems: 1
|
maxItems: 1
|
||||||
|
|
||||||
@ -60,7 +64,7 @@ properties:
|
|||||||
$ref: /schemas/types.yaml#/definitions/flag
|
$ref: /schemas/types.yaml#/definitions/flag
|
||||||
description:
|
description:
|
||||||
The controller can operate in either CAN FD only mode (default) or
|
The controller can operate in either CAN FD only mode (default) or
|
||||||
Classical CAN only mode. The mode is global to both the channels.
|
Classical CAN only mode. The mode is global to all channels.
|
||||||
Specify this property to put the controller in Classical CAN only mode.
|
Specify this property to put the controller in Classical CAN only mode.
|
||||||
|
|
||||||
assigned-clocks:
|
assigned-clocks:
|
||||||
@ -80,6 +84,10 @@ patternProperties:
|
|||||||
The controller supports multiple channels and each is represented as a
|
The controller supports multiple channels and each is represented as a
|
||||||
child node. Each channel can be enabled/disabled individually.
|
child node. Each channel can be enabled/disabled individually.
|
||||||
|
|
||||||
|
properties:
|
||||||
|
phys:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
additionalProperties: false
|
additionalProperties: false
|
||||||
|
|
||||||
required:
|
required:
|
||||||
@ -159,7 +167,7 @@ allOf:
|
|||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
contains:
|
contains:
|
||||||
const: renesas,r8a779a0-canfd
|
const: renesas,rcar-gen4-canfd
|
||||||
then:
|
then:
|
||||||
patternProperties:
|
patternProperties:
|
||||||
"^channel[2-7]$": false
|
"^channel[2-7]$": false
|
||||||
|
@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
|||||||
title: Arrow SpeedChips XRS7000 Series Switch
|
title: Arrow SpeedChips XRS7000 Series Switch
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa.yaml#
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
|
|
||||||
maintainers:
|
maintainers:
|
||||||
- George McCollister <george.mccollister@gmail.com>
|
- George McCollister <george.mccollister@gmail.com>
|
||||||
|
@ -66,7 +66,7 @@ required:
|
|||||||
- reg
|
- reg
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa.yaml#
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
- if:
|
- if:
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
|
@ -85,11 +85,16 @@ properties:
|
|||||||
ports:
|
ports:
|
||||||
type: object
|
type: object
|
||||||
|
|
||||||
properties:
|
patternProperties:
|
||||||
brcm,use-bcm-hdr:
|
'^port@[0-9a-f]$':
|
||||||
description: if present, indicates that the switch port has Broadcom
|
$ref: dsa-port.yaml#
|
||||||
tags enabled (per-packet metadata)
|
unevaluatedProperties: false
|
||||||
type: boolean
|
|
||||||
|
properties:
|
||||||
|
brcm,use-bcm-hdr:
|
||||||
|
description: if present, indicates that the switch port has Broadcom
|
||||||
|
tags enabled (per-packet metadata)
|
||||||
|
type: boolean
|
||||||
|
|
||||||
required:
|
required:
|
||||||
- reg
|
- reg
|
||||||
|
@ -4,18 +4,19 @@
|
|||||||
$id: http://devicetree.org/schemas/net/dsa/dsa-port.yaml#
|
$id: http://devicetree.org/schemas/net/dsa/dsa-port.yaml#
|
||||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
title: Ethernet Switch port
|
title: Generic DSA Switch Port
|
||||||
|
|
||||||
maintainers:
|
maintainers:
|
||||||
- Andrew Lunn <andrew@lunn.ch>
|
- Andrew Lunn <andrew@lunn.ch>
|
||||||
- Florian Fainelli <f.fainelli@gmail.com>
|
- Florian Fainelli <f.fainelli@gmail.com>
|
||||||
- Vivien Didelot <vivien.didelot@gmail.com>
|
- Vladimir Oltean <olteanv@gmail.com>
|
||||||
|
|
||||||
description:
|
description:
|
||||||
Ethernet switch port Description
|
A DSA switch port is a component of a switch that manages one MAC, and can
|
||||||
|
pass Ethernet frames. It can act as a standard Ethernet switch port, or have
|
||||||
|
DSA-specific functionality.
|
||||||
|
|
||||||
allOf:
|
$ref: /schemas/net/ethernet-switch-port.yaml#
|
||||||
- $ref: /schemas/net/ethernet-controller.yaml#
|
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
reg:
|
reg:
|
||||||
@ -58,25 +59,6 @@ properties:
|
|||||||
- rtl8_4t
|
- rtl8_4t
|
||||||
- seville
|
- seville
|
||||||
|
|
||||||
phy-handle: true
|
|
||||||
|
|
||||||
phy-mode: true
|
|
||||||
|
|
||||||
fixed-link: true
|
|
||||||
|
|
||||||
mac-address: true
|
|
||||||
|
|
||||||
sfp: true
|
|
||||||
|
|
||||||
managed: true
|
|
||||||
|
|
||||||
rx-internal-delay-ps: true
|
|
||||||
|
|
||||||
tx-internal-delay-ps: true
|
|
||||||
|
|
||||||
required:
|
|
||||||
- reg
|
|
||||||
|
|
||||||
# CPU and DSA ports must have phylink-compatible link descriptions
|
# CPU and DSA ports must have phylink-compatible link descriptions
|
||||||
if:
|
if:
|
||||||
oneOf:
|
oneOf:
|
||||||
|
@ -9,7 +9,7 @@ title: Ethernet Switch
|
|||||||
maintainers:
|
maintainers:
|
||||||
- Andrew Lunn <andrew@lunn.ch>
|
- Andrew Lunn <andrew@lunn.ch>
|
||||||
- Florian Fainelli <f.fainelli@gmail.com>
|
- Florian Fainelli <f.fainelli@gmail.com>
|
||||||
- Vivien Didelot <vivien.didelot@gmail.com>
|
- Vladimir Oltean <olteanv@gmail.com>
|
||||||
|
|
||||||
description:
|
description:
|
||||||
This binding represents Ethernet Switches which have a dedicated CPU
|
This binding represents Ethernet Switches which have a dedicated CPU
|
||||||
@ -18,10 +18,9 @@ description:
|
|||||||
|
|
||||||
select: false
|
select: false
|
||||||
|
|
||||||
properties:
|
$ref: /schemas/net/ethernet-switch.yaml#
|
||||||
$nodename:
|
|
||||||
pattern: "^(ethernet-)?switch(@.*)?$"
|
|
||||||
|
|
||||||
|
properties:
|
||||||
dsa,member:
|
dsa,member:
|
||||||
minItems: 2
|
minItems: 2
|
||||||
maxItems: 2
|
maxItems: 2
|
||||||
@ -32,30 +31,28 @@ properties:
|
|||||||
(single device hanging off a CPU port) must not specify this property
|
(single device hanging off a CPU port) must not specify this property
|
||||||
$ref: /schemas/types.yaml#/definitions/uint32-array
|
$ref: /schemas/types.yaml#/definitions/uint32-array
|
||||||
|
|
||||||
patternProperties:
|
|
||||||
"^(ethernet-)?ports$":
|
|
||||||
type: object
|
|
||||||
properties:
|
|
||||||
'#address-cells':
|
|
||||||
const: 1
|
|
||||||
'#size-cells':
|
|
||||||
const: 0
|
|
||||||
|
|
||||||
patternProperties:
|
|
||||||
"^(ethernet-)?port@[0-9]+$":
|
|
||||||
type: object
|
|
||||||
description: Ethernet switch ports
|
|
||||||
|
|
||||||
$ref: dsa-port.yaml#
|
|
||||||
|
|
||||||
unevaluatedProperties: false
|
|
||||||
|
|
||||||
oneOf:
|
|
||||||
- required:
|
|
||||||
- ports
|
|
||||||
- required:
|
|
||||||
- ethernet-ports
|
|
||||||
|
|
||||||
additionalProperties: true
|
additionalProperties: true
|
||||||
|
|
||||||
|
$defs:
|
||||||
|
ethernet-ports:
|
||||||
|
description: A DSA switch without any extra port properties
|
||||||
|
$ref: '#/'
|
||||||
|
|
||||||
|
patternProperties:
|
||||||
|
"^(ethernet-)?ports$":
|
||||||
|
type: object
|
||||||
|
additionalProperties: false
|
||||||
|
|
||||||
|
properties:
|
||||||
|
'#address-cells':
|
||||||
|
const: 1
|
||||||
|
'#size-cells':
|
||||||
|
const: 0
|
||||||
|
|
||||||
|
patternProperties:
|
||||||
|
"^(ethernet-)?port@[0-9]+$":
|
||||||
|
description: Ethernet switch ports
|
||||||
|
$ref: dsa-port.yaml#
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
...
|
...
|
||||||
|
@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
|||||||
title: Hirschmann Hellcreek TSN Switch
|
title: Hirschmann Hellcreek TSN Switch
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa.yaml#
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
|
|
||||||
maintainers:
|
maintainers:
|
||||||
- Andrew Lunn <andrew@lunn.ch>
|
- Andrew Lunn <andrew@lunn.ch>
|
||||||
|
@ -24,56 +24,46 @@ description: |
|
|||||||
|
|
||||||
There is only the standalone version of MT7531.
|
There is only the standalone version of MT7531.
|
||||||
|
|
||||||
Port 5 on MT7530 has got various ways of configuration.
|
Port 5 on MT7530 has got various ways of configuration:
|
||||||
|
|
||||||
For standalone MT7530:
|
|
||||||
|
|
||||||
- Port 5 can be used as a CPU port.
|
- Port 5 can be used as a CPU port.
|
||||||
|
|
||||||
- PHY 0 or 4 of the switch can be muxed to connect to the gmac of the SoC
|
- PHY 0 or 4 of the switch can be muxed to gmac5 of the switch. Therefore,
|
||||||
which port 5 is wired to. Usually used for connecting the wan port
|
the gmac of the SoC which is wired to port 5 can connect to the PHY.
|
||||||
directly to the CPU to achieve 2 Gbps routing in total.
|
This is usually used for connecting the wan port directly to the CPU to
|
||||||
|
achieve 2 Gbps routing in total.
|
||||||
|
|
||||||
The driver looks up the reg on the ethernet-phy node which the phy-handle
|
The driver looks up the reg on the ethernet-phy node, which the phy-handle
|
||||||
property refers to on the gmac node to mux the specified phy.
|
property on the gmac node refers to, to mux the specified phy.
|
||||||
|
|
||||||
The driver requires the gmac of the SoC to have "mediatek,eth-mac" as the
|
The driver requires the gmac of the SoC to have "mediatek,eth-mac" as the
|
||||||
compatible string and the reg must be 1. So, for now, only gmac1 of an
|
compatible string and the reg must be 1. So, for now, only gmac1 of a
|
||||||
MediaTek SoC can benefit this. Banana Pi BPI-R2 suits this.
|
MediaTek SoC can benefit this. Banana Pi BPI-R2 suits this.
|
||||||
Check out example 5 for a similar configuration.
|
|
||||||
|
|
||||||
- Port 5 can be wired to an external phy. Port 5 becomes a DSA slave.
|
|
||||||
Check out example 7 for a similar configuration.
|
|
||||||
|
|
||||||
For multi-chip module MT7530:
|
|
||||||
|
|
||||||
- Port 5 can be used as a CPU port.
|
|
||||||
|
|
||||||
- PHY 0 or 4 of the switch can be muxed to connect to gmac1 of the SoC.
|
|
||||||
Usually used for connecting the wan port directly to the CPU to achieve 2
|
|
||||||
Gbps routing in total.
|
|
||||||
|
|
||||||
The driver looks up the reg on the ethernet-phy node which the phy-handle
|
|
||||||
property refers to on the gmac node to mux the specified phy.
|
|
||||||
|
|
||||||
For the MT7621 SoCs, rgmii2 group must be claimed with rgmii2 function.
|
For the MT7621 SoCs, rgmii2 group must be claimed with rgmii2 function.
|
||||||
|
|
||||||
Check out example 5.
|
Check out example 5.
|
||||||
|
|
||||||
- In case of an external phy wired to gmac1 of the SoC, port 5 must not be
|
- For the multi-chip module MT7530, in case of an external phy wired to
|
||||||
enabled.
|
gmac1 of the SoC, port 5 must not be enabled.
|
||||||
|
|
||||||
In case of muxing PHY 0 or 4, the external phy must not be enabled.
|
In case of muxing PHY 0 or 4, the external phy must not be enabled.
|
||||||
|
|
||||||
For the MT7621 SoCs, rgmii2 group must be claimed with rgmii2 function.
|
For the MT7621 SoCs, rgmii2 group must be claimed with rgmii2 function.
|
||||||
|
|
||||||
Check out example 6.
|
Check out example 6.
|
||||||
|
|
||||||
- Port 5 can be muxed to an external phy. Port 5 becomes a DSA slave.
|
- Port 5 can be wired to an external phy. Port 5 becomes a DSA slave.
|
||||||
The external phy must be wired TX to TX to gmac1 of the SoC for this to
|
|
||||||
work. Ubiquiti EdgeRouter X SFP is wired this way.
|
|
||||||
|
|
||||||
Muxing PHY 0 or 4 won't work when the external phy is connected TX to TX.
|
For the multi-chip module MT7530, the external phy must be wired TX to TX
|
||||||
|
to gmac1 of the SoC for this to work. Ubiquiti EdgeRouter X SFP is wired
|
||||||
|
this way.
|
||||||
|
|
||||||
|
For the multi-chip module MT7530, muxing PHY 0 or 4 won't work when the
|
||||||
|
external phy is connected TX to TX.
|
||||||
|
|
||||||
For the MT7621 SoCs, rgmii2 group must be claimed with gpio function.
|
For the MT7621 SoCs, rgmii2 group must be claimed with gpio function.
|
||||||
|
|
||||||
Check out example 7.
|
Check out example 7.
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
@ -157,9 +147,6 @@ patternProperties:
|
|||||||
patternProperties:
|
patternProperties:
|
||||||
"^(ethernet-)?port@[0-9]+$":
|
"^(ethernet-)?port@[0-9]+$":
|
||||||
type: object
|
type: object
|
||||||
description: Ethernet switch ports
|
|
||||||
|
|
||||||
unevaluatedProperties: false
|
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
reg:
|
reg:
|
||||||
@ -168,7 +155,6 @@ patternProperties:
|
|||||||
for user ports.
|
for user ports.
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa-port.yaml#
|
|
||||||
- if:
|
- if:
|
||||||
required: [ ethernet ]
|
required: [ ethernet ]
|
||||||
then:
|
then:
|
||||||
@ -238,7 +224,7 @@ $defs:
|
|||||||
- sgmii
|
- sgmii
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa.yaml#
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
- if:
|
- if:
|
||||||
required:
|
required:
|
||||||
- mediatek,mcm
|
- mediatek,mcm
|
||||||
@ -605,7 +591,7 @@ examples:
|
|||||||
label = "lan4";
|
label = "lan4";
|
||||||
};
|
};
|
||||||
|
|
||||||
/* Commented out, phy4 is muxed to gmac1.
|
/* Commented out, phy4 is connected to gmac1.
|
||||||
port@4 {
|
port@4 {
|
||||||
reg = <4>;
|
reg = <4>;
|
||||||
label = "wan";
|
label = "wan";
|
||||||
|
@ -11,7 +11,7 @@ maintainers:
|
|||||||
- Woojung Huh <Woojung.Huh@microchip.com>
|
- Woojung Huh <Woojung.Huh@microchip.com>
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa.yaml#
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
- $ref: /schemas/spi/spi-peripheral-props.yaml#
|
- $ref: /schemas/spi/spi-peripheral-props.yaml#
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
|
@ -10,7 +10,7 @@ maintainers:
|
|||||||
- UNGLinuxDriver@microchip.com
|
- UNGLinuxDriver@microchip.com
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa.yaml#
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
|
@ -78,7 +78,7 @@ required:
|
|||||||
- reg
|
- reg
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa.yaml#
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
- if:
|
- if:
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
|
@ -13,7 +13,7 @@ description:
|
|||||||
depends on the SPI bus master driver.
|
depends on the SPI bus master driver.
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: "dsa.yaml#"
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
- $ref: /schemas/spi/spi-peripheral-props.yaml#
|
- $ref: /schemas/spi/spi-peripheral-props.yaml#
|
||||||
|
|
||||||
maintainers:
|
maintainers:
|
||||||
|
@ -66,15 +66,11 @@ properties:
|
|||||||
With the legacy mapping the reg corresponding to the internal
|
With the legacy mapping the reg corresponding to the internal
|
||||||
mdio is the switch reg with an offset of -1.
|
mdio is the switch reg with an offset of -1.
|
||||||
|
|
||||||
|
$ref: "dsa.yaml#"
|
||||||
|
|
||||||
patternProperties:
|
patternProperties:
|
||||||
"^(ethernet-)?ports$":
|
"^(ethernet-)?ports$":
|
||||||
type: object
|
type: object
|
||||||
properties:
|
|
||||||
'#address-cells':
|
|
||||||
const: 1
|
|
||||||
'#size-cells':
|
|
||||||
const: 0
|
|
||||||
|
|
||||||
patternProperties:
|
patternProperties:
|
||||||
"^(ethernet-)?port@[0-6]$":
|
"^(ethernet-)?port@[0-6]$":
|
||||||
type: object
|
type: object
|
||||||
@ -116,7 +112,7 @@ required:
|
|||||||
- compatible
|
- compatible
|
||||||
- reg
|
- reg
|
||||||
|
|
||||||
additionalProperties: true
|
unevaluatedProperties: false
|
||||||
|
|
||||||
examples:
|
examples:
|
||||||
- |
|
- |
|
||||||
@ -148,8 +144,6 @@ examples:
|
|||||||
|
|
||||||
switch@10 {
|
switch@10 {
|
||||||
compatible = "qca,qca8337";
|
compatible = "qca,qca8337";
|
||||||
#address-cells = <1>;
|
|
||||||
#size-cells = <0>;
|
|
||||||
reset-gpios = <&gpio 42 GPIO_ACTIVE_LOW>;
|
reset-gpios = <&gpio 42 GPIO_ACTIVE_LOW>;
|
||||||
reg = <0x10>;
|
reg = <0x10>;
|
||||||
|
|
||||||
@ -209,8 +203,6 @@ examples:
|
|||||||
|
|
||||||
switch@10 {
|
switch@10 {
|
||||||
compatible = "qca,qca8337";
|
compatible = "qca,qca8337";
|
||||||
#address-cells = <1>;
|
|
||||||
#size-cells = <0>;
|
|
||||||
reset-gpios = <&gpio 42 GPIO_ACTIVE_LOW>;
|
reset-gpios = <&gpio 42 GPIO_ACTIVE_LOW>;
|
||||||
reg = <0x10>;
|
reg = <0x10>;
|
||||||
|
|
||||||
|
@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
|||||||
title: Realtek switches for unmanaged switches
|
title: Realtek switches for unmanaged switches
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa.yaml#
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
|
|
||||||
maintainers:
|
maintainers:
|
||||||
- Linus Walleij <linus.walleij@linaro.org>
|
- Linus Walleij <linus.walleij@linaro.org>
|
||||||
|
@ -14,7 +14,7 @@ description: |
|
|||||||
handles 4 ports + 1 CPU management port.
|
handles 4 ports + 1 CPU management port.
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: dsa.yaml#
|
- $ref: dsa.yaml#/$defs/ethernet-ports
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
|
@ -0,0 +1,26 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/ethernet-switch-port.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: Generic Ethernet Switch Port

maintainers:
  - Andrew Lunn <andrew@lunn.ch>
  - Florian Fainelli <f.fainelli@gmail.com>
  - Vladimir Oltean <olteanv@gmail.com>

description:
  An Ethernet switch port is a component of a switch that manages one MAC, and
  can pass Ethernet frames.

$ref: ethernet-controller.yaml#

properties:
  reg:
    description: Port number

additionalProperties: true

...
62
Documentation/devicetree/bindings/net/ethernet-switch.yaml
Normal file
@ -0,0 +1,62 @@
|
|||||||
|
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/ethernet-switch.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: Generic Ethernet Switch
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Andrew Lunn <andrew@lunn.ch>
|
||||||
|
- Florian Fainelli <f.fainelli@gmail.com>
|
||||||
|
- Vladimir Oltean <olteanv@gmail.com>
|
||||||
|
|
||||||
|
description:
|
||||||
|
Ethernet switches are multi-port Ethernet controllers. Each port has
|
||||||
|
its own number and is represented as its own Ethernet controller.
|
||||||
|
The minimum required functionality is to pass packets to software.
|
||||||
|
They may or may not be able to forward packets autonomously between
|
||||||
|
ports.
|
||||||
|
|
||||||
|
select: false
|
||||||
|
|
||||||
|
properties:
|
||||||
|
$nodename:
|
||||||
|
pattern: "^(ethernet-)?switch(@.*)?$"
|
||||||
|
|
||||||
|
patternProperties:
|
||||||
|
"^(ethernet-)?ports$":
|
||||||
|
type: object
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
properties:
|
||||||
|
'#address-cells':
|
||||||
|
const: 1
|
||||||
|
'#size-cells':
|
||||||
|
const: 0
|
||||||
|
|
||||||
|
patternProperties:
|
||||||
|
"^(ethernet-)?port@[0-9]+$":
|
||||||
|
type: object
|
||||||
|
description: Ethernet switch ports
|
||||||
|
|
||||||
|
oneOf:
|
||||||
|
- required:
|
||||||
|
- ports
|
||||||
|
- required:
|
||||||
|
- ethernet-ports
|
||||||
|
|
||||||
|
additionalProperties: true
|
||||||
|
|
||||||
|
$defs:
|
||||||
|
base:
|
||||||
|
description: An ethernet switch without any extra port properties
|
||||||
|
$ref: '#/'
|
||||||
|
|
||||||
|
patternProperties:
|
||||||
|
"^(ethernet-)?port@[0-9]+$":
|
||||||
|
description: Ethernet switch ports
|
||||||
|
$ref: ethernet-switch-port.yaml#
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
...
|
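A minimal devicetree sketch of a node that the generic ethernet-switch.yaml schema above would describe, assuming a hypothetical board. The node names are chosen to match the `$nodename` pattern `^(ethernet-)?switch(@.*)?$` and the port pattern `^(ethernet-)?port@[0-9]+$`. The port count is made up for illustration, and the compatible string is omitted because the schema sets `select: false` and is meant to be referenced from a device-specific binding.

```dts
/* Hypothetical example, not taken from any in-tree board file. */
ethernet-switch {
    /* A device-specific binding would normally add a compatible here. */

    ethernet-ports {
        /* Required by the schema: one address cell, no size cells. */
        #address-cells = <1>;
        #size-cells = <0>;

        /* Each port is its own Ethernet controller; reg is the port number. */
        port@0 {
            reg = <0>;
        };

        port@1 {
            reg = <1>;
        };
    };
};
```

Because the schema uses `oneOf` over `ports` and `ethernet-ports`, a node would carry exactly one of the two container names, never both.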
@ -51,6 +51,7 @@ properties:
|
|||||||
- fsl,imx8mm-fec
|
- fsl,imx8mm-fec
|
||||||
- fsl,imx8mn-fec
|
- fsl,imx8mn-fec
|
||||||
- fsl,imx8mp-fec
|
- fsl,imx8mp-fec
|
||||||
|
- fsl,imx93-fec
|
||||||
- const: fsl,imx8mq-fec
|
- const: fsl,imx8mq-fec
|
||||||
- const: fsl,imx6sx-fec
|
- const: fsl,imx6sx-fec
|
||||||
- items:
|
- items:
|
||||||
|
47
Documentation/devicetree/bindings/net/maxlinear,gpy2xx.yaml
Normal file
@ -0,0 +1,47 @@
|
|||||||
|
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/maxlinear,gpy2xx.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: MaxLinear GPY2xx PHY
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Andrew Lunn <andrew@lunn.ch>
|
||||||
|
- Michael Walle <michael@walle.cc>
|
||||||
|
|
||||||
|
allOf:
|
||||||
|
- $ref: ethernet-phy.yaml#
|
||||||
|
|
||||||
|
properties:
|
||||||
|
maxlinear,use-broken-interrupts:
|
||||||
|
description: |
|
||||||
|
Interrupts are broken on some GPY2xx PHYs in that they keep the
|
||||||
|
interrupt line asserted even after the interrupt status register is
|
||||||
|
cleared. Thus it is blocking the interrupt line which is usually bad
|
||||||
|
for shared lines. By default interrupts are disabled for this PHY and
|
||||||
|
polling mode is used. If one can live with the consequences, this
|
||||||
|
property can be used to enable interrupt handling.
|
||||||
|
|
||||||
|
Affected PHYs (as far as known) are GPY215B and GPY215C.
|
||||||
|
type: boolean
|
||||||
|
|
||||||
|
dependencies:
|
||||||
|
maxlinear,use-broken-interrupts: [ interrupts ]
|
||||||
|
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
examples:
|
||||||
|
- |
|
||||||
|
ethernet {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
|
||||||
|
ethernet-phy@0 {
|
||||||
|
reg = <0>;
|
||||||
|
interrupts-extended = <&intc 0>;
|
||||||
|
maxlinear,use-broken-interrupts;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
...
|
@ -1,48 +0,0 @@
|
|||||||
Properties for the MDIO bus multiplexer/glue of Amlogic G12a SoC family.
|
|
||||||
|
|
||||||
This is a special case of a MDIO bus multiplexer. It allows to choose between
|
|
||||||
the internal mdio bus leading to the embedded 10/100 PHY or the external
|
|
||||||
MDIO bus.
|
|
||||||
|
|
||||||
Required properties in addition to the generic multiplexer properties:
|
|
||||||
- compatible : amlogic,g12a-mdio-mux
|
|
||||||
- reg: physical address and length of the multiplexer/glue registers
|
|
||||||
- clocks: list of clock phandle, one for each entry clock-names.
|
|
||||||
- clock-names: should contain the following:
|
|
||||||
* "pclk" : peripheral clock.
|
|
||||||
* "clkin0" : platform crytal
|
|
||||||
* "clkin1" : SoC 50MHz MPLL
|
|
||||||
|
|
||||||
Example :
|
|
||||||
|
|
||||||
mdio_mux: mdio-multiplexer@4c000 {
|
|
||||||
compatible = "amlogic,g12a-mdio-mux";
|
|
||||||
reg = <0x0 0x4c000 0x0 0xa4>;
|
|
||||||
clocks = <&clkc CLKID_ETH_PHY>,
|
|
||||||
<&xtal>,
|
|
||||||
<&clkc CLKID_MPLL_5OM>;
|
|
||||||
clock-names = "pclk", "clkin0", "clkin1";
|
|
||||||
mdio-parent-bus = <&mdio0>;
|
|
||||||
#address-cells = <1>;
|
|
||||||
#size-cells = <0>;
|
|
||||||
|
|
||||||
ext_mdio: mdio@0 {
|
|
||||||
reg = <0>;
|
|
||||||
#address-cells = <1>;
|
|
||||||
#size-cells = <0>;
|
|
||||||
};
|
|
||||||
|
|
||||||
int_mdio: mdio@1 {
|
|
||||||
reg = <1>;
|
|
||||||
#address-cells = <1>;
|
|
||||||
#size-cells = <0>;
|
|
||||||
|
|
||||||
internal_ephy: ethernet-phy@8 {
|
|
||||||
compatible = "ethernet-phy-id0180.3301",
|
|
||||||
"ethernet-phy-ieee802.3-c22";
|
|
||||||
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
|
|
||||||
reg = <8>;
|
|
||||||
max-speed = <100>;
|
|
||||||
};
|
|
||||||
};
|
|
||||||
};
|
|
@ -158,6 +158,7 @@ KSZ9031:
|
|||||||
no link will be established.
|
no link will be established.
|
||||||
|
|
||||||
KSZ9131:
|
KSZ9131:
|
||||||
|
LAN8841:
|
||||||
|
|
||||||
All skew control options are specified in picoseconds. The increment
|
All skew control options are specified in picoseconds. The increment
|
||||||
step is 100ps. Unlike KSZ9031, the values represent picosecond delays.
|
step is 100ps. Unlike KSZ9031, the values represent picosecond delays.
|
||||||
|
117
Documentation/devicetree/bindings/net/motorcomm,yt8xxx.yaml
Normal file
@ -0,0 +1,117 @@
|
|||||||
|
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/motorcomm,yt8xxx.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: MotorComm yt8xxx Ethernet PHY
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Frank Sae <frank.sae@motor-comm.com>
|
||||||
|
|
||||||
|
allOf:
|
||||||
|
- $ref: ethernet-phy.yaml#
|
||||||
|
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
enum:
|
||||||
|
- ethernet-phy-id4f51.e91a
|
||||||
|
- ethernet-phy-id4f51.e91b
|
||||||
|
|
||||||
|
rx-internal-delay-ps:
|
||||||
|
description: |
|
||||||
|
RGMII RX Clock Delay used only when PHY operates in RGMII mode with
|
||||||
|
internal delay (phy-mode is 'rgmii-id' or 'rgmii-rxid') in pico-seconds.
|
||||||
|
enum: [ 0, 150, 300, 450, 600, 750, 900, 1050, 1200, 1350, 1500, 1650,
|
||||||
|
1800, 1900, 1950, 2050, 2100, 2200, 2250, 2350, 2500, 2650, 2800,
|
||||||
|
2950, 3100, 3250, 3400, 3550, 3700, 3850, 4000, 4150 ]
|
||||||
|
default: 1950
|
||||||
|
|
||||||
|
tx-internal-delay-ps:
|
||||||
|
description: |
|
||||||
|
RGMII TX Clock Delay used only when PHY operates in RGMII mode with
|
||||||
|
internal delay (phy-mode is 'rgmii-id' or 'rgmii-txid') in pico-seconds.
|
||||||
|
enum: [ 0, 150, 300, 450, 600, 750, 900, 1050, 1200, 1350, 1500, 1650, 1800,
|
||||||
|
1950, 2100, 2250 ]
|
||||||
|
default: 1950
|
||||||
|
|
||||||
|
motorcomm,clk-out-frequency-hz:
|
||||||
|
description: clock output on clock output pin.
|
||||||
|
enum: [0, 25000000, 125000000]
|
||||||
|
default: 0
|
||||||
|
|
||||||
|
motorcomm,keep-pll-enabled:
|
||||||
|
description: |
|
||||||
|
If set, keep the PLL enabled even if there is no link. Useful if you
|
||||||
|
want to use the clock output without an ethernet link.
|
||||||
|
type: boolean
|
||||||
|
|
||||||
|
motorcomm,auto-sleep-disabled:
|
||||||
|
description: |
|
||||||
|
If set, PHY will not enter sleep mode and close AFE after unplug cable
|
||||||
|
for a timer.
|
||||||
|
type: boolean
|
||||||
|
|
||||||
|
motorcomm,tx-clk-adj-enabled:
|
||||||
|
description: |
|
||||||
|
This configuration is mainly to adapt to VF2 with JH7110 SoC.
|
||||||
|
Useful if you want to use tx-clk-xxxx-inverted to adj the delay of tx clk.
|
||||||
|
type: boolean
|
||||||
|
|
||||||
|
motorcomm,tx-clk-10-inverted:
|
||||||
|
description: |
|
||||||
|
Use original or inverted RGMII Transmit PHY Clock to drive the RGMII
|
||||||
|
Transmit PHY Clock delay train configuration when speed is 10Mbps.
|
||||||
|
type: boolean
|
||||||
|
|
||||||
|
motorcomm,tx-clk-100-inverted:
|
||||||
|
description: |
|
||||||
|
Use original or inverted RGMII Transmit PHY Clock to drive the RGMII
|
||||||
|
Transmit PHY Clock delay train configuration when speed is 100Mbps.
|
||||||
|
type: boolean
|
||||||
|
|
||||||
|
motorcomm,tx-clk-1000-inverted:
|
||||||
|
description: |
|
||||||
|
Use original or inverted RGMII Transmit PHY Clock to drive the RGMII
|
||||||
|
Transmit PHY Clock delay train configuration when speed is 1000Mbps.
|
||||||
|
type: boolean
|
||||||
|
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
examples:
|
||||||
|
- |
|
||||||
|
mdio {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
phy-mode = "rgmii-id";
|
||||||
|
ethernet-phy@4 {
|
||||||
|
/* Only needed to make DT lint tools work. Do not copy/paste
|
||||||
|
* into real DTS files.
|
||||||
|
*/
|
||||||
|
compatible = "ethernet-phy-id4f51.e91a";
|
||||||
|
|
||||||
|
reg = <4>;
|
||||||
|
rx-internal-delay-ps = <2100>;
|
||||||
|
tx-internal-delay-ps = <150>;
|
||||||
|
motorcomm,clk-out-frequency-hz = <0>;
|
||||||
|
motorcomm,keep-pll-enabled;
|
||||||
|
motorcomm,auto-sleep-disabled;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
- |
|
||||||
|
mdio {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
phy-mode = "rgmii";
|
||||||
|
ethernet-phy@5 {
|
||||||
|
/* Only needed to make DT lint tools work. Do not copy/paste
|
||||||
|
* into real DTS files.
|
||||||
|
*/
|
||||||
|
compatible = "ethernet-phy-id4f51.e91a";
|
||||||
|
|
||||||
|
reg = <5>;
|
||||||
|
motorcomm,clk-out-frequency-hz = <125000000>;
|
||||||
|
motorcomm,keep-pll-enabled;
|
||||||
|
motorcomm,auto-sleep-disabled;
|
||||||
|
};
|
||||||
|
};
|
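The two examples in the schema above do not exercise the tx-clk properties, so here is a hedged sketch of how they might be combined, assuming a hypothetical board that needs the inverted transmit clock at 100Mbps only. The PHY address, the choice of which speed to invert, and the delay values (both picked from the enums in the schema) are assumptions for illustration, not settings taken from a real board.

```dts
/* Hypothetical example, not taken from any in-tree board file. */
mdio {
    #address-cells = <1>;
    #size-cells = <0>;

    ethernet-phy@0 {
        /* Second PHY ID listed in the schema's compatible enum. */
        compatible = "ethernet-phy-id4f51.e91b";
        reg = <0>;

        /* Both values are the schema defaults (1950 ps). */
        rx-internal-delay-ps = <1950>;
        tx-internal-delay-ps = <1950>;

        /* Assumed board quirk: invert the TX clock at 100Mbps only. */
        motorcomm,tx-clk-adj-enabled;
        motorcomm,tx-clk-100-inverted;
    };
};
```

Since all of these are boolean flags, leaving out `motorcomm,tx-clk-10-inverted` and `motorcomm,tx-clk-1000-inverted` keeps the original clock for those speeds.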
@ -18,14 +18,52 @@ description: |
|
|||||||
packets using CPU. Additionally, PTP is supported as well as FDMA for faster
|
packets using CPU. Additionally, PTP is supported as well as FDMA for faster
|
||||||
packet extraction/injection.
|
packet extraction/injection.
|
||||||
|
|
||||||
properties:
|
allOf:
|
||||||
$nodename:
|
- if:
|
||||||
pattern: "^switch@[0-9a-f]+$"
|
properties:
|
||||||
|
compatible:
|
||||||
|
const: mscc,vsc7514-switch
|
||||||
|
then:
|
||||||
|
$ref: ethernet-switch.yaml#
|
||||||
|
required:
|
||||||
|
- interrupts
|
||||||
|
- interrupt-names
|
||||||
|
properties:
|
||||||
|
reg:
|
||||||
|
minItems: 21
|
||||||
|
reg-names:
|
||||||
|
minItems: 21
|
||||||
|
ethernet-ports:
|
||||||
|
patternProperties:
|
||||||
|
"^port@[0-9a-f]+$":
|
||||||
|
$ref: ethernet-switch-port.yaml#
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
- if:
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
const: mscc,vsc7512-switch
|
||||||
|
then:
|
||||||
|
$ref: /schemas/net/dsa/dsa.yaml#
|
||||||
|
properties:
|
||||||
|
reg:
|
||||||
|
maxItems: 20
|
||||||
|
reg-names:
|
||||||
|
maxItems: 20
|
||||||
|
ethernet-ports:
|
||||||
|
patternProperties:
|
||||||
|
"^port@[0-9a-f]+$":
|
||||||
|
$ref: /schemas/net/dsa/dsa-port.yaml#
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
const: mscc,vsc7514-switch
|
enum:
|
||||||
|
- mscc,vsc7512-switch
|
||||||
|
- mscc,vsc7514-switch
|
||||||
|
|
||||||
reg:
|
reg:
|
||||||
|
minItems: 20
|
||||||
items:
|
items:
|
||||||
- description: system target
|
- description: system target
|
||||||
- description: rewriter target
|
- description: rewriter target
|
||||||
@ -50,6 +88,7 @@ properties:
|
|||||||
- description: fdma target
|
- description: fdma target
|
||||||
|
|
||||||
reg-names:
|
reg-names:
|
||||||
|
minItems: 20
|
||||||
items:
|
items:
|
||||||
- const: sys
|
- const: sys
|
||||||
- const: rew
|
- const: rew
|
||||||
@ -87,59 +126,16 @@ properties:
|
|||||||
- const: xtr
|
- const: xtr
|
||||||
- const: fdma
|
- const: fdma
|
||||||
|
|
||||||
ethernet-ports:
|
|
||||||
type: object
|
|
||||||
|
|
||||||
properties:
|
|
||||||
'#address-cells':
|
|
||||||
const: 1
|
|
||||||
'#size-cells':
|
|
||||||
const: 0
|
|
||||||
|
|
||||||
additionalProperties: false
|
|
||||||
|
|
||||||
patternProperties:
|
|
||||||
"^port@[0-9a-f]+$":
|
|
||||||
type: object
|
|
||||||
description: Ethernet ports handled by the switch
|
|
||||||
|
|
||||||
$ref: ethernet-controller.yaml#
|
|
||||||
|
|
||||||
unevaluatedProperties: false
|
|
||||||
|
|
||||||
properties:
|
|
||||||
reg:
|
|
||||||
description: Switch port number
|
|
||||||
|
|
||||||
phy-handle: true
|
|
||||||
|
|
||||||
phy-mode: true
|
|
||||||
|
|
||||||
fixed-link: true
|
|
||||||
|
|
||||||
mac-address: true
|
|
||||||
|
|
||||||
required:
|
|
||||||
- reg
|
|
||||||
- phy-mode
|
|
||||||
|
|
||||||
oneOf:
|
|
||||||
- required:
|
|
||||||
- phy-handle
|
|
||||||
- required:
|
|
||||||
- fixed-link
|
|
||||||
|
|
||||||
required:
|
required:
|
||||||
- compatible
|
- compatible
|
||||||
- reg
|
- reg
|
||||||
- reg-names
|
- reg-names
|
||||||
- interrupts
|
|
||||||
- interrupt-names
|
|
||||||
- ethernet-ports
|
- ethernet-ports
|
||||||
|
|
||||||
additionalProperties: false
|
unevaluatedProperties: false
|
||||||
|
|
||||||
examples:
|
examples:
|
||||||
|
# VSC7514 (Switchdev)
|
||||||
- |
|
- |
|
||||||
switch@1010000 {
|
switch@1010000 {
|
||||||
compatible = "mscc,vsc7514-switch";
|
compatible = "mscc,vsc7514-switch";
|
||||||
@ -187,5 +183,51 @@ examples:
|
|||||||
};
|
};
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
# VSC7512 (DSA)
|
||||||
|
- |
|
||||||
|
ethernet-switch@1{
|
||||||
|
compatible = "mscc,vsc7512-switch";
|
||||||
|
reg = <0x71010000 0x10000>,
|
||||||
|
<0x71030000 0x10000>,
|
||||||
|
<0x71080000 0x100>,
|
||||||
|
<0x710e0000 0x10000>,
|
||||||
|
<0x711e0000 0x100>,
|
||||||
|
<0x711f0000 0x100>,
|
||||||
|
<0x71200000 0x100>,
|
||||||
|
<0x71210000 0x100>,
|
||||||
|
<0x71220000 0x100>,
|
||||||
|
<0x71230000 0x100>,
|
||||||
|
<0x71240000 0x100>,
|
||||||
|
<0x71250000 0x100>,
|
||||||
|
<0x71260000 0x100>,
|
||||||
|
<0x71270000 0x100>,
|
||||||
|
<0x71280000 0x100>,
|
||||||
|
<0x71800000 0x80000>,
|
||||||
|
<0x71880000 0x10000>,
|
||||||
|
<0x71040000 0x10000>,
|
||||||
|
<0x71050000 0x10000>,
|
||||||
|
<0x71060000 0x10000>;
|
||||||
|
reg-names = "sys", "rew", "qs", "ptp", "port0", "port1",
|
||||||
|
"port2", "port3", "port4", "port5", "port6",
|
||||||
|
"port7", "port8", "port9", "port10", "qsys",
|
||||||
|
"ana", "s0", "s1", "s2";
|
||||||
|
|
||||||
|
ethernet-ports {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
|
||||||
|
port@0 {
|
||||||
|
reg = <0>;
|
||||||
|
ethernet = <&mac_sw>;
|
||||||
|
phy-handle = <&phy0>;
|
||||||
|
phy-mode = "internal";
|
||||||
|
};
|
||||||
|
port@1 {
|
||||||
|
reg = <1>;
|
||||||
|
phy-handle = <&phy1>;
|
||||||
|
phy-mode = "internal";
|
||||||
|
};
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
...
|
...
|
||||||
|
@ -4,7 +4,7 @@
|
|||||||
$id: http://devicetree.org/schemas/net/nxp,dwmac-imx.yaml#
|
$id: http://devicetree.org/schemas/net/nxp,dwmac-imx.yaml#
|
||||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
title: NXP i.MX8 DWMAC glue layer
|
title: NXP i.MX8/9 DWMAC glue layer
|
||||||
|
|
||||||
maintainers:
|
maintainers:
|
||||||
- Clark Wang <xiaoning.wang@nxp.com>
|
- Clark Wang <xiaoning.wang@nxp.com>
|
||||||
@ -19,6 +19,7 @@ select:
|
|||||||
enum:
|
enum:
|
||||||
- nxp,imx8mp-dwmac-eqos
|
- nxp,imx8mp-dwmac-eqos
|
||||||
- nxp,imx8dxl-dwmac-eqos
|
- nxp,imx8dxl-dwmac-eqos
|
||||||
|
- nxp,imx93-dwmac-eqos
|
||||||
required:
|
required:
|
||||||
- compatible
|
- compatible
|
||||||
|
|
||||||
@ -32,6 +33,7 @@ properties:
|
|||||||
- enum:
|
- enum:
|
||||||
- nxp,imx8mp-dwmac-eqos
|
- nxp,imx8mp-dwmac-eqos
|
||||||
- nxp,imx8dxl-dwmac-eqos
|
- nxp,imx8dxl-dwmac-eqos
|
||||||
|
- nxp,imx93-dwmac-eqos
|
||||||
- const: snps,dwmac-5.10a
|
- const: snps,dwmac-5.10a
|
||||||
|
|
||||||
clocks:
|
clocks:
|
||||||
|
51
Documentation/devicetree/bindings/net/rfkill-gpio.yaml
Normal file
@ -0,0 +1,51 @@
|
|||||||
|
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/rfkill-gpio.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: GPIO controlled rfkill switch
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Johannes Berg <johannes@sipsolutions.net>
|
||||||
|
- Philipp Zabel <p.zabel@pengutronix.de>
|
||||||
|
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
const: rfkill-gpio
|
||||||
|
|
||||||
|
label:
|
||||||
|
description: rfkill switch name, defaults to node name
|
||||||
|
|
||||||
|
radio-type:
|
||||||
|
description: rfkill radio type
|
||||||
|
enum:
|
||||||
|
- bluetooth
|
||||||
|
- fm
|
||||||
|
- gps
|
||||||
|
- nfc
|
||||||
|
- ultrawideband
|
||||||
|
- wimax
|
||||||
|
- wlan
|
||||||
|
- wwan
|
||||||
|
|
||||||
|
shutdown-gpios:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
required:
|
||||||
|
- compatible
|
||||||
|
- radio-type
|
||||||
|
- shutdown-gpios
|
||||||
|
|
||||||
|
additionalProperties: false
|
||||||
|
|
||||||
|
examples:
|
||||||
|
- |
|
||||||
|
#include <dt-bindings/gpio/gpio.h>
|
||||||
|
|
||||||
|
rfkill {
|
||||||
|
compatible = "rfkill-gpio";
|
||||||
|
label = "rfkill-pcie-wlan";
|
||||||
|
radio-type = "wlan";
|
||||||
|
shutdown-gpios = <&gpio2 25 GPIO_ACTIVE_HIGH>;
|
||||||
|
};
|
@ -49,11 +49,11 @@ properties:
|
|||||||
- rockchip,rk3368-gmac
|
- rockchip,rk3368-gmac
|
||||||
- rockchip,rk3399-gmac
|
- rockchip,rk3399-gmac
|
||||||
- rockchip,rv1108-gmac
|
- rockchip,rv1108-gmac
|
||||||
- rockchip,rv1126-gmac
|
|
||||||
- items:
|
- items:
|
||||||
- enum:
|
- enum:
|
||||||
- rockchip,rk3568-gmac
|
- rockchip,rk3568-gmac
|
||||||
- rockchip,rk3588-gmac
|
- rockchip,rk3588-gmac
|
||||||
|
- rockchip,rv1126-gmac
|
||||||
- const: snps,dwmac-4.20a
|
- const: snps,dwmac-4.20a
|
||||||
|
|
||||||
clocks:
|
clocks:
|
||||||
|
@ -552,7 +552,7 @@ required:
|
|||||||
|
|
||||||
dependencies:
|
dependencies:
|
||||||
snps,reset-active-low: ["snps,reset-gpio"]
|
snps,reset-active-low: ["snps,reset-gpio"]
|
||||||
snps,reset-delay-us: ["snps,reset-gpio"]
|
snps,reset-delays-us: ["snps,reset-gpio"]
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: "ethernet-controller.yaml#"
|
- $ref: "ethernet-controller.yaml#"
|
||||||
|
@ -57,6 +57,7 @@ properties:
|
|||||||
- ti,am654-cpsw-nuss
|
- ti,am654-cpsw-nuss
|
||||||
- ti,j7200-cpswxg-nuss
|
- ti,j7200-cpswxg-nuss
|
||||||
- ti,j721e-cpsw-nuss
|
- ti,j721e-cpsw-nuss
|
||||||
|
- ti,j721e-cpswxg-nuss
|
||||||
- ti,am642-cpsw-nuss
|
- ti,am642-cpsw-nuss
|
||||||
|
|
||||||
reg:
|
reg:
|
||||||
@ -111,7 +112,7 @@ properties:
|
|||||||
const: 0
|
const: 0
|
||||||
|
|
||||||
patternProperties:
|
patternProperties:
|
||||||
"^port@[1-4]$":
|
"^port@[1-8]$":
|
||||||
type: object
|
type: object
|
||||||
description: CPSWxG NUSS external ports
|
description: CPSWxG NUSS external ports
|
||||||
|
|
||||||
@ -121,7 +122,7 @@ properties:
|
|||||||
properties:
|
properties:
|
||||||
reg:
|
reg:
|
||||||
minimum: 1
|
minimum: 1
|
||||||
maximum: 4
|
maximum: 8
|
||||||
description: CPSW port number
|
description: CPSW port number
|
||||||
|
|
||||||
phys:
|
phys:
|
||||||
@ -186,12 +187,36 @@ allOf:
|
|||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
contains:
|
contains:
|
||||||
const: ti,j7200-cpswxg-nuss
|
const: ti,j721e-cpswxg-nuss
|
||||||
then:
|
then:
|
||||||
properties:
|
properties:
|
||||||
ethernet-ports:
|
ethernet-ports:
|
||||||
patternProperties:
|
patternProperties:
|
||||||
"^port@[3-4]$": false
|
"^port@[5-8]$": false
|
||||||
|
"^port@[1-4]$":
|
||||||
|
properties:
|
||||||
|
reg:
|
||||||
|
minimum: 1
|
||||||
|
maximum: 4
|
||||||
|
|
||||||
|
- if:
|
||||||
|
not:
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
contains:
|
||||||
|
enum:
|
||||||
|
- ti,j721e-cpswxg-nuss
|
||||||
|
- ti,j7200-cpswxg-nuss
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
ethernet-ports:
|
||||||
|
patternProperties:
|
||||||
|
"^port@[3-8]$": false
|
||||||
|
"^port@[1-2]$":
|
||||||
|
properties:
|
||||||
|
reg:
|
||||||
|
minimum: 1
|
||||||
|
maximum: 2
|
||||||
|
|
||||||
additionalProperties: false
|
additionalProperties: false
|
||||||
|
|
||||||
|
@ -93,6 +93,14 @@ properties:
|
|||||||
description:
|
description:
|
||||||
Number of timestamp Generator function outputs (TS_GENFx)
|
Number of timestamp Generator function outputs (TS_GENFx)
|
||||||
|
|
||||||
|
ti,pps:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32-array
|
||||||
|
minItems: 2
|
||||||
|
maxItems: 2
|
||||||
|
description: |
|
||||||
|
The pair of HWx_TS_PUSH input and TS_GENFy output indexes used for
|
||||||
|
PPS events generation. Platform/board specific.
|
||||||
|
|
||||||
refclk-mux:
|
refclk-mux:
|
||||||
type: object
|
type: object
|
||||||
additionalProperties: false
|
additionalProperties: false
|
||||||
|
@ -29,15 +29,15 @@ additionalProperties: false
|
|||||||
|
|
||||||
examples:
|
examples:
|
||||||
- |
|
- |
|
||||||
mmc {
|
mmc {
|
||||||
#address-cells = <1>;
|
#address-cells = <1>;
|
||||||
#size-cells = <0>;
|
#size-cells = <0>;
|
||||||
|
|
||||||
wifi@1 {
|
wifi@1 {
|
||||||
compatible = "esp,esp8089";
|
compatible = "esp,esp8089";
|
||||||
reg = <1>;
|
reg = <1>;
|
||||||
esp,crystal-26M-en = <2>;
|
esp,crystal-26M-en = <2>;
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
...
|
...
|
||||||
|
@ -1,6 +1,5 @@
|
|||||||
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||||
# Copyright (c) 2018-2019 The Linux Foundation. All rights reserved.
|
# Copyright (c) 2018-2019 The Linux Foundation. All rights reserved.
|
||||||
|
|
||||||
%YAML 1.2
|
%YAML 1.2
|
||||||
---
|
---
|
||||||
$id: http://devicetree.org/schemas/net/wireless/ieee80211.yaml#
|
$id: http://devicetree.org/schemas/net/wireless/ieee80211.yaml#
|
||||||
|
@ -1,4 +1,4 @@
|
|||||||
Marvell 8787/8897/8997 (sd8787/sd8897/sd8997/pcie8997) SDIO/PCIE devices
|
Marvell 8787/8897/8978/8997 (sd8787/sd8897/sd8978/sd8997/pcie8997) SDIO/PCIE devices
|
||||||
------
|
------
|
||||||
|
|
||||||
This node provides properties for controlling the Marvell SDIO/PCIE wireless device.
|
This node provides properties for controlling the Marvell SDIO/PCIE wireless device.
|
||||||
@ -10,7 +10,9 @@ Required properties:
|
|||||||
- compatible : should be one of the following:
|
- compatible : should be one of the following:
|
||||||
* "marvell,sd8787"
|
* "marvell,sd8787"
|
||||||
* "marvell,sd8897"
|
* "marvell,sd8897"
|
||||||
|
* "marvell,sd8978"
|
||||||
* "marvell,sd8997"
|
* "marvell,sd8997"
|
||||||
|
* "nxp,iw416"
|
||||||
* "pci11ab,2b42"
|
* "pci11ab,2b42"
|
||||||
* "pci1b4b,2b42"
|
* "pci1b4b,2b42"
|
||||||
|
|
||||||
|
@ -1,6 +1,5 @@
|
|||||||
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||||
# Copyright (c) 2018-2019 The Linux Foundation. All rights reserved.
|
# Copyright (c) 2018-2019 The Linux Foundation. All rights reserved.
|
||||||
|
|
||||||
%YAML 1.2
|
%YAML 1.2
|
||||||
---
|
---
|
||||||
$id: http://devicetree.org/schemas/net/wireless/mediatek,mt76.yaml#
|
$id: http://devicetree.org/schemas/net/wireless/mediatek,mt76.yaml#
|
||||||
|
@ -1,6 +1,5 @@
|
|||||||
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||||
# Copyright (c) 2018-2019 The Linux Foundation. All rights reserved.
|
# Copyright (c) 2018-2019 The Linux Foundation. All rights reserved.
|
||||||
|
|
||||||
%YAML 1.2
|
%YAML 1.2
|
||||||
---
|
---
|
||||||
$id: http://devicetree.org/schemas/net/wireless/qcom,ath11k.yaml#
|
$id: http://devicetree.org/schemas/net/wireless/qcom,ath11k.yaml#
|
||||||
@ -21,6 +20,7 @@ properties:
|
|||||||
- qcom,ipq8074-wifi
|
- qcom,ipq8074-wifi
|
||||||
- qcom,ipq6018-wifi
|
- qcom,ipq6018-wifi
|
||||||
- qcom,wcn6750-wifi
|
- qcom,wcn6750-wifi
|
||||||
|
- qcom,ipq5018-wifi
|
||||||
|
|
||||||
reg:
|
reg:
|
||||||
maxItems: 1
|
maxItems: 1
|
||||||
@ -262,10 +262,10 @@ allOf:
|
|||||||
examples:
|
examples:
|
||||||
- |
|
- |
|
||||||
|
|
||||||
q6v5_wcss: q6v5_wcss@CD00000 {
|
q6v5_wcss: remoteproc@cd00000 {
|
||||||
compatible = "qcom,ipq8074-wcss-pil";
|
compatible = "qcom,ipq8074-wcss-pil";
|
||||||
reg = <0xCD00000 0x4040>,
|
reg = <0xcd00000 0x4040>,
|
||||||
<0x4AB000 0x20>;
|
<0x4ab000 0x20>;
|
||||||
reg-names = "qdsp6",
|
reg-names = "qdsp6",
|
||||||
"rmb";
|
"rmb";
|
||||||
};
|
};
|
||||||
@ -386,7 +386,7 @@ examples:
|
|||||||
#address-cells = <2>;
|
#address-cells = <2>;
|
||||||
#size-cells = <2>;
|
#size-cells = <2>;
|
||||||
|
|
||||||
qcn9074_0: qcn9074_0@51100000 {
|
qcn9074_0: wifi@51100000 {
|
||||||
no-map;
|
no-map;
|
||||||
reg = <0x0 0x51100000 0x0 0x03500000>;
|
reg = <0x0 0x51100000 0x0 0x03500000>;
|
||||||
};
|
};
|
||||||
@ -463,6 +463,6 @@ examples:
|
|||||||
qcom,smem-states = <&wlan_smp2p_out 0>;
|
qcom,smem-states = <&wlan_smp2p_out 0>;
|
||||||
qcom,smem-state-names = "wlan-smp2p-out";
|
qcom,smem-state-names = "wlan-smp2p-out";
|
||||||
wifi-firmware {
|
wifi-firmware {
|
||||||
iommus = <&apps_smmu 0x1c02 0x1>;
|
iommus = <&apps_smmu 0x1c02 0x1>;
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
@ -2,7 +2,6 @@
|
|||||||
# Copyright (c) 2020, Silicon Laboratories, Inc.
|
# Copyright (c) 2020, Silicon Laboratories, Inc.
|
||||||
%YAML 1.2
|
%YAML 1.2
|
||||||
---
|
---
|
||||||
|
|
||||||
$id: http://devicetree.org/schemas/net/wireless/silabs,wfx.yaml#
|
$id: http://devicetree.org/schemas/net/wireless/silabs,wfx.yaml#
|
||||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
@ -90,47 +90,47 @@ examples:
|
|||||||
|
|
||||||
// For wl12xx family:
|
// For wl12xx family:
|
||||||
spi1 {
|
spi1 {
|
||||||
#address-cells = <1>;
|
#address-cells = <1>;
|
||||||
#size-cells = <0>;
|
#size-cells = <0>;
|
||||||
|
|
||||||
wlcore1: wlcore@1 {
|
wlcore1: wlcore@1 {
|
||||||
compatible = "ti,wl1271";
|
compatible = "ti,wl1271";
|
||||||
reg = <1>;
|
reg = <1>;
|
||||||
spi-max-frequency = <48000000>;
|
spi-max-frequency = <48000000>;
|
||||||
interrupts = <8 IRQ_TYPE_LEVEL_HIGH>;
|
interrupts = <8 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
vwlan-supply = <&vwlan_fixed>;
|
vwlan-supply = <&vwlan_fixed>;
|
||||||
clock-xtal;
|
clock-xtal;
|
||||||
ref-clock-frequency = <38400000>;
|
ref-clock-frequency = <38400000>;
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
// For wl18xx family:
|
// For wl18xx family:
|
||||||
spi2 {
|
spi2 {
|
||||||
#address-cells = <1>;
|
#address-cells = <1>;
|
||||||
#size-cells = <0>;
|
#size-cells = <0>;
|
||||||
|
|
||||||
wlcore2: wlcore@0 {
|
wlcore2: wlcore@0 {
|
||||||
compatible = "ti,wl1835";
|
compatible = "ti,wl1835";
|
||||||
reg = <0>;
|
reg = <0>;
|
||||||
spi-max-frequency = <48000000>;
|
spi-max-frequency = <48000000>;
|
||||||
interrupts = <27 IRQ_TYPE_EDGE_RISING>;
|
interrupts = <27 IRQ_TYPE_EDGE_RISING>;
|
||||||
vwlan-supply = <&vwlan_fixed>;
|
vwlan-supply = <&vwlan_fixed>;
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
// SDIO example:
|
// SDIO example:
|
||||||
mmc3 {
|
mmc3 {
|
||||||
vmmc-supply = <&wlan_en_reg>;
|
vmmc-supply = <&wlan_en_reg>;
|
||||||
bus-width = <4>;
|
bus-width = <4>;
|
||||||
cap-power-off-card;
|
cap-power-off-card;
|
||||||
keep-power-in-suspend;
|
keep-power-in-suspend;
|
||||||
|
|
||||||
#address-cells = <1>;
|
#address-cells = <1>;
|
||||||
#size-cells = <0>;
|
#size-cells = <0>;
|
||||||
|
|
||||||
wlcore3: wlcore@2 {
|
wlcore3: wlcore@2 {
|
||||||
compatible = "ti,wl1835";
|
compatible = "ti,wl1835";
|
||||||
reg = <2>;
|
reg = <2>;
|
||||||
interrupts = <19 IRQ_TYPE_LEVEL_HIGH>;
|
interrupts = <19 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
@ -785,6 +785,8 @@ patternProperties:
|
|||||||
description: MaxBotix Inc.
|
description: MaxBotix Inc.
|
||||||
"^maxim,.*":
|
"^maxim,.*":
|
||||||
description: Maxim Integrated Products
|
description: Maxim Integrated Products
|
||||||
|
"^maxlinear,.*":
|
||||||
|
description: MaxLinear Inc.
|
||||||
"^mbvl,.*":
|
"^mbvl,.*":
|
||||||
description: Mobiveil Inc.
|
description: Mobiveil Inc.
|
||||||
"^mcube,.*":
|
"^mcube,.*":
|
||||||
@ -855,6 +857,8 @@ patternProperties:
|
|||||||
description: Moortec Semiconductor Ltd.
|
description: Moortec Semiconductor Ltd.
|
||||||
"^mosaixtech,.*":
|
"^mosaixtech,.*":
|
||||||
description: Mosaix Technologies, Inc.
|
description: Mosaix Technologies, Inc.
|
||||||
|
"^motorcomm,.*":
|
||||||
|
description: MotorComm, Inc.
|
||||||
"^motorola,.*":
|
"^motorola,.*":
|
||||||
description: Motorola, Inc.
|
description: Motorola, Inc.
|
||||||
"^moxa,.*":
|
"^moxa,.*":
|
||||||
|
@ -323,7 +323,7 @@ If the lowest bit of showcapimsgs is set, kernelcapi logs controller and
|
|||||||
application up and down events.
|
application up and down events.
|
||||||
|
|
||||||
In addition, every registered CAPI controller has an associated traceflag
|
In addition, every registered CAPI controller has an associated traceflag
|
||||||
parameter controlling how CAPI messages sent from and to tha controller are
|
parameter controlling how CAPI messages sent from and to the controller are
|
||||||
logged. The traceflag parameter is initialized with the value of the
|
logged. The traceflag parameter is initialized with the value of the
|
||||||
showcapimsgs parameter when the controller is registered, but can later be
|
showcapimsgs parameter when the controller is registered, but can later be
|
||||||
changed via the MANUFACTURER_REQ command KCAPI_CMD_TRACE.
|
changed via the MANUFACTURER_REQ command KCAPI_CMD_TRACE.
|
||||||
|
@ -3,7 +3,7 @@ mISDN Driver
|
|||||||
============
|
============
|
||||||
|
|
||||||
mISDN is a new modular ISDN driver, in the long term it should replace
|
mISDN is a new modular ISDN driver, in the long term it should replace
|
||||||
the old I4L driver architecture for passiv ISDN cards.
|
the old I4L driver architecture for passive ISDN cards.
|
||||||
It was designed to allow a broad range of applications and interfaces
|
It was designed to allow a broad range of applications and interfaces
|
||||||
but only have the basic function in kernel, the interface to the user
|
but only have the basic function in kernel, the interface to the user
|
||||||
space is based on sockets with a own address family AF_ISDN.
|
space is based on sockets with a own address family AF_ISDN.
|
||||||
|
331
Documentation/netlink/genetlink-c.yaml
Normal file
@ -0,0 +1,331 @@
|
|||||||
|
# SPDX-License-Identifier: GPL-2.0
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://kernel.org/schemas/netlink/genetlink-c.yaml#
|
||||||
|
$schema: https://json-schema.org/draft-07/schema
|
||||||
|
|
||||||
|
# Common defines
|
||||||
|
$defs:
|
||||||
|
uint:
|
||||||
|
type: integer
|
||||||
|
minimum: 0
|
||||||
|
len-or-define:
|
||||||
|
type: [ string, integer ]
|
||||||
|
pattern: ^[0-9A-Za-z_]+( - 1)?$
|
||||||
|
minimum: 0
|
||||||
|
|
||||||
|
# Schema for specs
|
||||||
|
title: Protocol
|
||||||
|
description: Specification of a genetlink protocol
|
||||||
|
type: object
|
||||||
|
required: [ name, doc, attribute-sets, operations ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: Name of the genetlink family.
|
||||||
|
type: string
|
||||||
|
doc:
|
||||||
|
type: string
|
||||||
|
version:
|
||||||
|
description: Generic Netlink family version. Default is 1.
|
||||||
|
type: integer
|
||||||
|
minimum: 1
|
||||||
|
protocol:
|
||||||
|
description: Schema compatibility level. Default is "genetlink".
|
||||||
|
enum: [ genetlink, genetlink-c ]
|
||||||
|
# Start genetlink-c
|
||||||
|
uapi-header:
|
||||||
|
description: Path to the uAPI header, default is linux/${family-name}.h
|
||||||
|
type: string
|
||||||
|
c-family-name:
|
||||||
|
description: Name of the define for the family name.
|
||||||
|
type: string
|
||||||
|
c-version-name:
|
||||||
|
description: Name of the define for the version of the family.
|
||||||
|
type: string
|
||||||
|
max-by-define:
|
||||||
|
description: Makes the number of attributes and commands be specified by a define, not an enum value.
|
||||||
|
type: boolean
|
||||||
|
# End genetlink-c
|
||||||
|
|
||||||
|
definitions:
|
||||||
|
description: List of type and constant definitions (enums, flags, defines).
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ type, name ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
header:
|
||||||
|
description: For C-compatible languages, header which already defines this value.
|
||||||
|
type: string
|
||||||
|
type:
|
||||||
|
enum: [ const, enum, flags ]
|
||||||
|
doc:
|
||||||
|
type: string
|
||||||
|
# For const
|
||||||
|
value:
|
||||||
|
description: For const - the value.
|
||||||
|
type: [ string, integer ]
|
||||||
|
# For enum and flags
|
||||||
|
value-start:
|
||||||
|
description: For enum or flags the literal initializer for the first value.
|
||||||
|
type: [ string, integer ]
|
||||||
|
entries:
|
||||||
|
description: For enum or flags array of values.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
oneOf:
|
||||||
|
- type: string
|
||||||
|
- type: object
|
||||||
|
required: [ name ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
value:
|
||||||
|
type: integer
|
||||||
|
doc:
|
||||||
|
type: string
|
||||||
|
render-max:
|
||||||
|
description: Render the max members for this enum.
|
||||||
|
type: boolean
|
||||||
|
# Start genetlink-c
|
||||||
|
enum-name:
|
||||||
|
description: Name for enum, if empty no name will be used.
|
||||||
|
type: [ string, "null" ]
|
||||||
|
name-prefix:
|
||||||
|
description: For enum the prefix of the values, optional.
|
||||||
|
type: string
|
||||||
|
# End genetlink-c
|
||||||
|
|
||||||
|
attribute-sets:
|
||||||
|
description: Definition of attribute spaces for this family.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
description: Definition of a single attribute space.
|
||||||
|
type: object
|
||||||
|
required: [ name, attributes ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: |
|
||||||
|
Name used when referring to this space in other definitions, not used outside of the spec.
|
||||||
|
type: string
|
||||||
|
name-prefix:
|
||||||
|
description: |
|
||||||
|
Prefix for the C enum name of the attributes. Default family[name]-set[name]-a-
|
||||||
|
type: string
|
||||||
|
enum-name:
|
||||||
|
description: Name for the enum type of the attribute.
|
||||||
|
type: string
|
||||||
|
doc:
|
||||||
|
description: Documentation of the space.
|
||||||
|
type: string
|
||||||
|
subset-of:
|
||||||
|
description: |
|
||||||
|
Name of another space which this is a logical part of. Sub-spaces can be used to define
|
||||||
|
a limited group of attributes which are used in a nest.
|
||||||
|
type: string
|
||||||
|
# Start genetlink-c
|
||||||
|
attr-cnt-name:
|
||||||
|
description: The explicit name for constant holding the count of attributes (last attr + 1).
|
||||||
|
type: string
|
||||||
|
attr-max-name:
|
||||||
|
description: The explicit name for last member of attribute enum.
|
||||||
|
type: string
|
||||||
|
# End genetlink-c
|
||||||
|
attributes:
|
||||||
|
description: List of attributes in the space.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ name, type ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
type: &attr-type
|
||||||
|
enum: [ unused, pad, flag, binary, u8, u16, u32, u64, s32, s64,
|
||||||
|
string, nest, array-nest, nest-type-value ]
|
||||||
|
doc:
|
||||||
|
description: Documentation of the attribute.
|
||||||
|
type: string
|
||||||
|
value:
|
||||||
|
description: Value for the enum item representing this attribute in the uAPI.
|
||||||
|
$ref: '#/$defs/uint'
|
||||||
|
type-value:
|
||||||
|
description: Name of the value extracted from the type of a nest-type-value attribute.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: string
|
||||||
|
byte-order:
|
||||||
|
enum: [ little-endian, big-endian ]
|
||||||
|
multi-attr:
|
||||||
|
type: boolean
|
||||||
|
nested-attributes:
|
||||||
|
description: Name of the space (sub-space) used inside the attribute.
|
||||||
|
type: string
|
||||||
|
enum:
|
||||||
|
description: Name of the enum type used for the attribute.
|
||||||
|
type: string
|
||||||
|
enum-as-flags:
|
||||||
|
description: |
|
||||||
|
Treat the enum as flags. In most cases enum is either used as flags or as values.
|
||||||
|
Sometimes, however, both forms are necessary, in which case header contains the enum
|
||||||
|
form while specific attributes may request to convert the values into a bitfield.
|
||||||
|
type: boolean
|
||||||
|
checks:
|
||||||
|
description: Kernel input validation.
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
flags-mask:
|
||||||
|
description: Name of the flags constant on which to base mask (unsigned scalar types only).
|
||||||
|
type: string
|
||||||
|
min:
|
||||||
|
description: Min value for an integer attribute.
|
||||||
|
type: integer
|
||||||
|
min-len:
|
||||||
|
description: Min length for a binary attribute.
|
||||||
|
$ref: '#/$defs/len-or-define'
|
||||||
|
max-len:
|
||||||
|
description: Max length for a string or a binary attribute.
|
||||||
|
$ref: '#/$defs/len-or-define'
|
||||||
|
sub-type: *attr-type
|
||||||
|
|
||||||
|
# Make sure name-prefix does not appear in subsets (subsets inherit naming)
|
||||||
|
dependencies:
|
||||||
|
name-prefix:
|
||||||
|
not:
|
||||||
|
required: [ subset-of ]
|
||||||
|
subset-of:
|
||||||
|
not:
|
||||||
|
required: [ name-prefix ]
|
||||||
|
|
||||||
|
operations:
|
||||||
|
description: Operations supported by the protocol.
|
||||||
|
type: object
|
||||||
|
required: [ list ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
enum-model:
|
||||||
|
description: |
|
||||||
|
The model of assigning values to the operations.
|
||||||
|
"unified" is the recommended model where all message types belong
|
||||||
|
to a single enum.
|
||||||
|
"directional" has the messages sent to the kernel and from the kernel
|
||||||
|
enumerated separately.
|
||||||
|
enum: [ unified ]
|
||||||
|
name-prefix:
|
||||||
|
description: |
|
||||||
|
Prefix for the C enum name of the command. The name is formed by concatenating
|
||||||
|
the prefix with the upper case name of the command, with dashes replaced by underscores.
|
||||||
|
type: string
|
||||||
|
enum-name:
|
||||||
|
description: Name for the enum type with commands.
|
||||||
|
type: string
|
||||||
|
async-prefix:
|
||||||
|
description: Same as name-prefix but used to render notifications and events to separate enum.
|
||||||
|
type: string
|
||||||
|
async-enum:
|
||||||
|
description: Name for the enum type with notifications/events.
|
||||||
|
type: string
|
||||||
|
list:
|
||||||
|
description: List of commands
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
required: [ name, doc ]
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: Name of the operation, also defining its C enum value in uAPI.
|
||||||
|
type: string
|
||||||
|
doc:
|
||||||
|
description: Documentation for the command.
|
||||||
|
type: string
|
||||||
|
value:
|
||||||
|
description: Value for the enum in the uAPI.
|
||||||
|
$ref: '#/$defs/uint'
|
||||||
|
attribute-set:
|
||||||
|
description: |
|
||||||
|
Attribute space from which attributes directly in the requests and replies
|
||||||
|
to this command are defined.
|
||||||
|
type: string
|
||||||
|
flags: &cmd_flags
|
||||||
|
description: Command flags.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
enum: [ admin-perm ]
|
||||||
|
dont-validate:
|
||||||
|
description: Kernel attribute validation flags.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
enum: [ strict, dump ]
|
||||||
|
do: &subop-type
|
||||||
|
description: Main command handler.
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
request: &subop-attr-list
|
||||||
|
description: Definition of the request message for a given command.
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
attributes:
|
||||||
|
description: |
|
||||||
|
Names of attributes from the attribute-set (not full attribute
|
||||||
|
definitions, just names).
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: string
|
||||||
|
reply: *subop-attr-list
|
||||||
|
pre:
|
||||||
|
description: Hook for a function to run before the main callback (pre_doit or start).
|
||||||
|
type: string
|
||||||
|
post:
|
||||||
|
description: Hook for a function to run after the main callback (post_doit or done).
|
||||||
|
type: string
|
||||||
|
dump: *subop-type
|
||||||
|
notify:
|
||||||
|
description: Name of the command sharing the reply type with this notification.
|
||||||
|
type: string
|
||||||
|
event:
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
attributes:
|
||||||
|
description: Explicit list of the attributes for the notification.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: string
|
||||||
|
mcgrp:
|
||||||
|
description: Name of the multicast group generating given notification.
|
||||||
|
type: string
|
||||||
|
mcast-groups:
|
||||||
|
description: List of multicast groups.
|
||||||
|
type: object
|
||||||
|
required: [ list ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
list:
|
||||||
|
description: List of groups.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ name ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: |
|
||||||
|
The name for the group, used to form the define and the value of the define.
|
||||||
|
type: string
|
||||||
|
# Start genetlink-c
|
||||||
|
c-define-name:
|
||||||
|
description: Override for the name of the define in C uAPI.
|
||||||
|
type: string
|
||||||
|
# End genetlink-c
|
||||||
|
flags: *cmd_flags
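For orientation, here is a minimal sketch of a spec that should satisfy the genetlink-c schema above. The family name `example`, the attribute set `dev`, the operations and the multicast group `mgmt` are all hypothetical and are not part of this commit. Only keys defined by the schema are used. Indentation follows the usual style of kernel YAML specs, which is an assumption here because the extracted diff does not preserve it.

```yaml
# Hypothetical family, for illustration only; not part of this commit.
name: example
doc: Illustrative family exercising the genetlink-c schema.
protocol: genetlink-c
uapi-header: linux/example.h            # optional, genetlink-c only
c-family-name: EXAMPLE_GENL_NAME        # optional, genetlink-c only
c-version-name: EXAMPLE_GENL_VERSION    # optional, genetlink-c only

definitions:
  -
    name: mode
    type: enum
    name-prefix: example-mode-          # optional, genetlink-c only
    entries: [ auto, manual ]

attribute-sets:
  -
    name: dev
    name-prefix: example-a-dev-         # must not be combined with subset-of
    attributes:
      -
        name: ifindex
        type: u32
        checks:
          min: 1
      -
        name: mode
        type: u32
        enum: mode
      -
        name: label
        type: string
        checks:
          max-len: 15

operations:
  enum-model: unified                   # the only model this schema accepts
  name-prefix: example-cmd-
  list:
    -
      name: dev-get
      doc: Get the state of one device, or dump all devices.
      attribute-set: dev
      do:
        request:
          attributes: [ ifindex ]
        reply:
          attributes: [ ifindex, mode, label ]
      dump:
        reply:
          attributes: [ ifindex, mode, label ]
    -
      name: dev-ntf
      doc: Notification sent when a device changes.
      notify: dev-get                   # shares the reply layout of dev-get
      mcgrp: mgmt

mcast-groups:
  list:
    -
      name: mgmt
```

The four top-level keys the schema marks as required are `name`, `doc`, `attribute-sets` and `operations`. Everything else in the sketch is optional and included only to show where it nests.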
|
361
Documentation/netlink/genetlink-legacy.yaml
Normal file
@@ -0,0 +1,361 @@
|
|||||||
|
# SPDX-License-Identifier: GPL-2.0
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://kernel.org/schemas/netlink/genetlink-legacy.yaml#
|
||||||
|
$schema: https://json-schema.org/draft-07/schema
|
||||||
|
|
||||||
|
# Common defines
|
||||||
|
$defs:
|
||||||
|
uint:
|
||||||
|
type: integer
|
||||||
|
minimum: 0
|
||||||
|
len-or-define:
|
||||||
|
type: [ string, integer ]
|
||||||
|
pattern: ^[0-9A-Za-z_]+( - 1)?$
|
||||||
|
minimum: 0
|
||||||
|
|
||||||
|
# Schema for specs
|
||||||
|
title: Protocol
|
||||||
|
description: Specification of a genetlink protocol
|
||||||
|
type: object
|
||||||
|
required: [ name, doc, attribute-sets, operations ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: Name of the genetlink family.
|
||||||
|
type: string
|
||||||
|
doc:
|
||||||
|
type: string
|
||||||
|
version:
|
||||||
|
description: Generic Netlink family version. Default is 1.
|
||||||
|
type: integer
|
||||||
|
minimum: 1
|
||||||
|
protocol:
|
||||||
|
description: Schema compatibility level. Default is "genetlink".
|
||||||
|
enum: [ genetlink, genetlink-c, genetlink-legacy ] # Trim
|
||||||
|
# Start genetlink-c
|
||||||
|
uapi-header:
|
||||||
|
description: Path to the uAPI header, default is linux/${family-name}.h
|
||||||
|
type: string
|
||||||
|
c-family-name:
|
||||||
|
description: Name of the define for the family name.
|
||||||
|
type: string
|
||||||
|
c-version-name:
|
||||||
|
description: Name of the define for the version of the family.
|
||||||
|
type: string
|
||||||
|
max-by-define:
|
||||||
|
description: Makes the number of attributes and commands be specified by a define, not an enum value.
|
||||||
|
type: boolean
|
||||||
|
# End genetlink-c
|
||||||
|
# Start genetlink-legacy
|
||||||
|
kernel-policy:
|
||||||
|
description: |
|
||||||
|
Defines if the input policy in the kernel is global, per-operation, or split per operation type.
|
||||||
|
Default is split.
|
||||||
|
enum: [ split, per-op, global ]
|
||||||
|
# End genetlink-legacy
|
||||||
|
|
||||||
|
definitions:
|
||||||
|
description: List of type and constant definitions (enums, flags, defines).
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ type, name ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
header:
|
||||||
|
description: For C-compatible languages, header which already defines this value.
|
||||||
|
type: string
|
||||||
|
type:
|
||||||
|
enum: [ const, enum, flags, struct ] # Trim
|
||||||
|
doc:
|
||||||
|
type: string
|
||||||
|
# For const
|
||||||
|
value:
|
||||||
|
description: For const - the value.
|
||||||
|
type: [ string, integer ]
|
||||||
|
# For enum and flags
|
||||||
|
value-start:
|
||||||
|
description: For enum or flags the literal initializer for the first value.
|
||||||
|
type: [ string, integer ]
|
||||||
|
entries:
|
||||||
|
description: For enum or flags array of values.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
oneOf:
|
||||||
|
- type: string
|
||||||
|
- type: object
|
||||||
|
required: [ name ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
value:
|
||||||
|
type: integer
|
||||||
|
doc:
|
||||||
|
type: string
|
||||||
|
render-max:
|
||||||
|
description: Render the max members for this enum.
|
||||||
|
type: boolean
|
||||||
|
# Start genetlink-c
|
||||||
|
enum-name:
|
||||||
|
description: Name for enum, if empty no name will be used.
|
||||||
|
type: [ string, "null" ]
|
||||||
|
name-prefix:
|
||||||
|
description: For enum the prefix of the values, optional.
|
||||||
|
type: string
|
||||||
|
# End genetlink-c
|
||||||
|
# Start genetlink-legacy
|
||||||
|
members:
|
||||||
|
description: List of struct members. Only scalars and strings members allowed.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ name, type ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
type:
|
||||||
|
enum: [ u8, u16, u32, u64, s8, s16, s32, s64, string ]
|
||||||
|
len:
|
||||||
|
$ref: '#/$defs/len-or-define'
|
||||||
|
# End genetlink-legacy
|
||||||
|
|
||||||
|
attribute-sets:
|
||||||
|
description: Definition of attribute spaces for this family.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
description: Definition of a single attribute space.
|
||||||
|
type: object
|
||||||
|
required: [ name, attributes ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: |
|
||||||
|
Name used when referring to this space in other definitions, not used outside of the spec.
|
||||||
|
type: string
|
||||||
|
name-prefix:
|
||||||
|
description: |
|
||||||
|
Prefix for the C enum name of the attributes. Default family[name]-set[name]-a-
|
||||||
|
type: string
|
||||||
|
enum-name:
|
||||||
|
description: Name for the enum type of the attribute.
|
||||||
|
type: string
|
||||||
|
doc:
|
||||||
|
description: Documentation of the space.
|
||||||
|
type: string
|
||||||
|
subset-of:
|
||||||
|
description: |
|
||||||
|
Name of another space which this is a logical part of. Sub-spaces can be used to define
|
||||||
|
a limited group of attributes which are used in a nest.
|
||||||
|
type: string
|
||||||
|
# Start genetlink-c
|
||||||
|
attr-cnt-name:
|
||||||
|
description: The explicit name for constant holding the count of attributes (last attr + 1).
|
||||||
|
type: string
|
||||||
|
attr-max-name:
|
||||||
|
description: The explicit name for last member of attribute enum.
|
||||||
|
type: string
|
||||||
|
# End genetlink-c
|
||||||
|
attributes:
|
||||||
|
description: List of attributes in the space.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ name, type ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
type: &attr-type
|
||||||
|
enum: [ unused, pad, flag, binary, u8, u16, u32, u64, s32, s64,
|
||||||
|
string, nest, array-nest, nest-type-value ]
|
||||||
|
doc:
|
||||||
|
description: Documentation of the attribute.
|
||||||
|
type: string
|
||||||
|
value:
|
||||||
|
description: Value for the enum item representing this attribute in the uAPI.
|
||||||
|
$ref: '#/$defs/uint'
|
||||||
|
type-value:
|
||||||
|
description: Name of the value extracted from the type of a nest-type-value attribute.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: string
|
||||||
|
byte-order:
|
||||||
|
enum: [ little-endian, big-endian ]
|
||||||
|
multi-attr:
|
||||||
|
type: boolean
|
||||||
|
nested-attributes:
|
||||||
|
description: Name of the space (sub-space) used inside the attribute.
|
||||||
|
type: string
|
||||||
|
enum:
|
||||||
|
description: Name of the enum type used for the attribute.
|
||||||
|
type: string
|
||||||
|
enum-as-flags:
|
||||||
|
description: |
|
||||||
|
Treat the enum as flags. In most cases enum is either used as flags or as values.
|
||||||
|
Sometimes, however, both forms are necessary, in which case header contains the enum
|
||||||
|
form while specific attributes may request to convert the values into a bitfield.
|
||||||
|
type: boolean
|
||||||
|
checks:
|
||||||
|
description: Kernel input validation.
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
flags-mask:
|
||||||
|
description: Name of the flags constant on which to base mask (unsigned scalar types only).
|
||||||
|
type: string
|
||||||
|
min:
|
||||||
|
description: Min value for an integer attribute.
|
||||||
|
type: integer
|
||||||
|
min-len:
|
||||||
|
description: Min length for a binary attribute.
|
||||||
|
$ref: '#/$defs/len-or-define'
|
||||||
|
max-len:
|
||||||
|
description: Max length for a string or a binary attribute.
|
||||||
|
$ref: '#/$defs/len-or-define'
|
||||||
|
sub-type: *attr-type
|
||||||
|
|
||||||
|
# Make sure name-prefix does not appear in subsets (subsets inherit naming)
|
||||||
|
dependencies:
|
||||||
|
name-prefix:
|
||||||
|
not:
|
||||||
|
required: [ subset-of ]
|
||||||
|
subset-of:
|
||||||
|
not:
|
||||||
|
required: [ name-prefix ]
|
||||||
|
|
||||||
|
operations:
|
||||||
|
description: Operations supported by the protocol.
|
||||||
|
type: object
|
||||||
|
required: [ list ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
enum-model:
|
||||||
|
description: |
|
||||||
|
The model of assigning values to the operations.
|
||||||
|
"unified" is the recommended model where all message types belong
|
||||||
|
to a single enum.
|
||||||
|
"directional" has the messages sent to the kernel and from the kernel
|
||||||
|
enumerated separately.
|
||||||
|
enum: [ unified, directional ] # Trim
|
||||||
|
name-prefix:
|
||||||
|
description: |
|
||||||
|
Prefix for the C enum name of the command. The name is formed by concatenating
|
||||||
|
the prefix with the upper case name of the command, with dashes replaced by underscores.
|
||||||
|
type: string
|
||||||
|
enum-name:
|
||||||
|
description: Name for the enum type with commands.
|
||||||
|
type: string
|
||||||
|
async-prefix:
|
||||||
|
description: Same as name-prefix but used to render notifications and events to separate enum.
|
||||||
|
type: string
|
||||||
|
async-enum:
|
||||||
|
description: Name for the enum type with notifications/events.
|
||||||
|
type: string
|
||||||
|
list:
|
||||||
|
description: List of commands
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
required: [ name, doc ]
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: Name of the operation, also defining its C enum value in uAPI.
|
||||||
|
type: string
|
||||||
|
doc:
|
||||||
|
description: Documentation for the command.
|
||||||
|
type: string
|
||||||
|
value:
|
||||||
|
description: Value for the enum in the uAPI.
|
||||||
|
$ref: '#/$defs/uint'
|
||||||
|
attribute-set:
|
||||||
|
description: |
|
||||||
|
Attribute space from which attributes directly in the requests and replies
|
||||||
|
to this command are defined.
|
||||||
|
type: string
|
||||||
|
flags: &cmd_flags
|
||||||
|
description: Command flags.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
enum: [ admin-perm ]
|
||||||
|
dont-validate:
|
||||||
|
description: Kernel attribute validation flags.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
enum: [ strict, dump ]
|
||||||
|
do: &subop-type
|
||||||
|
description: Main command handler.
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
request: &subop-attr-list
|
||||||
|
description: Definition of the request message for a given command.
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
attributes:
|
||||||
|
description: |
|
||||||
|
Names of attributes from the attribute-set (not full attribute
|
||||||
|
definitions, just names).
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: string
|
||||||
|
# Start genetlink-legacy
|
||||||
|
value:
|
||||||
|
description: |
|
||||||
|
ID of this message if value for request and response differ,
|
||||||
|
i.e. requests and responses have different message enums.
|
||||||
|
$ref: '#/$defs/uint'
|
||||||
|
# End genetlink-legacy
|
||||||
|
reply: *subop-attr-list
|
||||||
|
pre:
|
||||||
|
description: Hook for a function to run before the main callback (pre_doit or start).
|
||||||
|
type: string
|
||||||
|
post:
|
||||||
|
description: Hook for a function to run after the main callback (post_doit or done).
|
||||||
|
type: string
|
||||||
|
dump: *subop-type
|
||||||
|
notify:
|
||||||
|
description: Name of the command sharing the reply type with this notification.
|
||||||
|
type: string
|
||||||
|
event:
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
attributes:
|
||||||
|
description: Explicit list of the attributes for the notification.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: string
|
||||||
|
mcgrp:
|
||||||
|
description: Name of the multicast group generating given notification.
|
||||||
|
type: string
|
||||||
|
mcast-groups:
|
||||||
|
description: List of multicast groups.
|
||||||
|
type: object
|
||||||
|
required: [ list ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
list:
|
||||||
|
description: List of groups.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ name ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: |
|
||||||
|
The name for the group, used to form the define and the value of the define.
|
||||||
|
type: string
|
||||||
|
# Start genetlink-c
|
||||||
|
c-define-name:
|
||||||
|
description: Override for the name of the define in C uAPI.
|
||||||
|
type: string
|
||||||
|
# End genetlink-c
|
||||||
|
flags: *cmd_flags
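As a rough illustration of what genetlink-legacy adds on top of genetlink-c, the sketch below uses only the legacy-specific keys from the schema above: `kernel-policy`, a `struct` definition with `members`, the `directional` enum model, and per-message `value` inside `request` and `reply`. The family `example-legacy` and every name in it are hypothetical. Indentation is assumed to follow the usual style of kernel YAML specs.

```yaml
# Hypothetical family, for illustration only; not part of this commit.
name: example-legacy
doc: Illustrative family exercising the genetlink-legacy additions.
protocol: genetlink-legacy
kernel-policy: per-op                   # one of split (default), per-op, global

definitions:
  -
    name: example-stats
    type: struct                        # struct is accepted only by genetlink-legacy
    members:                            # scalar and string members only
      -
        name: rx-packets
        type: u64
      -
        name: tag
        type: string
        len: 16

attribute-sets:
  -
    name: dev
    attributes:
      -
        name: ifindex
        type: u32
      -
        name: stats
        type: binary

operations:
  enum-model: directional               # requests and replies enumerated separately
  list:
    -
      name: dev-get
      doc: Get device statistics.
      attribute-set: dev
      do:
        request:
          value: 1                      # message ID in the to-kernel enum
          attributes: [ ifindex ]
        reply:
          value: 1                      # message ID in the from-kernel enum
          attributes: [ ifindex, stats ]
```

Note that this version of the schema defines no key that ties an attribute to a struct definition, so the `stats` attribute above is declared as plain `binary`.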
|
296
Documentation/netlink/genetlink.yaml
Normal file
@@ -0,0 +1,296 @@
|
|||||||
|
# SPDX-License-Identifier: GPL-2.0
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://kernel.org/schemas/netlink/genetlink-legacy.yaml#
|
||||||
|
$schema: https://json-schema.org/draft-07/schema
|
||||||
|
|
||||||
|
# Common defines
|
||||||
|
$defs:
|
||||||
|
uint:
|
||||||
|
type: integer
|
||||||
|
minimum: 0
|
||||||
|
len-or-define:
|
||||||
|
type: [ string, integer ]
|
||||||
|
pattern: ^[0-9A-Za-z_]+( - 1)?$
|
||||||
|
minimum: 0
|
||||||
|
|
||||||
|
# Schema for specs
|
||||||
|
title: Protocol
|
||||||
|
description: Specification of a genetlink protocol
|
||||||
|
type: object
|
||||||
|
required: [ name, doc, attribute-sets, operations ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: Name of the genetlink family.
|
||||||
|
type: string
|
||||||
|
doc:
|
||||||
|
type: string
|
||||||
|
version:
|
||||||
|
description: Generic Netlink family version. Default is 1.
|
||||||
|
type: integer
|
||||||
|
minimum: 1
|
||||||
|
protocol:
|
||||||
|
description: Schema compatibility level. Default is "genetlink".
|
||||||
|
enum: [ genetlink ]
|
||||||
|
|
||||||
|
definitions:
|
||||||
|
description: List of type and constant definitions (enums, flags, defines).
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ type, name ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
header:
|
||||||
|
description: For C-compatible languages, header which already defines this value.
|
||||||
|
type: string
|
||||||
|
type:
|
||||||
|
enum: [ const, enum, flags ]
|
||||||
|
doc:
|
||||||
|
type: string
|
||||||
|
# For const
|
||||||
|
value:
|
||||||
|
description: For const - the value.
|
||||||
|
type: [ string, integer ]
|
||||||
|
# For enum and flags
|
||||||
|
value-start:
|
||||||
|
description: For enum or flags the literal initializer for the first value.
|
||||||
|
type: [ string, integer ]
|
||||||
|
entries:
|
||||||
|
description: For enum or flags array of values.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
oneOf:
|
||||||
|
- type: string
|
||||||
|
- type: object
|
||||||
|
required: [ name ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
value:
|
||||||
|
type: integer
|
||||||
|
doc:
|
||||||
|
type: string
|
||||||
|
render-max:
|
||||||
|
description: Render the max members for this enum.
|
||||||
|
type: boolean
|
||||||
|
|
||||||
|
attribute-sets:
|
||||||
|
description: Definition of attribute spaces for this family.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
description: Definition of a single attribute space.
|
||||||
|
type: object
|
||||||
|
required: [ name, attributes ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: |
|
||||||
|
Name used when referring to this space in other definitions, not used outside of the spec.
|
||||||
|
type: string
|
||||||
|
name-prefix:
|
||||||
|
description: |
|
||||||
|
Prefix for the C enum name of the attributes. Default family[name]-set[name]-a-
|
||||||
|
type: string
|
||||||
|
enum-name:
|
||||||
|
description: Name for the enum type of the attribute.
|
||||||
|
type: string
|
||||||
|
doc:
|
||||||
|
description: Documentation of the space.
|
||||||
|
type: string
|
||||||
|
subset-of:
|
||||||
|
description: |
|
||||||
|
Name of another space which this is a logical part of. Sub-spaces can be used to define
|
||||||
|
a limited group of attributes which are used in a nest.
|
||||||
|
type: string
|
||||||
|
attributes:
|
||||||
|
description: List of attributes in the space.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ name, type ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
type: &attr-type
|
||||||
|
enum: [ unused, pad, flag, binary, u8, u16, u32, u64, s32, s64,
|
||||||
|
string, nest, array-nest, nest-type-value ]
|
||||||
|
doc:
|
||||||
|
description: Documentation of the attribute.
|
||||||
|
type: string
|
||||||
|
value:
|
||||||
|
description: Value for the enum item representing this attribute in the uAPI.
|
||||||
|
$ref: '#/$defs/uint'
|
||||||
|
type-value:
|
||||||
|
description: Name of the value extracted from the type of a nest-type-value attribute.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: string
|
||||||
|
byte-order:
|
||||||
|
enum: [ little-endian, big-endian ]
|
||||||
|
multi-attr:
|
||||||
|
type: boolean
|
||||||
|
nested-attributes:
|
||||||
|
description: Name of the space (sub-space) used inside the attribute.
|
||||||
|
type: string
|
||||||
|
enum:
|
||||||
|
description: Name of the enum type used for the attribute.
|
||||||
|
type: string
|
||||||
|
enum-as-flags:
|
||||||
|
description: |
|
||||||
|
Treat the enum as flags. In most cases enum is either used as flags or as values.
|
||||||
|
Sometimes, however, both forms are necessary, in which case header contains the enum
|
||||||
|
form while specific attributes may request to convert the values into a bitfield.
|
||||||
|
type: boolean
|
||||||
|
checks:
|
||||||
|
description: Kernel input validation.
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
flags-mask:
|
||||||
|
description: Name of the flags constant on which to base mask (unsigned scalar types only).
|
||||||
|
type: string
|
||||||
|
min:
|
||||||
|
description: Min value for an integer attribute.
|
||||||
|
type: integer
|
||||||
|
min-len:
|
||||||
|
description: Min length for a binary attribute.
|
||||||
|
$ref: '#/$defs/len-or-define'
|
||||||
|
max-len:
|
||||||
|
description: Max length for a string or a binary attribute.
|
||||||
|
$ref: '#/$defs/len-or-define'
|
||||||
|
sub-type: *attr-type
|
||||||
|
|
||||||
|
# Make sure name-prefix does not appear in subsets (subsets inherit naming)
|
||||||
|
dependencies:
|
||||||
|
name-prefix:
|
||||||
|
not:
|
||||||
|
required: [ subset-of ]
|
||||||
|
subset-of:
|
||||||
|
not:
|
||||||
|
required: [ name-prefix ]
|
||||||
|
|
||||||
|
operations:
|
||||||
|
description: Operations supported by the protocol.
|
||||||
|
type: object
|
||||||
|
required: [ list ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
enum-model:
|
||||||
|
description: |
|
||||||
|
The model of assigning values to the operations.
|
||||||
|
"unified" is the recommended model where all message types belong
|
||||||
|
to a single enum.
|
||||||
|
"directional" has the messages sent to the kernel and from the kernel
|
||||||
|
enumerated separately.
|
||||||
|
enum: [ unified ]
|
||||||
|
name-prefix:
|
||||||
|
description: |
|
||||||
|
Prefix for the C enum name of the command. The name is formed by concatenating
|
||||||
|
the prefix with the upper case name of the command, with dashes replaced by underscores.
|
||||||
|
type: string
|
||||||
|
enum-name:
|
||||||
|
description: Name for the enum type with commands.
|
||||||
|
type: string
|
||||||
|
async-prefix:
|
||||||
|
description: Same as name-prefix but used to render notifications and events to separate enum.
|
||||||
|
type: string
|
||||||
|
async-enum:
|
||||||
|
description: Name for the enum type with notifications/events.
|
||||||
|
type: string
|
||||||
|
list:
|
||||||
|
description: List of commands
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
required: [ name, doc ]
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: Name of the operation, also defining its C enum value in uAPI.
|
||||||
|
type: string
|
||||||
|
doc:
|
||||||
|
description: Documentation for the command.
|
||||||
|
type: string
|
||||||
|
value:
|
||||||
|
description: Value for the enum in the uAPI.
|
||||||
|
$ref: '#/$defs/uint'
|
||||||
|
attribute-set:
|
||||||
|
description: |
|
||||||
|
Attribute space from which attributes directly in the requests and replies
|
||||||
|
to this command are defined.
|
||||||
|
type: string
|
||||||
|
flags: &cmd_flags
|
||||||
|
description: Command flags.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
enum: [ admin-perm ]
|
||||||
|
dont-validate:
|
||||||
|
description: Kernel attribute validation flags.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
enum: [ strict, dump ]
|
||||||
|
do: &subop-type
|
||||||
|
description: Main command handler.
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
request: &subop-attr-list
|
||||||
|
description: Definition of the request message for a given command.
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
attributes:
|
||||||
|
description: |
|
||||||
|
Names of attributes from the attribute-set (not full attribute
|
||||||
|
definitions, just names).
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: string
|
||||||
|
reply: *subop-attr-list
|
||||||
|
pre:
|
||||||
|
description: Hook for a function to run before the main callback (pre_doit or start).
|
||||||
|
type: string
|
||||||
|
post:
|
||||||
|
description: Hook for a function to run after the main callback (post_doit or done).
|
||||||
|
type: string
|
||||||
|
dump: *subop-type
|
||||||
|
notify:
|
||||||
|
description: Name of the command sharing the reply type with this notification.
|
||||||
|
type: string
|
||||||
|
event:
|
||||||
|
type: object
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
attributes:
|
||||||
|
description: Explicit list of the attributes for the notification.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: string
|
||||||
|
mcgrp:
|
||||||
|
description: Name of the multicast group generating given notification.
|
||||||
|
type: string
|
||||||
|
mcast-groups:
|
||||||
|
description: List of multicast groups.
|
||||||
|
type: object
|
||||||
|
required: [ list ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
list:
|
||||||
|
description: List of groups.
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
type: object
|
||||||
|
required: [ name ]
|
||||||
|
additionalProperties: False
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
description: |
|
||||||
|
The name for the group, used to form the define and the value of the define.
|
||||||
|
type: string
|
||||||
|
flags: *cmd_flags
|
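The name-prefix rule in the schema above (prefix concatenated with the upper-case name, dashes replaced by underscores) can be sketched as below. This is an illustrative helper, not the kernel's code generator; the function name c_enum_name is hypothetical, and applying the same upper-casing and dash replacement to the prefix is an assumption.

```python
def c_enum_name(prefix, name):
    """Form a C enum entry name from a spec prefix and an entry name."""
    # Assumption: the prefix is upper-cased and de-dashed like the name.
    return (prefix + name).upper().replace("-", "_")


# Prefixes and names taken from the fou spec in this commit.
print(c_enum_name("fou-attr-", "peer_port"))
print(c_enum_name("fou-encap-", "gue"))
```

With the fou spec's prefixes this yields identifiers in the usual uAPI style, such as FOU_ATTR_PEER_PORT.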
397
Documentation/netlink/specs/ethtool.yaml
Normal file
@ -0,0 +1,397 @@
|
|||||||
|
name: ethtool
|
||||||
|
|
||||||
|
protocol: genetlink-legacy
|
||||||
|
|
||||||
|
doc: Partial family for Ethtool Netlink.
|
||||||
|
|
||||||
|
attribute-sets:
|
||||||
|
-
|
||||||
|
name: header
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: dev-index
|
||||||
|
type: u32
|
||||||
|
value: 1
|
||||||
|
-
|
||||||
|
name: dev-name
|
||||||
|
type: string
|
||||||
|
-
|
||||||
|
name: flags
|
||||||
|
type: u32
|
||||||
|
|
||||||
|
-
|
||||||
|
name: bitset-bit
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: index
|
||||||
|
type: u32
|
||||||
|
value: 1
|
||||||
|
-
|
||||||
|
name: name
|
||||||
|
type: string
|
||||||
|
-
|
||||||
|
name: value
|
||||||
|
type: flag
|
||||||
|
-
|
||||||
|
name: bitset-bits
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: bit
|
||||||
|
type: nest
|
||||||
|
nested-attributes: bitset-bit
|
||||||
|
value: 1
|
||||||
|
-
|
||||||
|
name: bitset
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: nomask
|
||||||
|
type: flag
|
||||||
|
value: 1
|
||||||
|
-
|
||||||
|
name: size
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: bits
|
||||||
|
type: nest
|
||||||
|
nested-attributes: bitset-bits
|
||||||
|
|
||||||
|
-
|
||||||
|
name: string
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: index
|
||||||
|
type: u32
|
||||||
|
value: 1
|
||||||
|
-
|
||||||
|
name: value
|
||||||
|
type: string
|
||||||
|
-
|
||||||
|
name: strings
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: string
|
||||||
|
type: nest
|
||||||
|
value: 1
|
||||||
|
multi-attr: true
|
||||||
|
nested-attributes: string
|
||||||
|
-
|
||||||
|
name: stringset
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: id
|
||||||
|
type: u32
|
||||||
|
value: 1
|
||||||
|
-
|
||||||
|
name: count
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: strings
|
||||||
|
type: nest
|
||||||
|
multi-attr: true
|
||||||
|
nested-attributes: strings
|
||||||
|
-
|
||||||
|
name: stringsets
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: stringset
|
||||||
|
type: nest
|
||||||
|
multi-attr: true
|
||||||
|
value: 1
|
||||||
|
nested-attributes: stringset
|
||||||
|
-
|
||||||
|
name: strset
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: header
|
||||||
|
value: 1
|
||||||
|
type: nest
|
||||||
|
nested-attributes: header
|
||||||
|
-
|
||||||
|
name: stringsets
|
||||||
|
type: nest
|
||||||
|
nested-attributes: stringsets
|
||||||
|
-
|
||||||
|
name: counts-only
|
||||||
|
type: flag
|
||||||
|
|
||||||
|
-
|
||||||
|
name: privflags
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: header
|
||||||
|
value: 1
|
||||||
|
type: nest
|
||||||
|
nested-attributes: header
|
||||||
|
-
|
||||||
|
name: flags
|
||||||
|
type: nest
|
||||||
|
nested-attributes: bitset
|
||||||
|
|
||||||
|
-
|
||||||
|
name: rings
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: header
|
||||||
|
value: 1
|
||||||
|
type: nest
|
||||||
|
nested-attributes: header
|
||||||
|
-
|
||||||
|
name: rx-max
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: rx-mini-max
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: rx-jumbo-max
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: tx-max
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: rx
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: rx-mini
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: rx-jumbo
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: tx
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: rx-buf-len
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: tcp-data-split
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: cqe-size
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: tx-push
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: rx-push
|
||||||
|
type: u8
|
||||||
|
|
||||||
|
-
|
||||||
|
name: mm-stat
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: pad
|
||||||
|
value: 1
|
||||||
|
type: pad
|
||||||
|
-
|
||||||
|
name: reassembly-errors
|
||||||
|
type: u64
|
||||||
|
-
|
||||||
|
name: smd-errors
|
||||||
|
type: u64
|
||||||
|
-
|
||||||
|
name: reassembly-ok
|
||||||
|
type: u64
|
||||||
|
-
|
||||||
|
name: rx-frag-count
|
||||||
|
type: u64
|
||||||
|
-
|
||||||
|
name: tx-frag-count
|
||||||
|
type: u64
|
||||||
|
-
|
||||||
|
name: hold-count
|
||||||
|
type: u64
|
||||||
|
-
|
||||||
|
name: mm
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: header
|
||||||
|
value: 1
|
||||||
|
type: nest
|
||||||
|
nested-attributes: header
|
||||||
|
-
|
||||||
|
name: pmac-enabled
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: tx-enabled
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: tx-active
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: tx-min-frag-size
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: rx-min-frag-size
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: verify-enabled
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: verify-status
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: verify-time
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: max-verify-time
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: stats
|
||||||
|
type: nest
|
||||||
|
nested-attributes: mm-stat
|
||||||
|
|
||||||
|
operations:
|
||||||
|
enum-model: directional
|
||||||
|
list:
|
||||||
|
-
|
||||||
|
name: strset-get
|
||||||
|
doc: Get string set from the kernel.
|
||||||
|
|
||||||
|
attribute-set: strset
|
||||||
|
|
||||||
|
do: &strset-get-op
|
||||||
|
request:
|
||||||
|
value: 1
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
- stringsets
|
||||||
|
- counts-only
|
||||||
|
reply:
|
||||||
|
value: 1
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
- stringsets
|
||||||
|
dump: *strset-get-op
|
||||||
|
|
||||||
|
# TODO: fill in the requests in between
|
||||||
|
|
||||||
|
-
|
||||||
|
name: privflags-get
|
||||||
|
doc: Get device private flags.
|
||||||
|
|
||||||
|
attribute-set: privflags
|
||||||
|
|
||||||
|
do: &privflag-get-op
|
||||||
|
request:
|
||||||
|
value: 13
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
reply:
|
||||||
|
value: 14
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
- flags
|
||||||
|
dump: *privflag-get-op
|
||||||
|
-
|
||||||
|
name: privflags-set
|
||||||
|
doc: Set device private flags.
|
||||||
|
|
||||||
|
attribute-set: privflags
|
||||||
|
|
||||||
|
do:
|
||||||
|
request:
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
- flags
|
||||||
|
-
|
||||||
|
name: privflags-ntf
|
||||||
|
doc: Notification for change in device private flags.
|
||||||
|
notify: privflags-get
|
||||||
|
|
||||||
|
-
|
||||||
|
name: rings-get
|
||||||
|
doc: Get ring params.
|
||||||
|
|
||||||
|
attribute-set: rings
|
||||||
|
|
||||||
|
do: &ring-get-op
|
||||||
|
request:
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
reply:
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
- rx-max
|
||||||
|
- rx-mini-max
|
||||||
|
- rx-jumbo-max
|
||||||
|
- tx-max
|
||||||
|
- rx
|
||||||
|
- rx-mini
|
||||||
|
- rx-jumbo
|
||||||
|
- tx
|
||||||
|
- rx-buf-len
|
||||||
|
- tcp-data-split
|
||||||
|
- cqe-size
|
||||||
|
- tx-push
|
||||||
|
- rx-push
|
||||||
|
dump: *ring-get-op
|
||||||
|
-
|
||||||
|
name: rings-set
|
||||||
|
doc: Set ring params.
|
||||||
|
|
||||||
|
attribute-set: rings
|
||||||
|
|
||||||
|
do:
|
||||||
|
request:
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
- rx
|
||||||
|
- rx-mini
|
||||||
|
- rx-jumbo
|
||||||
|
- tx
|
||||||
|
- rx-buf-len
|
||||||
|
- tcp-data-split
|
||||||
|
- cqe-size
|
||||||
|
- tx-push
|
||||||
|
- rx-push
|
||||||
|
-
|
||||||
|
name: rings-ntf
|
||||||
|
doc: Notification for change in ring params.
|
||||||
|
notify: rings-get
|
||||||
|
|
||||||
|
# TODO: fill in the requests in between
|
||||||
|
|
||||||
|
-
|
||||||
|
name: mm-get
|
||||||
|
doc: Get MAC Merge configuration and state
|
||||||
|
|
||||||
|
attribute-set: mm
|
||||||
|
|
||||||
|
do: &mm-get-op
|
||||||
|
request:
|
||||||
|
value: 42
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
reply:
|
||||||
|
value: 42
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
- pmac-enabled
|
||||||
|
- tx-enabled
|
||||||
|
- tx-active
|
||||||
|
- tx-min-frag-size
|
||||||
|
- rx-min-frag-size
|
||||||
|
- verify-enabled
|
||||||
|
- verify-time
|
||||||
|
- max-verify-time
|
||||||
|
- stats
|
||||||
|
dump: *mm-get-op
|
||||||
|
-
|
||||||
|
name: mm-set
|
||||||
|
doc: Set MAC Merge configuration
|
||||||
|
|
||||||
|
attribute-set: mm
|
||||||
|
|
||||||
|
do:
|
||||||
|
request:
|
||||||
|
attributes:
|
||||||
|
- header
|
||||||
|
- verify-enabled
|
||||||
|
- verify-time
|
||||||
|
- tx-enabled
|
||||||
|
- pmac-enabled
|
||||||
|
- tx-min-frag-size
|
||||||
|
-
|
||||||
|
name: mm-ntf
|
||||||
|
doc: Notification for change in MAC Merge configuration.
|
||||||
|
notify: mm-get
|
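The ethtool spec above uses the "directional" enum-model, where requests and replies are numbered separately and only some operations carry an explicit value. A rough sketch of how the remaining values could be derived follows. It assumes each direction simply continues from its previous value and that notifications consume a reply-side value; the function name and the trimmed-down operation list are illustrative, not the generator's actual code.

```python
def assign_directional_values(ops):
    """Map op name -> (request value, reply value); None if a direction is unused."""
    req_next = rsp_next = 1
    out = {}
    for op in ops:
        if "notify" in op:
            # Notifications only travel from the kernel to user space.
            rsp = op.get("value", rsp_next)
            out[op["name"]] = (None, rsp)
            rsp_next = rsp + 1
            continue
        mode = op.get("do") or op.get("dump") or {}
        req = mode.get("request", {}).get("value", req_next)
        has_reply = "reply" in mode
        rsp = mode.get("reply", {}).get("value", rsp_next) if has_reply else None
        out[op["name"]] = (req, rsp)
        req_next = req + 1
        if has_reply:
            rsp_next = rsp + 1
    return out


# Operation list reduced to the fields that matter for numbering.
ops = [
    {"name": "strset-get", "do": {"request": {"value": 1}, "reply": {"value": 1}}},
    {"name": "privflags-get", "do": {"request": {"value": 13}, "reply": {"value": 14}}},
    {"name": "privflags-set", "do": {"request": {}}},
    {"name": "privflags-ntf", "notify": "privflags-get"},
    {"name": "rings-get", "do": {"request": {}, "reply": {}}},
    {"name": "rings-set", "do": {"request": {}}},
    {"name": "rings-ntf", "notify": "rings-get"},
    {"name": "mm-get", "do": {"request": {"value": 42}, "reply": {"value": 42}}},
    {"name": "mm-set", "do": {"request": {}}},
    {"name": "mm-ntf", "notify": "mm-get"},
]
values = assign_directional_values(ops)
for name, pair in values.items():
    print(name, pair)
```

Under these assumptions the explicit values act as resynchronisation points that skip the operations the spec leaves out (the "TODO" gaps), and everything after them is numbered implicitly.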
128
Documentation/netlink/specs/fou.yaml
Normal file
@ -0,0 +1,128 @@
|
|||||||
|
name: fou
|
||||||
|
|
||||||
|
protocol: genetlink-legacy
|
||||||
|
|
||||||
|
doc: |
|
||||||
|
Foo-over-UDP.
|
||||||
|
|
||||||
|
c-family-name: fou-genl-name
|
||||||
|
c-version-name: fou-genl-version
|
||||||
|
max-by-define: true
|
||||||
|
kernel-policy: global
|
||||||
|
|
||||||
|
definitions:
|
||||||
|
-
|
||||||
|
type: enum
|
||||||
|
name: encap_type
|
||||||
|
name-prefix: fou-encap-
|
||||||
|
enum-name:
|
||||||
|
entries: [ unspec, direct, gue ]
|
||||||
|
|
||||||
|
attribute-sets:
|
||||||
|
-
|
||||||
|
name: fou
|
||||||
|
name-prefix: fou-attr-
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: unspec
|
||||||
|
type: unused
|
||||||
|
-
|
||||||
|
name: port
|
||||||
|
type: u16
|
||||||
|
byte-order: big-endian
|
||||||
|
-
|
||||||
|
name: af
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: ipproto
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: type
|
||||||
|
type: u8
|
||||||
|
-
|
||||||
|
name: remcsum_nopartial
|
||||||
|
type: flag
|
||||||
|
-
|
||||||
|
name: local_v4
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: local_v6
|
||||||
|
type: binary
|
||||||
|
checks:
|
||||||
|
min-len: 16
|
||||||
|
-
|
||||||
|
name: peer_v4
|
||||||
|
type: u32
|
||||||
|
-
|
||||||
|
name: peer_v6
|
||||||
|
type: binary
|
||||||
|
checks:
|
||||||
|
min-len: 16
|
||||||
|
-
|
||||||
|
name: peer_port
|
||||||
|
type: u16
|
||||||
|
byte-order: big-endian
|
||||||
|
-
|
||||||
|
name: ifindex
|
||||||
|
type: s32
|
||||||
|
|
||||||
|
operations:
|
||||||
|
list:
|
||||||
|
-
|
||||||
|
name: unspec
|
||||||
|
doc: unused
|
||||||
|
|
||||||
|
-
|
||||||
|
name: add
|
||||||
|
doc: Add port.
|
||||||
|
attribute-set: fou
|
||||||
|
|
||||||
|
dont-validate: [ strict, dump ]
|
||||||
|
flags: [ admin-perm ]
|
||||||
|
|
||||||
|
do:
|
||||||
|
request: &all_attrs
|
||||||
|
attributes:
|
||||||
|
- port
|
||||||
|
- ipproto
|
||||||
|
- type
|
||||||
|
- remcsum_nopartial
|
||||||
|
- local_v4
|
||||||
|
- peer_v4
|
||||||
|
- local_v6
|
||||||
|
- peer_v6
|
||||||
|
- peer_port
|
||||||
|
- ifindex
|
||||||
|
|
||||||
|
-
|
||||||
|
name: del
|
||||||
|
doc: Delete port.
|
||||||
|
attribute-set: fou
|
||||||
|
|
||||||
|
dont-validate: [ strict, dump ]
|
||||||
|
flags: [ admin-perm ]
|
||||||
|
|
||||||
|
do:
|
||||||
|
request: &select_attrs
|
||||||
|
attributes:
|
||||||
|
- af
|
||||||
|
- ifindex
|
||||||
|
- port
|
||||||
|
- peer_port
|
||||||
|
- local_v4
|
||||||
|
- peer_v4
|
||||||
|
- local_v6
|
||||||
|
- peer_v6
|
||||||
|
|
||||||
|
-
|
||||||
|
name: get
|
||||||
|
doc: Get tunnel info.
|
||||||
|
attribute-set: fou
|
||||||
|
dont-validate: [ strict, dump ]
|
||||||
|
|
||||||
|
do:
|
||||||
|
request: *select_attrs
|
||||||
|
reply: *all_attrs
|
||||||
|
|
||||||
|
dump:
|
||||||
|
reply: *all_attrs
|
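The fou spec above marks port and peer_port as u16 with byte-order: big-endian. A minimal sketch of what that means on the wire is shown below. It assumes the attribute ID for port is 1 (unspec taking 0) and uses the standard netlink attribute layout of a host-endian length/type header followed by a payload padded to 4 bytes; the helper name is hypothetical.

```python
import struct

FOU_ATTR_PORT = 1  # assumed: "unspec" is 0, so "port" is the next value
NLA_HDRLEN = 4     # struct nlattr: u16 nla_len, u16 nla_type


def pack_port_attr(port):
    """Encode a u16 port attribute whose payload is big-endian."""
    payload = struct.pack(">H", port)  # byte-order: big-endian
    # The header itself stays in host byte order.
    header = struct.pack("=HH", NLA_HDRLEN + len(payload), FOU_ATTR_PORT)
    attr = header + payload
    # Netlink attributes are padded to a 4-byte boundary.
    return attr + b"\x00" * (-len(attr) % 4)


attr = pack_port_attr(5555)
print(attr.hex())
```

Only the payload is byte-swapped; nla_len counts the header plus the unpadded payload.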
100
Documentation/netlink/specs/netdev.yaml
Normal file
@ -0,0 +1,100 @@
|
|||||||
|
name: netdev
|
||||||
|
|
||||||
|
doc:
|
||||||
|
netdev configuration over generic netlink.
|
||||||
|
|
||||||
|
definitions:
|
||||||
|
-
|
||||||
|
type: flags
|
||||||
|
name: xdp-act
|
||||||
|
entries:
|
||||||
|
-
|
||||||
|
name: basic
|
||||||
|
doc:
|
||||||
|
XDP features set supported by all drivers
|
||||||
|
(XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX)
|
||||||
|
-
|
||||||
|
name: redirect
|
||||||
|
doc:
|
||||||
|
The netdev supports XDP_REDIRECT
|
||||||
|
-
|
||||||
|
name: ndo-xmit
|
||||||
|
doc:
|
||||||
|
This feature informs if netdev implements ndo_xdp_xmit callback.
|
||||||
|
-
|
||||||
|
name: xsk-zerocopy
|
||||||
|
doc:
|
||||||
|
This feature informs if netdev supports AF_XDP in zero copy mode.
|
||||||
|
-
|
||||||
|
name: hw-offload
|
||||||
|
doc:
|
||||||
|
This feature informs if netdev supports XDP hw offloading.
|
||||||
|
-
|
||||||
|
name: rx-sg
|
||||||
|
doc:
|
||||||
|
This feature informs if netdev implements non-linear XDP buffer
|
||||||
|
support in the driver napi callback.
|
||||||
|
-
|
||||||
|
name: ndo-xmit-sg
|
||||||
|
doc:
|
||||||
|
This feature informs if netdev implements non-linear XDP buffer
|
||||||
|
support in ndo_xdp_xmit callback.
|
||||||
|
|
||||||
|
attribute-sets:
|
||||||
|
-
|
||||||
|
name: dev
|
||||||
|
attributes:
|
||||||
|
-
|
||||||
|
name: ifindex
|
||||||
|
doc: netdev ifindex
|
||||||
|
type: u32
|
||||||
|
value: 1
|
||||||
|
checks:
|
||||||
|
min: 1
|
||||||
|
-
|
||||||
|
name: pad
|
||||||
|
type: pad
|
||||||
|
-
|
||||||
|
name: xdp-features
|
||||||
|
doc: Bitmask of enabled xdp-features.
|
||||||
|
type: u64
|
||||||
|
enum: xdp-act
|
||||||
|
enum-as-flags: true
|
||||||
|
|
||||||
|
operations:
|
||||||
|
list:
|
||||||
|
-
|
||||||
|
name: dev-get
|
||||||
|
doc: Get / dump information about a netdev.
|
||||||
|
value: 1
|
||||||
|
attribute-set: dev
|
||||||
|
do:
|
||||||
|
request:
|
||||||
|
attributes:
|
||||||
|
- ifindex
|
||||||
|
reply: &dev-all
|
||||||
|
attributes:
|
||||||
|
- ifindex
|
||||||
|
- xdp-features
|
||||||
|
dump:
|
||||||
|
reply: *dev-all
|
||||||
|
-
|
||||||
|
name: dev-add-ntf
|
||||||
|
doc: Notification about device appearing.
|
||||||
|
notify: dev-get
|
||||||
|
mcgrp: mgmt
|
||||||
|
-
|
||||||
|
name: dev-del-ntf
|
||||||
|
doc: Notification about device disappearing.
|
||||||
|
notify: dev-get
|
||||||
|
mcgrp: mgmt
|
||||||
|
-
|
||||||
|
name: dev-change-ntf
|
||||||
|
doc: Notification about device configuration being changed.
|
||||||
|
notify: dev-get
|
||||||
|
mcgrp: mgmt
|
||||||
|
|
||||||
|
mcast-groups:
|
||||||
|
list:
|
||||||
|
-
|
||||||
|
name: mgmt
|
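The xdp-features attribute in the netdev spec above is a u64 bitmask over the xdp-act flags definition. The sketch below decodes such a mask into entry names. It assumes that a definition of type "flags" assigns each entry the next bit, starting from bit 0; the function name is hypothetical.

```python
# Entry order copied from the xdp-act definition in the spec.
XDP_ACT_ENTRIES = [
    "basic",
    "redirect",
    "ndo-xmit",
    "xsk-zerocopy",
    "hw-offload",
    "rx-sg",
    "ndo-xmit-sg",
]


def decode_xdp_features(mask):
    """Return the names of the xdp-act flags set in mask."""
    # Assumption: entry i of a "flags" definition maps to bit (1 << i).
    return [name for bit, name in enumerate(XDP_ACT_ENTRIES) if mask & (1 << bit)]


print(decode_xdp_features(0b100011))
```

A device reporting only the "basic" bit would, under this assumption, decode to a single-element list.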
@ -419,7 +419,7 @@ XDP_UMEM_REG setsockopt
|
|||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
This setsockopt registers a UMEM to a socket. This is the area that
|
This setsockopt registers a UMEM to a socket. This is the area that
|
||||||
contain all the buffers that packet can recide in. The call takes a
|
contain all the buffers that packet can reside in. The call takes a
|
||||||
pointer to the beginning of this area and the size of it. Moreover, it
|
pointer to the beginning of this area and the size of it. Moreover, it
|
||||||
also has parameter called chunk_size that is the size that the UMEM is
|
also has parameter called chunk_size that is the size that the UMEM is
|
||||||
divided into. It can only be 2K or 4K at the moment. If you have an
|
divided into. It can only be 2K or 4K at the moment. If you have an
|
||||||
@ -592,7 +592,7 @@ A: When a netdev of a physical NIC is initialized, Linux usually
|
|||||||
A number of other ways are possible all up to the capabilities of
|
A number of other ways are possible all up to the capabilities of
|
||||||
the NIC you have.
|
the NIC you have.
|
||||||
|
|
||||||
Q: Can I use the XSKMAP to implement a switch betwen different umems
|
Q: Can I use the XSKMAP to implement a switch between different umems
|
||||||
in copy mode?
|
in copy mode?
|
||||||
|
|
||||||
A: The short answer is no, that is not supported at the moment. The
|
A: The short answer is no, that is not supported at the moment. The
|
||||||
|
@ -1902,7 +1902,7 @@ of 32 possible I/O Base addresses using the following tables::
|
|||||||
6 | 10
|
6 | 10
|
||||||
|
|
||||||
The I/O address is sum of all switches set to "1". Remember that
|
The I/O address is sum of all switches set to "1". Remember that
|
||||||
the I/O address space bellow 0x200 is RESERVED for mainboard, so
|
the I/O address space below 0x200 is RESERVED for mainboard, so
|
||||||
switch 1 should be ALWAYS SET TO OFF.
|
switch 1 should be ALWAYS SET TO OFF.
|
||||||
|
|
||||||
|
|
||||||
|
@ -159,7 +159,7 @@ Please send us comments, experiences, questions, anything :)
|
|||||||
IRC:
|
IRC:
|
||||||
#batadv on ircs://irc.hackint.org/
|
#batadv on ircs://irc.hackint.org/
|
||||||
Mailing-list:
|
Mailing-list:
|
||||||
b.a.t.m.a.n@open-mesh.org (optional subscription at
|
b.a.t.m.a.n@lists.open-mesh.org (optional subscription at
|
||||||
https://lists.open-mesh.org/mailman3/postorius/lists/b.a.t.m.a.n.lists.open-mesh.org/)
|
https://lists.open-mesh.org/mailman3/postorius/lists/b.a.t.m.a.n.lists.open-mesh.org/)
|
||||||
|
|
||||||
You can also contact the Authors:
|
You can also contact the Authors:
|
||||||
|
@ -931,7 +931,7 @@ ival1:
|
|||||||
ival2:
|
ival2:
|
||||||
Throttle the received message rate down to the value of ival2. This
|
Throttle the received message rate down to the value of ival2. This
|
||||||
is useful to reduce messages for the application when the signal inside the
|
is useful to reduce messages for the application when the signal inside the
|
||||||
CAN frame is stateless as state changes within the ival2 periode may get
|
CAN frame is stateless as state changes within the ival2 period may get
|
||||||
lost.
|
lost.
|
||||||
|
|
||||||
Broadcast Manager Multiplex Message Receive Filter
|
Broadcast Manager Multiplex Message Receive Filter
|
||||||
|
@ -50,7 +50,7 @@ Setup Packet
|
|||||||
``wIndex`` USB Interface Index (0 for device commands)
|
``wIndex`` USB Interface Index (0 for device commands)
|
||||||
``wLength`` * Host to Device - Number of bytes to transmit
|
``wLength`` * Host to Device - Number of bytes to transmit
|
||||||
* Device to Host - Maximum Number of bytes to
|
* Device to Host - Maximum Number of bytes to
|
||||||
receive. If the device send less. Commom ZLP
|
receive. If the device send less. Common ZLP
|
||||||
semantics are used.
|
semantics are used.
|
||||||
================= =====================================================
|
================= =====================================================
|
||||||
|
|
||||||
|
@ -93,7 +93,7 @@ MBIM function can be looked up using sysfs. For example::
|
|||||||
USB configuration descriptors
|
USB configuration descriptors
|
||||||
-----------------------------
|
-----------------------------
|
||||||
The wMaxControlMessage field of the CDC MBIM functional descriptor
|
The wMaxControlMessage field of the CDC MBIM functional descriptor
|
||||||
limits the maximum control message size. The managament application is
|
limits the maximum control message size. The management application is
|
||||||
responsible for negotiating a control message size complying with the
|
responsible for negotiating a control message size complying with the
|
||||||
requirements in section 9.3.1 of [1], taking this descriptor field
|
requirements in section 9.3.1 of [1], taking this descriptor field
|
||||||
into consideration.
|
into consideration.
|
||||||
|
@ -4,7 +4,7 @@
|
|||||||
ATM (i)Chip IA Linux Driver Source
|
ATM (i)Chip IA Linux Driver Source
|
||||||
==================================
|
==================================
|
||||||
|
|
||||||
READ ME FISRT
|
READ ME FIRST
|
||||||
|
|
||||||
--------------------------------------------------------------------------------
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@ -577,7 +577,7 @@ CTU CAN FD IP Core and Driver Development Acknowledgment
|
|||||||
|
|
||||||
* Linux driver development
|
* Linux driver development
|
||||||
* continuous integration platform architect and GHDL updates
|
* continuous integration platform architect and GHDL updates
|
||||||
* theses `Open-source and Open-hardware CAN FD Protocol Support <https://dspace.cvut.cz/bitstream/handle/10467/80366/F3-DP-2019-Jerabek-Martin-Jerabek-thesis-2019-canfd.pdf>`_
|
* thesis `Open-source and Open-hardware CAN FD Protocol Support <https://dspace.cvut.cz/bitstream/handle/10467/80366/F3-DP-2019-Jerabek-Martin-Jerabek-thesis-2019-canfd.pdf>`_
|
||||||
|
|
||||||
* Jiri Novak <jnovak@fel.cvut.cz>
|
* Jiri Novak <jnovak@fel.cvut.cz>
|
||||||
|
|
||||||
@ -603,7 +603,7 @@ CTU CAN FD IP Core and Driver Development Acknowledgment
|
|||||||
* Jan Charvat
|
* Jan Charvat
|
||||||
|
|
||||||
* implemented CTU CAN FD functional model for QEMU which has been integrated into QEMU mainline (`docs/system/devices/can.rst <https://www.qemu.org/docs/master/system/devices/can.html>`_)
|
* implemented CTU CAN FD functional model for QEMU which has been integrated into QEMU mainline (`docs/system/devices/can.rst <https://www.qemu.org/docs/master/system/devices/can.html>`_)
|
||||||
* Bachelor theses Model of CAN FD Communication Controller for QEMU Emulator
|
* Bachelor thesis Model of CAN FD Communication Controller for QEMU Emulator
|
||||||
|
|
||||||
Notes
|
Notes
|
||||||
-----
|
-----
|
||||||
|
@ -129,10 +129,10 @@
|
|||||||
</g>
|
</g>
|
||||||
</g>
|
</g>
|
||||||
<text transform="matrix(.264583 0 0 .264583 91.8919 139.964)" x="26.959213" y="9.11724" fill="#2aa1ff" filter="url(#filter1204-6-2-9-1-3-1)" font-size="12px" stroke-width="3.77953" text-align="center" text-anchor="middle" style="line-height:1.1" xml:space="preserve"><tspan x="26.959213" y="9.11724" text-align="center">Set</tspan><tspan x="26.959213" y="22.31724" text-align="center">abort</tspan></text>
|
<text transform="matrix(.264583 0 0 .264583 91.8919 139.964)" x="26.959213" y="9.11724" fill="#2aa1ff" filter="url(#filter1204-6-2-9-1-3-1)" font-size="12px" stroke-width="3.77953" text-align="center" text-anchor="middle" style="line-height:1.1" xml:space="preserve"><tspan x="26.959213" y="9.11724" text-align="center">Set</tspan><tspan x="26.959213" y="22.31724" text-align="center">abort</tspan></text>
|
||||||
<text transform="translate(49.0277 104.823)" x="57.620724" y="16.855087" filter="url(#filter1204)" font-size="3.175px" text-align="center" text-anchor="middle" style="line-height:1.1" xml:space="preserve"><tspan x="57.620724" y="16.855087" text-align="center">Transmission</tspan><tspan x="57.620724" y="20.347588" text-align="center">unsuccesfull</tspan></text>
|
<text transform="translate(49.0277 104.823)" x="57.620724" y="16.855087" filter="url(#filter1204)" font-size="3.175px" text-align="center" text-anchor="middle" style="line-height:1.1" xml:space="preserve"><tspan x="57.620724" y="16.855087" text-align="center">Transmission</tspan><tspan x="57.620724" y="20.347588" text-align="center">unsuccessful</tspan></text>
|
||||||
<g font-size="12px" stroke-width="3.77953" text-anchor="middle">
|
<g font-size="12px" stroke-width="3.77953" text-anchor="middle">
|
||||||
<text transform="matrix(.264583 0 0 .264583 68.5988 118.913)" x="38.824219" y="9.1171875" filter="url(#filter1204)" text-align="center" style="line-height:1.1" xml:space="preserve"><tspan x="38.824219" y="9.1171875" text-align="center">Transmission</tspan><tspan x="38.824219" y="22.317188" text-align="center">starts</tspan></text>
|
<text transform="matrix(.264583 0 0 .264583 68.5988 118.913)" x="38.824219" y="9.1171875" filter="url(#filter1204)" text-align="center" style="line-height:1.1" xml:space="preserve"><tspan x="38.824219" y="9.1171875" text-align="center">Transmission</tspan><tspan x="38.824219" y="22.317188" text-align="center">starts</tspan></text>
|
||||||
<text transform="matrix(.264583 0 0 .264583 106.802 130.509)" x="38.824219" y="9.1171875" filter="url(#filter1204)" text-align="center" style="line-height:1.1" xml:space="preserve"><tspan x="38.824219" y="9.1171875" text-align="center">Transmission</tspan><tspan x="38.824219" y="22.317188" text-align="center">succesfull</tspan></text>
|
<text transform="matrix(.264583 0 0 .264583 106.802 130.509)" x="38.824219" y="9.1171875" filter="url(#filter1204)" text-align="center" style="line-height:1.1" xml:space="preserve"><tspan x="38.824219" y="9.1171875" text-align="center">Transmission</tspan><tspan x="38.824219" y="22.317188" text-align="center">successful</tspan></text>
|
||||||
<text transform="matrix(.264583 0 0 .264583 107.77 145.476)" x="38.824219" y="9.1171875" filter="url(#filter1204)" text-align="center" style="line-height:1.1" xml:space="preserve"><tspan x="38.824219" y="9.1171875" text-align="center">Transmission</tspan><tspan x="38.824219" y="22.317188" text-align="center">aborted</tspan></text>
|
<text transform="matrix(.264583 0 0 .264583 107.77 145.476)" x="38.824219" y="9.1171875" filter="url(#filter1204)" text-align="center" style="line-height:1.1" xml:space="preserve"><tspan x="38.824219" y="9.1171875" text-align="center">Transmission</tspan><tspan x="38.824219" y="22.317188" text-align="center">aborted</tspan></text>
|
||||||
</g>
|
</g>
|
||||||
<g stroke-width="3.77953" text-anchor="middle">
|
<g stroke-width="3.77953" text-anchor="middle">
|
||||||
|
@ -254,7 +254,7 @@ Media selection
|
|||||||
A number of the older NICs such as the 3c590 and 3c900 series have
|
A number of the older NICs such as the 3c590 and 3c900 series have
|
||||||
10base2 and AUI interfaces.
|
10base2 and AUI interfaces.
|
||||||
|
|
||||||
Prior to January, 2001 this driver would autoeselect the 10base2 or AUI
|
Prior to January, 2001 this driver would autoselect the 10base2 or AUI
|
||||||
port if it didn't detect activity on the 10baseT port. It would then
|
port if it didn't detect activity on the 10baseT port. It would then
|
||||||
get stuck on the 10base2 port and a driver reload was necessary to
|
get stuck on the 10base2 port and a driver reload was necessary to
|
||||||
switch back to 10baseT. This behaviour could not be prevented with a
|
switch back to 10baseT. This behaviour could not be prevented with a
|
||||||
|
@ -270,7 +270,7 @@ RX flow rules (ntuple filters)
|
|||||||
|
|
||||||
ethtool -K ethX ntuple <on|off>
|
ethtool -K ethX ntuple <on|off>
|
||||||
|
|
||||||
When disabling ntuple filters, all the user programed filters are
|
When disabling ntuple filters, all the user programmed filters are
|
||||||
flushed from the driver cache and hardware. All needed filters must
|
flushed from the driver cache and hardware. All needed filters must
|
||||||
be re-added when ntuple is re-enabled.
|
be re-added when ntuple is re-enabled.
|
||||||
|
|
||||||
@ -418,7 +418,7 @@ Default value: 0xFFFF
|
|||||||
0 Disable interrupt throttling.
|
0 Disable interrupt throttling.
|
||||||
1 Enable interrupt throttling and use specified tx and rx rates.
|
1 Enable interrupt throttling and use specified tx and rx rates.
|
||||||
0xFFFF Auto throttling mode. Driver will choose the best RX and TX
|
0xFFFF Auto throttling mode. Driver will choose the best RX and TX
|
||||||
interrupt throtting settings based on link speed.
|
interrupt throttling settings based on link speed.
|
||||||
====== ==============================================================
|
====== ==============================================================
|
||||||
|
|
||||||
aq_itr_tx - TX interrupt throttle rate
|
aq_itr_tx - TX interrupt throttle rate
|
||||||
@ -456,7 +456,7 @@ AQ_CFG_RX_PAGEORDER
|
|||||||
|
|
||||||
Default value: 0
|
Default value: 0
|
||||||
|
|
||||||
RX page order override. Thats a power of 2 number of RX pages allocated for
|
RX page order override. That's a power of 2 number of RX pages allocated for
|
||||||
each descriptor. Received descriptor size is still limited by
|
each descriptor. Received descriptor size is still limited by
|
||||||
AQ_CFG_RX_FRAME_MAX.
|
AQ_CFG_RX_FRAME_MAX.
|
||||||
|
|
||||||
|
@ -11,7 +11,7 @@ Overview
|
|||||||
--------
|
--------
|
||||||
|
|
||||||
The DPAA2 MAC / PHY support consists of a set of APIs that help DPAA2 network
|
The DPAA2 MAC / PHY support consists of a set of APIs that help DPAA2 network
|
||||||
drivers (dpaa2-eth, dpaa2-ethsw) interract with the PHY library.
|
drivers (dpaa2-eth, dpaa2-ethsw) interact with the PHY library.
|
||||||
|
|
||||||
DPAA2 Software Architecture
|
DPAA2 Software Architecture
|
||||||
---------------------------
|
---------------------------
|
||||||
|
@ -39,7 +39,7 @@ Contents:
|
|||||||
intel/ice
|
intel/ice
|
||||||
marvell/octeontx2
|
marvell/octeontx2
|
||||||
marvell/octeon_ep
|
marvell/octeon_ep
|
||||||
mellanox/mlx5
|
mellanox/mlx5/index
|
||||||
microsoft/netvsc
|
microsoft/netvsc
|
||||||
neterion/s2io
|
neterion/s2io
|
||||||
netronome/nfp
|
netronome/nfp
|
||||||
|
@@ -901,15 +901,17 @@ To enable/disable UDP Segmentation Offload, issue the following command::
|
|||||||
|
|
||||||
# ethtool -K <ethX> tx-udp-segmentation [off|on]
|
# ethtool -K <ethX> tx-udp-segmentation [off|on]
|
||||||
|
|
||||||
|
|
||||||
GNSS module
|
GNSS module
|
||||||
-----------
|
-----------
|
||||||
Allows user to read messages from the GNSS module and write supported commands.
|
Requires kernel compiled with CONFIG_GNSS=y or CONFIG_GNSS=m.
|
||||||
If the module is physically present, driver creates 2 TTYs for each supported
|
Allows user to read messages from the GNSS hardware module and write supported
|
||||||
device in /dev, ttyGNSS_<device>:<function>_0 and _1. First one (_0) is RW and
|
commands. If the module is physically present, a GNSS device is spawned:
|
||||||
the second one is RO.
|
``/dev/gnss<id>``.
|
||||||
The protocol of write commands is dependent on the GNSS module as the driver
|
The protocol of write command is dependent on the GNSS hardware module as the
|
||||||
writes raw bytes from the TTY to the GNSS i2c. Please refer to the module
|
driver writes raw bytes by the GNSS object to the receiver through i2c. Please
|
||||||
documentation for details.
|
refer to the hardware GNSS module documentation for configuration details.
|
||||||
|
|
||||||
|
|
||||||
Performance Optimization
|
Performance Optimization
|
||||||
========================
|
========================
|
||||||
|
@@ -127,7 +127,7 @@ Type1:
|
|||||||
Type2:
|
Type2:
|
||||||
- RVU PF0 ie admin function creates these VFs and maps them to loopback block's channels.
|
- RVU PF0 ie admin function creates these VFs and maps them to loopback block's channels.
|
||||||
- A set of two VFs (VF0 & VF1, VF2 & VF3 .. so on) works as a pair ie pkts sent out of
|
- A set of two VFs (VF0 & VF1, VF2 & VF3 .. so on) works as a pair ie pkts sent out of
|
||||||
VF0 will be received by VF1 and viceversa.
|
VF0 will be received by VF1 and vice versa.
|
||||||
- These VFs can be used by applications or virtual machines to communicate between them
|
- These VFs can be used by applications or virtual machines to communicate between them
|
||||||
without sending traffic outside. There is no switch present in HW, hence the support
|
without sending traffic outside. There is no switch present in HW, hence the support
|
||||||
for loopback VFs.
|
for loopback VFs.
|
||||||
|
@@ -1,746 +0,0 @@
|
|||||||
.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
|
|
||||||
|
|
||||||
=================================================
|
|
||||||
Mellanox ConnectX(R) mlx5 core VPI Network Driver
|
|
||||||
=================================================
|
|
||||||
|
|
||||||
Copyright (c) 2019, Mellanox Technologies LTD.
|
|
||||||
|
|
||||||
Contents
|
|
||||||
========
|
|
||||||
|
|
||||||
- `Enabling the driver and kconfig options`_
|
|
||||||
- `Devlink info`_
|
|
||||||
- `Devlink parameters`_
|
|
||||||
- `Bridge offload`_
|
|
||||||
- `mlx5 subfunction`_
|
|
||||||
- `mlx5 function attributes`_
|
|
||||||
- `Devlink health reporters`_
|
|
||||||
- `mlx5 tracepoints`_
|
|
||||||
|
|
||||||
Enabling the driver and kconfig options
|
|
||||||
=======================================
|
|
||||||
|
|
||||||
| mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out)
|
|
||||||
| at build time via kernel Kconfig flags.
|
|
||||||
| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
|
|
||||||
| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
|
|
||||||
| For the list of advanced features, please see below.
|
|
||||||
|
|
||||||
**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)
|
|
||||||
|
|
||||||
| The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config.
|
|
||||||
| This will provide mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib).
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_CORE_EN=(y/n)**
|
|
||||||
|
|
||||||
| Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads.
|
|
||||||
| mlx5e is the mlx5 ulp driver which provides netdevice kernel interface, when chosen, mlx5e will be
|
|
||||||
| built-in into mlx5_core.ko.
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_EN_ARFS=(y/n)**
|
|
||||||
|
|
||||||
| Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering.
|
|
||||||
| https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_EN_RXNFC=(y/n)**
|
|
||||||
|
|
||||||
| Enables ethtool receive network flow classification, which allows user defined
|
|
||||||
| flow rules to direct traffic into arbitrary rx queue via ethtool set/get_rxnfc API.
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_CORE_EN_DCB=(y/n)**:
|
|
||||||
|
|
||||||
| Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_.
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_MPFS=(y/n)**
|
|
||||||
|
|
||||||
| Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC.
|
|
||||||
| MPFs is required for when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled to allow passing
|
|
||||||
| user configured unicast MAC addresses to the requesting PF.
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_ESWITCH=(y/n)**
|
|
||||||
|
|
||||||
| Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering
|
|
||||||
| and switching for the enabled VFs and PF in two available modes:
|
|
||||||
| 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_.
|
|
||||||
| 2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_.
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_CORE_IPOIB=(y/n)**
|
|
||||||
|
|
||||||
| IPoIB offloads & acceleration support.
|
|
||||||
| Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma
|
|
||||||
| IPoIB ulp netdevice.
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_FPGA=(y/n)**
|
|
||||||
|
|
||||||
| Build support for the Innova family of network cards by Mellanox Technologies.
|
|
||||||
| Innova network cards are comprised of a ConnectX chip and an FPGA chip on one board.
|
|
||||||
| If you select this option, the mlx5_core driver will include the Innova FPGA core and allow
|
|
||||||
| building sandbox-specific client drivers.
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_EN_IPSEC=(y/n)**
|
|
||||||
|
|
||||||
| Enables `IPSec XFRM cryptography-offload acceleration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.
|
|
||||||
|
|
||||||
**CONFIG_MLX5_EN_TLS=(y/n)**
|
|
||||||
|
|
||||||
| TLS cryptography-offload acceleration.
|
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)
|
|
||||||
|
|
||||||
| Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support.
|
|
||||||
|
|
||||||
**CONFIG_MLX5_SF=(y/n)**
|
|
||||||
|
|
||||||
| Build support for subfunction.
|
|
||||||
| Subfunctons are more light weight than PCI SRIOV VFs. Choosing this option
|
|
||||||
| will enable support for creating subfunction devices.
|
|
||||||
|
|
||||||
**External options** ( Choose if the corresponding mlx5 feature is required )
|
|
||||||
|
|
||||||
- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled
|
|
||||||
- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled.
|
|
||||||
- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).
|
|
||||||
|
|
||||||
Devlink info
|
|
||||||
============
|
|
||||||
|
|
||||||
The devlink info reports the running and stored firmware versions on device.
|
|
||||||
It also prints the device PSID which represents the HCA board type ID.
|
|
||||||
|
|
||||||
User command example::
|
|
||||||
|
|
||||||
$ devlink dev info pci/0000:00:06.0
|
|
||||||
pci/0000:00:06.0:
|
|
||||||
driver mlx5_core
|
|
||||||
versions:
|
|
||||||
fixed:
|
|
||||||
fw.psid MT_0000000009
|
|
||||||
running:
|
|
||||||
fw.version 16.26.0100
|
|
||||||
stored:
|
|
||||||
fw.version 16.26.0100
|
|
||||||
|
|
||||||
Devlink parameters
|
|
||||||
==================
|
|
||||||
|
|
||||||
flow_steering_mode: Device flow steering mode
|
|
||||||
---------------------------------------------
|
|
||||||
The flow steering mode parameter controls the flow steering mode of the driver.
|
|
||||||
Two modes are supported:
|
|
||||||
1. 'dmfs' - Device managed flow steering.
|
|
||||||
2. 'smfs' - Software/Driver managed flow steering.
|
|
||||||
|
|
||||||
In DMFS mode, the HW steering entities are created and managed through the
|
|
||||||
Firmware.
|
|
||||||
In SMFS mode, the HW steering entities are created and managed though by
|
|
||||||
the driver directly into hardware without firmware intervention.
|
|
||||||
|
|
||||||
SMFS mode is faster and provides better rule insertion rate compared to default DMFS mode.
|
|
||||||
|
|
||||||
User command examples:
|
|
||||||
|
|
||||||
- Set SMFS flow steering mode::
|
|
||||||
|
|
||||||
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
|
|
||||||
|
|
||||||
- Read device flow steering mode::
|
|
||||||
|
|
||||||
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
|
|
||||||
pci/0000:06:00.0:
|
|
||||||
name flow_steering_mode type driver-specific
|
|
||||||
values:
|
|
||||||
cmode runtime value smfs
|
|
||||||
|
|
||||||
enable_roce: RoCE enablement state
|
|
||||||
----------------------------------
|
|
||||||
RoCE enablement state controls driver support for RoCE traffic.
|
|
||||||
When RoCE is disabled, there is no gid table, only raw ethernet QPs are supported and traffic on the well-known UDP RoCE port is handled as raw ethernet traffic.
|
|
||||||
|
|
||||||
To change RoCE enablement state, a user must change the driverinit cmode value and run devlink reload.
|
|
||||||
|
|
||||||
User command examples:
|
|
||||||
|
|
||||||
- Disable RoCE::
|
|
||||||
|
|
||||||
$ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit
|
|
||||||
$ devlink dev reload pci/0000:06:00.0
|
|
||||||
|
|
||||||
- Read RoCE enablement state::
|
|
||||||
|
|
||||||
$ devlink dev param show pci/0000:06:00.0 name enable_roce
|
|
||||||
pci/0000:06:00.0:
|
|
||||||
name enable_roce type generic
|
|
||||||
values:
|
|
||||||
cmode driverinit value true
|
|
||||||
|
|
||||||
esw_port_metadata: Eswitch port metadata state
|
|
||||||
----------------------------------------------
|
|
||||||
When applicable, disabling eswitch metadata can increase packet rate
|
|
||||||
up to 20% depending on the use case and packet sizes.
|
|
||||||
|
|
||||||
Eswitch port metadata state controls whether to internally tag packets with
|
|
||||||
metadata. Metadata tagging must be enabled for multi-port RoCE, failover
|
|
||||||
between representors and stacked devices.
|
|
||||||
By default metadata is enabled on the supported devices in E-switch.
|
|
||||||
Metadata is applicable only for E-switch in switchdev mode and
|
|
||||||
users may disable it when NONE of the below use cases will be in use:
|
|
||||||
1. HCA is in Dual/multi-port RoCE mode.
|
|
||||||
2. VF/SF representor bonding (Usually used for Live migration)
|
|
||||||
3. Stacked devices
|
|
||||||
|
|
||||||
When metadata is disabled, the above use cases will fail to initialize if
|
|
||||||
users try to enable them.
|
|
||||||
|
|
||||||
- Show eswitch port metadata::
|
|
||||||
|
|
||||||
$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
|
|
||||||
pci/0000:06:00.0:
|
|
||||||
name esw_port_metadata type driver-specific
|
|
||||||
values:
|
|
||||||
cmode runtime value true
|
|
||||||
|
|
||||||
- Disable eswitch port metadata::
|
|
||||||
|
|
||||||
$ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value false cmode runtime
|
|
||||||
|
|
||||||
- Change eswitch mode to switchdev mode where after choosing the metadata value::
|
|
||||||
|
|
||||||
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
|
|
||||||
|
|
||||||
Bridge offload
|
|
||||||
==============
|
|
||||||
The mlx5 driver implements support for offloading bridge rules when in switchdev
|
|
||||||
mode. Linux bridge FDBs are automatically offloaded when mlx5 switchdev
|
|
||||||
representor is attached to bridge.
|
|
||||||
|
|
||||||
- Change device to switchdev mode::
|
|
||||||
|
|
||||||
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
|
|
||||||
|
|
||||||
- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::
|
|
||||||
|
|
||||||
$ ip link set enp8s0f0 master bridge1
|
|
||||||
|
|
||||||
VLANs
|
|
||||||
-----
|
|
||||||
Following bridge VLAN functions are supported by mlx5:
|
|
||||||
|
|
||||||
- VLAN filtering (including multiple VLANs per port)::
|
|
||||||
|
|
||||||
$ ip link set bridge1 type bridge vlan_filtering 1
|
|
||||||
$ bridge vlan add dev enp8s0f0 vid 2-3
|
|
||||||
|
|
||||||
- VLAN push on bridge ingress::
|
|
||||||
|
|
||||||
$ bridge vlan add dev enp8s0f0 vid 3 pvid
|
|
||||||
|
|
||||||
- VLAN pop on bridge egress::
|
|
||||||
|
|
||||||
$ bridge vlan add dev enp8s0f0 vid 3 untagged
|
|
||||||
|
|
||||||
mlx5 subfunction
|
|
||||||
================
|
|
||||||
mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.
|
|
||||||
|
|
||||||
A subfunction has its own function capabilities and its own resources. This
|
|
||||||
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
|
|
||||||
queues are neither shared nor stolen from the parent PCI function.
|
|
||||||
|
|
||||||
When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
|
|
||||||
resources neither shared nor stolen from the parent PCI function.
|
|
||||||
|
|
||||||
A subfunction has a dedicated window in PCI BAR space that is not shared
|
|
||||||
with the other subfunctions or the parent PCI function. This ensures that all
|
|
||||||
devices (netdev, rdma, vdpa, etc.) of the subfunction accesses only assigned
|
|
||||||
PCI BAR space.
|
|
||||||
|
|
||||||
A subfunction supports eswitch representation through which it supports tc
|
|
||||||
offloads. The user configures eswitch to send/receive packets from/to
|
|
||||||
the subfunction port.
|
|
||||||
|
|
||||||
Subfunctions share PCI level resources such as PCI MSI-X IRQs with
|
|
||||||
other subfunctions and/or with its parent PCI function.
|
|
||||||
|
|
||||||
Example mlx5 software, system, and device view::
|
|
||||||
|
|
||||||
_______
|
|
||||||
| admin |
|
|
||||||
| user |----------
|
|
||||||
|_______| |
|
|
||||||
| |
|
|
||||||
____|____ __|______ _________________
|
|
||||||
| | | | | |
|
|
||||||
| devlink | | tc tool | | user |
|
|
||||||
| tool | |_________| | applications |
|
|
||||||
|_________| | |_________________|
|
|
||||||
| | | |
|
|
||||||
| | | | Userspace
|
|
||||||
+---------|-------------|-------------------|----------|--------------------+
|
|
||||||
| | +----------+ +----------+ Kernel
|
|
||||||
| | | netdev | | rdma dev |
|
|
||||||
| | +----------+ +----------+
|
|
||||||
(devlink port add/del | ^ ^
|
|
||||||
port function set) | | |
|
|
||||||
| | +---------------|
|
|
||||||
_____|___ | | _______|_______
|
|
||||||
| | | | | mlx5 class |
|
|
||||||
| devlink | +------------+ | | drivers |
|
|
||||||
| kernel | | rep netdev | | |(mlx5_core,ib) |
|
|
||||||
|_________| +------------+ | |_______________|
|
|
||||||
| | | ^
|
|
||||||
(devlink ops) | | (probe/remove)
|
|
||||||
_________|________ | | ____|________
|
|
||||||
| subfunction | | +---------------+ | subfunction |
|
|
||||||
| management driver|----- | subfunction |---| driver |
|
|
||||||
| (mlx5_core) | | auxiliary dev | | (mlx5_core) |
|
|
||||||
|__________________| +---------------+ |_____________|
|
|
||||||
| ^
|
|
||||||
(sf add/del, vhca events) |
|
|
||||||
| (device add/del)
|
|
||||||
_____|____ ____|________
|
|
||||||
| | | subfunction |
|
|
||||||
| PCI NIC |--- activate/deactivate events--->| host driver |
|
|
||||||
|__________| | (mlx5_core) |
|
|
||||||
|_____________|
|
|
||||||
|
|
||||||
Subfunction is created using devlink port interface.
|
|
||||||
|
|
||||||
- Change device to switchdev mode::
|
|
||||||
|
|
||||||
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
|
|
||||||
|
|
||||||
- Add a devlink port of subfunction flavour::
|
|
||||||
|
|
||||||
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
|
|
||||||
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
|
||||||
function:
|
|
||||||
hw_addr 00:00:00:00:00:00 state inactive opstate detached
|
|
||||||
|
|
||||||
- Show a devlink port of the subfunction::
|
|
||||||
|
|
||||||
$ devlink port show pci/0000:06:00.0/32768
|
|
||||||
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
|
|
||||||
function:
|
|
||||||
hw_addr 00:00:00:00:00:00 state inactive opstate detached
|
|
||||||
|
|
||||||
- Delete a devlink port of subfunction after use::
|
|
||||||
|
|
||||||
$ devlink port del pci/0000:06:00.0/32768
|
|
||||||
|
|
||||||
mlx5 function attributes
|
|
||||||
========================
|
|
||||||
The mlx5 driver provides a mechanism to setup PCI VF/SF function attributes in
|
|
||||||
a unified way for SmartNIC and non-SmartNIC.
|
|
||||||
|
|
||||||
This is supported only when the eswitch mode is set to switchdev. Port function
|
|
||||||
configuration of the PCI VF/SF is supported through devlink eswitch port.
|
|
||||||
|
|
||||||
Port function attributes should be set before PCI VF/SF is enumerated by the
|
|
||||||
driver.
|
|
||||||
|
|
||||||
MAC address setup
|
|
||||||
-----------------
|
|
||||||
mlx5 driver support devlink port function attr mechanism to setup MAC
|
|
||||||
address. (refer to Documentation/networking/devlink/devlink-port.rst)
|
|
||||||
|
|
||||||
RoCE capability setup
|
|
||||||
---------------------
|
|
||||||
Not all mlx5 PCI devices/SFs require RoCE capability.
|
|
||||||
|
|
||||||
When RoCE capability is disabled, it saves 1 Mbytes worth of system memory per
|
|
||||||
PCI devices/SF.
|
|
||||||
|
|
||||||
mlx5 driver support devlink port function attr mechanism to setup RoCE
|
|
||||||
capability. (refer to Documentation/networking/devlink/devlink-port.rst)
|
|
||||||
|
|
||||||
migratable capability setup
|
|
||||||
---------------------------
|
|
||||||
User who wants mlx5 PCI VFs to be able to perform live migration need to
|
|
||||||
explicitly enable the VF migratable capability.
|
|
||||||
|
|
||||||
mlx5 driver support devlink port function attr mechanism to setup migratable
|
|
||||||
capability. (refer to Documentation/networking/devlink/devlink-port.rst)
|
|
||||||
|
|
||||||
SF state setup
|
|
||||||
--------------
|
|
||||||
To use the SF, the user must activate the SF using the SF function state
|
|
||||||
attribute.
|
|
||||||
|
|
||||||
- Get the state of the SF identified by its unique devlink port index::
|
|
||||||
|
|
||||||
$ devlink port show ens2f0npf0sf88
|
|
||||||
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
|
||||||
function:
|
|
||||||
hw_addr 00:00:00:00:88:88 state inactive opstate detached
|
|
||||||
|
|
||||||
- Activate the function and verify its state is active::
|
|
||||||
|
|
||||||
$ devlink port function set ens2f0npf0sf88 state active
|
|
||||||
|
|
||||||
$ devlink port show ens2f0npf0sf88
|
|
||||||
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
|
||||||
function:
|
|
||||||
hw_addr 00:00:00:00:88:88 state active opstate detached
|
|
||||||
|
|
||||||
Upon function activation, the PF driver instance gets the event from the device
|
|
||||||
that a particular SF was activated. It's the cue to put the device on bus, probe
|
|
||||||
it and instantiate the devlink instance and class specific auxiliary devices
|
|
||||||
for it.
|
|
||||||
|
|
||||||
- Show the auxiliary device and port of the subfunction::
|
|
||||||
|
|
||||||
$ devlink dev show
|
|
||||||
devlink dev show auxiliary/mlx5_core.sf.4
|
|
||||||
|
|
||||||
$ devlink port show auxiliary/mlx5_core.sf.4/1
|
|
||||||
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false
|
|
||||||
|
|
||||||
$ rdma link show mlx5_0/1
|
|
||||||
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88
|
|
||||||
|
|
||||||
$ rdma dev show
|
|
||||||
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
|
|
||||||
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
|
|
||||||
|
|
||||||
- Subfunction auxiliary device and class device hierarchy::
|
|
||||||
|
|
||||||
mlx5_core.sf.4
|
|
||||||
(subfunction auxiliary device)
|
|
||||||
/\
|
|
||||||
/ \
|
|
||||||
/ \
|
|
||||||
/ \
|
|
||||||
/ \
|
|
||||||
mlx5_core.eth.4 mlx5_core.rdma.4
|
|
||||||
(sf eth aux dev) (sf rdma aux dev)
|
|
||||||
| |
|
|
||||||
| |
|
|
||||||
p0sf88 mlx5_0
|
|
||||||
(sf netdev) (sf rdma device)
|
|
||||||
|
|
||||||
Additionally, the SF port also gets the event when the driver attaches to the
|
|
||||||
auxiliary device of the subfunction. This results in changing the operational
|
|
||||||
state of the function. This provides visibility to the user to decide when is it
|
|
||||||
safe to delete the SF port for graceful termination of the subfunction.
|
|
||||||
|
|
||||||
- Show the SF port operational state::
|
|
||||||
|
|
||||||
$ devlink port show ens2f0npf0sf88
|
|
||||||
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
|
||||||
function:
|
|
||||||
hw_addr 00:00:00:00:88:88 state active opstate attached
|
|
||||||
|
|
||||||
Devlink health reporters
|
|
||||||
========================
|
|
||||||
|
|
||||||
tx reporter
|
|
||||||
-----------
|
|
||||||
The tx reporter is responsible for reporting and recovering of the following two error scenarios:
|
|
||||||
|
|
||||||
- tx timeout
|
|
||||||
Report on kernel tx timeout detection.
|
|
||||||
Recover by searching lost interrupts.
|
|
||||||
- tx error completion
|
|
||||||
Report on error tx completion.
|
|
||||||
Recover by flushing the tx queue and reset it.
|
|
||||||
|
|
||||||
tx reporter also support on demand diagnose callback, on which it provides
|
|
||||||
real time information of its send queues status.
|
|
||||||
|
|
||||||
User commands examples:
|
|
||||||
|
|
||||||
- Diagnose send queues status::
|
|
||||||
|
|
||||||
$ devlink health diagnose pci/0000:82:00.0 reporter tx
|
|
||||||
|
|
||||||
NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
|
|
||||||
|
|
||||||
- Show number of tx errors indicated, number of recover flows ended successfully,
|
|
||||||
is autorecover enabled and graceful period from last recover::
|
|
||||||
|
|
||||||
$ devlink health show pci/0000:82:00.0 reporter tx
|
|
||||||
|
|
||||||
rx reporter
|
|
||||||
-----------
|
|
||||||
The rx reporter is responsible for reporting and recovering of the following two error scenarios:
|
|
||||||
|
|
||||||
- rx queues' initialization (population) timeout
|
|
||||||
Population of rx queues' descriptors on ring initialization is done
|
|
||||||
in napi context via triggering an irq. In case of a failure to get
|
|
||||||
the minimum amount of descriptors, a timeout would occur, and
|
|
||||||
descriptors could be recovered by polling the EQ (Event Queue).
|
|
||||||
- rx completions with errors (reported by HW on interrupt context)
|
|
||||||
Report on rx completion error.
|
|
||||||
Recover (if needed) by flushing the related queue and reset it.
|
|
||||||
|
|
||||||
rx reporter also supports on demand diagnose callback, on which it
|
|
||||||
provides real time information of its receive queues' status.
|
|
||||||
|
|
||||||
- Diagnose rx queues' status and corresponding completion queue::
|
|
||||||
|
|
||||||
$ devlink health diagnose pci/0000:82:00.0 reporter rx
|
|
||||||
|
|
||||||
NOTE: This command has valid output only when interface is up. Otherwise, the command has empty output.
|
|
||||||
|
|
||||||
- Show number of rx errors indicated, number of recover flows ended successfully,
|
|
||||||
is autorecover enabled, and graceful period from last recover::
|
|
||||||
|
|
||||||
$ devlink health show pci/0000:82:00.0 reporter rx
|
|
||||||
|
|
||||||
fw reporter
|
|
||||||
-----------
|
|
||||||
The fw reporter implements `diagnose` and `dump` callbacks.
|
|
||||||
It follows symptoms of fw error such as fw syndrome by triggering
|
|
||||||
fw core dump and storing it into the dump buffer.
|
|
||||||
The fw reporter diagnose command can be triggered any time by the user to check
|
|
||||||
current fw status.
|
|
||||||
|
|
||||||
User commands examples:
|
|
||||||
|
|
||||||
- Check fw heath status::
|
|
||||||
|
|
||||||
$ devlink health diagnose pci/0000:82:00.0 reporter fw
|
|
||||||
|
|
||||||
- Read FW core dump if already stored or trigger new one::
|
|
||||||
|
|
||||||
$ devlink health dump show pci/0000:82:00.0 reporter fw
|
|
||||||
|
|
||||||
NOTE: This command can run only on the PF which has fw tracer ownership,
|
|
||||||
running it on other PF or any VF will return "Operation not permitted".
|
|
||||||
|
|
||||||
fw fatal reporter
|
|
||||||
-----------------
|
|
||||||
The fw fatal reporter implements `dump` and `recover` callbacks.
|
|
||||||
It follows fatal errors indications by CR-space dump and recover flow.
|
|
||||||
The CR-space dump uses vsc interface which is valid even if the FW command
|
|
||||||
interface is not functional, which is the case in most FW fatal errors.
|
|
||||||
The recover function runs recover flow which reloads the driver and triggers fw
|
|
||||||
reset if needed.
|
|
||||||
On firmware error, the health buffer is dumped into the dmesg. The log
|
|
||||||
level is derived from the error's severity (given in health buffer).
|
|
||||||
|
|
||||||
User commands examples:
|
|
||||||
|
|
||||||
- Run fw recover flow manually::
|
|
||||||
|
|
||||||
$ devlink health recover pci/0000:82:00.0 reporter fw_fatal
|
|
||||||
|
|
||||||
- Read FW CR-space dump if already stored or trigger new one::
|
|
||||||
|
|
||||||
$ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
|
|
||||||
|
|
||||||
NOTE: This command can run only on PF.
|
|
||||||
|
|
||||||
mlx5 tracepoints
|
|
||||||
================
|
|
||||||
|
|
||||||
mlx5 driver provides internal tracepoints for tracking and debugging using
|
|
||||||
kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst).
|
|
||||||
|
|
||||||
For the list of support mlx5 events, check `/sys/kernel/debug/tracing/events/mlx5/`.
|
|
||||||
|
|
||||||
tc and eswitch offloads tracepoints:
|
|
||||||
|
|
||||||
- mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT
|
|
||||||
|
|
||||||
- mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL
|
|
||||||
|
|
||||||
- mlx5e_stats_flower: trace flower stats request::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217
|
|
||||||
|
|
||||||
- mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1
|
|
||||||
|
|
||||||
- mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
|
|
||||||
|
|
||||||
Bridge offloads tracepoints:
|
|
||||||
|
|
||||||
- mlx5_esw_bridge_fdb_entry_init: trace bridge FDB entry offloaded to mlx5::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_bridge_fdb_entry_init >> set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
kworker/u20:9-2217 [003] ...1 318.582243: mlx5_esw_bridge_fdb_entry_init: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=0 flags=0 used=0
|
|
||||||
|
|
||||||
- mlx5_esw_bridge_fdb_entry_cleanup: trace bridge FDB entry deleted from mlx5::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_bridge_fdb_entry_cleanup >> set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
ip-2581 [005] ...1 318.629871: mlx5_esw_bridge_fdb_entry_cleanup: net_device=enp8s0f0_1 addr=e4:fd:05:08:00:03 vid=0 flags=0 used=16
|
|
||||||
|
|
||||||
- mlx5_esw_bridge_fdb_entry_refresh: trace bridge FDB entry offload refreshed in
|
|
||||||
mlx5::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_bridge_fdb_entry_refresh >> set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
kworker/u20:8-3849 [003] ...1 466716: mlx5_esw_bridge_fdb_entry_refresh: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=3 flags=0 used=0
|
|
||||||
|
|
||||||
- mlx5_esw_bridge_vlan_create: trace bridge VLAN object add on mlx5
|
|
||||||
representor::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_bridge_vlan_create >> set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
ip-2560 [007] ...1 318.460258: mlx5_esw_bridge_vlan_create: vid=1 flags=6
|
|
||||||
|
|
||||||
- mlx5_esw_bridge_vlan_cleanup: trace bridge VLAN object delete from mlx5
|
|
||||||
representor::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_bridge_vlan_cleanup >> set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
bridge-2582 [007] ...1 318.653496: mlx5_esw_bridge_vlan_cleanup: vid=2 flags=8
|
|
||||||
|
|
||||||
- mlx5_esw_bridge_vport_init: trace mlx5 vport assigned with bridge upper
|
|
||||||
device::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_bridge_vport_init >> set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
ip-2560 [007] ...1 318.458915: mlx5_esw_bridge_vport_init: vport_num=1
|
|
||||||
|
|
||||||
- mlx5_esw_bridge_vport_cleanup: trace mlx5 vport removed from bridge upper
|
|
||||||
device::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_bridge_vport_cleanup >> set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
ip-5387 [000] ...1 573713: mlx5_esw_bridge_vport_cleanup: vport_num=1
|
|
||||||
|
|
||||||
Eswitch QoS tracepoints:
|
|
||||||
|
|
||||||
- mlx5_esw_vport_qos_create: trace creation of transmit scheduler arbiter for vport::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_vport_qos_create >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
<...>-23496 [018] .... 73136.838831: mlx5_esw_vport_qos_create: (0000:82:00.0) vport=2 tsar_ix=4 bw_share=0, max_rate=0 group=000000007b576bb3
|
|
||||||
|
|
||||||
- mlx5_esw_vport_qos_config: trace configuration of transmit scheduler arbiter for vport::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_vport_qos_config >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
<...>-26548 [023] .... 75754.223823: mlx5_esw_vport_qos_config: (0000:82:00.0) vport=1 tsar_ix=3 bw_share=34, max_rate=10000 group=000000007b576bb3
|
|
||||||
|
|
||||||
- mlx5_esw_vport_qos_destroy: trace deletion of transmit scheduler arbiter for vport::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_vport_qos_destroy >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
<...>-27418 [004] .... 76546.680901: mlx5_esw_vport_qos_destroy: (0000:82:00.0) vport=1 tsar_ix=3
|
|
||||||
|
|
||||||
- mlx5_esw_group_qos_create: trace creation of transmit scheduler arbiter for rate group::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_group_qos_create >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
<...>-26578 [008] .... 75776.022112: mlx5_esw_group_qos_create: (0000:82:00.0) group=000000008dac63ea tsar_ix=5
|
|
||||||
|
|
||||||
- mlx5_esw_group_qos_config: trace configuration of transmit scheduler arbiter for rate group::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_group_qos_config >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
<...>-27303 [020] .... 76461.455356: mlx5_esw_group_qos_config: (0000:82:00.0) group=000000008dac63ea tsar_ix=5 bw_share=100 max_rate=20000
|
|
||||||
|
|
||||||
- mlx5_esw_group_qos_destroy: trace deletion of transmit scheduler arbiter for group::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_esw_group_qos_destroy >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
<...>-27418 [006] .... 76547.187258: mlx5_esw_group_qos_destroy: (0000:82:00.0) group=000000007b576bb3 tsar_ix=1
|
|
||||||
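The six Eswitch QoS tracepoints above follow one naming pattern, so their `set_event` names can be generated rather than typed one by one. The following is only a sketch: the helper name `mlx5_qos_events` is hypothetical and not part of the kernel tree, and it only prints the event names.

```shell
# Hypothetical helper (an assumption, not a kernel-provided tool): print the
# set_event names of the six Eswitch QoS tracepoints documented above.
mlx5_qos_events() {
    for scope in vport group; do
        for op in create config destroy; do
            printf 'mlx5:mlx5_esw_%s_qos_%s\n' "$scope" "$op"
        done
    done
}

# Print the names; nothing is written to tracefs here.
mlx5_qos_events
```

To enable them, each printed name can be appended to `/sys/kernel/debug/tracing/set_event` as root, one `echo ... >> set_event` per name, exactly as in the individual examples above.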
|
|
||||||
SF tracepoints:
|
|
||||||
|
|
||||||
- mlx5_sf_add: trace addition of the SF port::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
devlink-9363 [031] ..... 24610.188722: mlx5_sf_add: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000 sfnum=88
|
|
||||||
|
|
||||||
- mlx5_sf_free: trace freeing of the SF port::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
devlink-9830 [038] ..... 26300.404749: mlx5_sf_free: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000
|
|
||||||
|
|
||||||
- mlx5_sf_hwc_alloc: trace allocating of the hardware SF context::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
devlink-9775 [031] ..... 26296.385259: mlx5_sf_hwc_alloc: (0000:06:00.0) controller=0 hw_id=0x8000 sfnum=88
|
|
||||||
|
|
||||||
- mlx5_sf_hwc_free: trace freeing of the hardware SF context::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
kworker/u128:3-9093 [046] ..... 24625.365771: mlx5_sf_hwc_free: (0000:06:00.0) hw_id=0x8000
|
|
||||||
|
|
||||||
- mlx5_sf_hwc_deferred_free: trace deferred freeing of the hardware SF context::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
devlink-9519 [046] ..... 24624.400271: mlx5_sf_hwc_deferred_free: (0000:06:00.0) hw_id=0x8000
|
|
||||||
|
|
||||||
- mlx5_sf_vhca_event: trace SF vhca event and state::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
kworker/u128:3-9093 [046] ..... 24625.365525: mlx5_sf_vhca_event: (0000:06:00.0) hw_id=0x8000 sfnum=88 vhca_state=1
|
|
||||||
|
|
||||||
- mlx5_sf_dev_add: trace SF device add event::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_sf_dev_add >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
kworker/u128:3-9093 [000] ..... 24616.524495: mlx5_sf_dev_add: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88
|
|
||||||
|
|
||||||
- mlx5_sf_dev_del: trace SF device delete event::
|
|
||||||
|
|
||||||
$ echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event
|
|
||||||
$ cat /sys/kernel/debug/tracing/trace
|
|
||||||
...
|
|
||||||
kworker/u128:3-9093 [044] ..... 24624.400749: mlx5_sf_dev_del: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88
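The trace lines shown above carry their payload as `key=value` pairs, which makes them easy to post-process. As a minimal sketch (the helper name `trace_field` is hypothetical), one field can be pulled out of a single trace line like this:

```shell
# Hypothetical helper (not part of the kernel tree): print the value of one
# key=value field from a single trace line.
trace_field() {
    # $1 = trace line, $2 = field name (for example hw_id or sfnum)
    # Split on spaces and commas, then keep the first token starting "name=".
    printf '%s\n' "$1" | tr ' ,' '\n\n' | sed -n "s/^$2=//p" | head -n 1
}

# Sample line copied from the mlx5_sf_dev_del example above.
line='kworker/u128:3-9093 [044] ..... 24624.400749: mlx5_sf_dev_del: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88'
trace_field "$line" sfnum
trace_field "$line" hw_id
```

The field name is anchored at the start of a token, so asking for `hw_id` does not accidentally match `aux_id`.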
|
|
@ -0,0 +1,224 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
|
||||||
|
.. include:: <isonum.txt>
|
||||||
|
|
||||||
|
=======
|
||||||
|
Devlink
|
||||||
|
=======
|
||||||
|
|
||||||
|
:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||||
|
|
||||||
|
Contents
|
||||||
|
========
|
||||||
|
|
||||||
|
- `Info`_
|
||||||
|
- `Parameters`_
|
||||||
|
- `Health reporters`_
|
||||||
|
|
||||||
|
Info
|
||||||
|
====
|
||||||
|
|
||||||
|
The devlink info reports the running and stored firmware versions on device.
|
||||||
|
It also prints the device PSID which represents the HCA board type ID.
|
||||||
|
|
||||||
|
User command example::
|
||||||
|
|
||||||
|
$ devlink dev info pci/0000:00:06.0
|
||||||
|
pci/0000:00:06.0:
|
||||||
|
driver mlx5_core
|
||||||
|
versions:
|
||||||
|
fixed:
|
||||||
|
fw.psid MT_0000000009
|
||||||
|
running:
|
||||||
|
fw.version 16.26.0100
|
||||||
|
stored:
|
||||||
|
fw.version 16.26.0100
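A common use of this output is to check whether the stored firmware differs from the running one, for example after flashing. The following is a sketch only: `fw_versions_match` is a hypothetical helper, and it assumes the plain-text layout shown in the example above.

```shell
# Hypothetical helper: read "devlink dev info" text on stdin and report
# whether the running and stored fw.version values agree.
# Assumes the section layout (fixed:/running:/stored:) shown in the example.
fw_versions_match() {
    awk '
        $1 == "fixed:"   { section = "fixed";   next }
        $1 == "running:" { section = "running"; next }
        $1 == "stored:"  { section = "stored";  next }
        $1 == "fw.version" { ver[section] = $2 }
        END {
            if (ver["running"] != "" && ver["running"] == ver["stored"]) {
                print "match " ver["running"]
                exit 0
            }
            print "differ running=" ver["running"] " stored=" ver["stored"]
            exit 1
        }'
}

# Usage on a live system (device handle taken from the example above):
#   devlink dev info pci/0000:00:06.0 | fw_versions_match
```

The function exits with status 0 on a match and 1 otherwise, so it can be used directly in an `if` statement.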
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
==========
|
||||||
|
|
||||||
|
flow_steering_mode: Device flow steering mode
|
||||||
|
---------------------------------------------
|
||||||
|
The flow steering mode parameter controls the flow steering mode of the driver.
|
||||||
|
Two modes are supported:
|
||||||
|
1. 'dmfs' - Device managed flow steering.
|
||||||
|
2. 'smfs' - Software/Driver managed flow steering.
|
||||||
|
|
||||||
|
In DMFS mode, the HW steering entities are created and managed through the
|
||||||
|
Firmware.
|
||||||
|
In SMFS mode, the HW steering entities are created and managed by
|
||||||
|
the driver directly in hardware, without firmware intervention.
|
||||||
|
|
||||||
|
SMFS mode is faster and provides a better rule insertion rate than the default DMFS mode.
|
||||||
|
|
||||||
|
User command examples:
|
||||||
|
|
||||||
|
- Set SMFS flow steering mode::
|
||||||
|
|
||||||
|
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
|
||||||
|
|
||||||
|
- Read device flow steering mode::
|
||||||
|
|
||||||
|
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
|
||||||
|
pci/0000:06:00.0:
|
||||||
|
name flow_steering_mode type driver-specific
|
||||||
|
values:
|
||||||
|
cmode runtime value smfs
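Scripts that need the current value of a parameter can take it from the `cmode ... value ...` line of the output above. This is a sketch under that assumption about the text layout; `param_value` is a hypothetical helper name.

```shell
# Hypothetical helper: read "devlink dev param show" text on stdin and print
# the value reported for the requested configuration mode.
param_value() {
    # $1 = cmode to look for (runtime, driverinit or permanent)
    awk -v want="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "cmode") mode = $(i + 1)
                if ($i == "value" && mode == want) { print $(i + 1); exit }
            }
        }'
}

# Usage on a live system (device handle taken from the example above):
#   devlink dev param show pci/0000:06:00.0 name flow_steering_mode | param_value runtime
```

The same helper works for the other parameters in this section, for example `enable_roce` with `param_value driverinit`.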
|
||||||
|
|
||||||
|
enable_roce: RoCE enablement state
|
||||||
|
----------------------------------
|
||||||
|
If the device supports RoCE disablement, RoCE enablement state controls device
|
||||||
|
support for RoCE capability. Otherwise, the control occurs in the driver stack.
|
||||||
|
When RoCE is disabled at the driver level, only raw ethernet QPs are supported.
|
||||||
|
|
||||||
|
To change RoCE enablement state, a user must change the driverinit cmode value
|
||||||
|
and run devlink reload.
|
||||||
|
|
||||||
|
User command examples:
|
||||||
|
|
||||||
|
- Disable RoCE::
|
||||||
|
|
||||||
|
$ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit
|
||||||
|
$ devlink dev reload pci/0000:06:00.0
|
||||||
|
|
||||||
|
- Read RoCE enablement state::
|
||||||
|
|
||||||
|
$ devlink dev param show pci/0000:06:00.0 name enable_roce
|
||||||
|
pci/0000:06:00.0:
|
||||||
|
name enable_roce type generic
|
||||||
|
values:
|
||||||
|
cmode driverinit value true
|
||||||
|
|
||||||
|
esw_port_metadata: Eswitch port metadata state
|
||||||
|
----------------------------------------------
|
||||||
|
When applicable, disabling eswitch metadata can increase packet rate
|
||||||
|
up to 20% depending on the use case and packet sizes.
|
||||||
|
|
||||||
|
Eswitch port metadata state controls whether to internally tag packets with
|
||||||
|
metadata. Metadata tagging must be enabled for multi-port RoCE, failover
|
||||||
|
between representors and stacked devices.
|
||||||
|
By default metadata is enabled on the supported devices in E-switch.
|
||||||
|
Metadata is applicable only for E-switch in switchdev mode and
|
||||||
|
users may disable it when NONE of the below use cases will be in use:
|
||||||
|
1. HCA is in Dual/multi-port RoCE mode.
|
||||||
|
2. VF/SF representor bonding (Usually used for Live migration)
|
||||||
|
3. Stacked devices
|
||||||
|
|
||||||
|
When metadata is disabled, the above use cases will fail to initialize if
|
||||||
|
users try to enable them.
|
||||||
|
|
||||||
|
- Show eswitch port metadata::
|
||||||
|
|
||||||
|
$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
|
||||||
|
pci/0000:06:00.0:
|
||||||
|
name esw_port_metadata type driver-specific
|
||||||
|
values:
|
||||||
|
cmode runtime value true
|
||||||
|
|
||||||
|
- Disable eswitch port metadata::
|
||||||
|
|
||||||
|
$ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value false cmode runtime
|
||||||
|
|
||||||
|
- Change eswitch mode to switchdev mode after choosing the metadata value::
|
||||||
|
|
||||||
|
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
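The order matters here: the metadata value is chosen first and the eswitch is moved to switchdev mode afterwards. As a sketch, the hypothetical helper below only prints the two commands from the examples above in that order, so the sequence can be reviewed before it is run.

```shell
# Hypothetical dry-run helper: print (do not execute) the commands that
# disable eswitch port metadata and then switch the eswitch to switchdev mode.
esw_metadata_off_plan() {
    # $1 = devlink device handle, for example pci/0000:06:00.0
    printf 'devlink dev param set %s name esw_port_metadata value false cmode runtime\n' "$1"
    printf 'devlink dev eswitch set %s mode switchdev\n' "$1"
}

esw_metadata_off_plan pci/0000:06:00.0
```

Disabling metadata is only appropriate when none of the use cases listed above (multi-port RoCE, representor bonding, stacked devices) will be needed.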
|
||||||
|
|
||||||
|
Health reporters
|
||||||
|
================
|
||||||
|
|
||||||
|
tx reporter
|
||||||
|
-----------
|
||||||
|
The tx reporter is responsible for reporting and recovering from the following two error scenarios:
|
||||||
|
|
||||||
|
- tx timeout
|
||||||
|
Report on kernel tx timeout detection.
|
||||||
|
Recover by searching lost interrupts.
|
||||||
|
- tx error completion
|
||||||
|
Report on error tx completion.
|
||||||
|
Recover by flushing the tx queue and resetting it.
|
||||||
|
|
||||||
|
tx reporter also supports an on-demand diagnose callback, through which it provides
|
||||||
|
real-time information about the status of its send queues.
|
||||||
|
|
||||||
|
User command examples:
|
||||||
|
|
||||||
|
- Diagnose send queues status::
|
||||||
|
|
||||||
|
$ devlink health diagnose pci/0000:82:00.0 reporter tx
|
||||||
|
|
||||||
|
NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
|
||||||
|
|
||||||
|
- Show the number of tx errors indicated, the number of recover flows that ended successfully,
|
||||||
|
whether autorecover is enabled, and the grace period since the last recover::
|
||||||
|
|
||||||
|
$ devlink health show pci/0000:82:00.0 reporter tx
|
||||||
|
|
||||||
|
rx reporter
|
||||||
|
-----------
|
||||||
|
The rx reporter is responsible for reporting and recovering from the following two error scenarios:
|
||||||
|
|
||||||
|
- rx queues' initialization (population) timeout
|
||||||
|
Population of rx queues' descriptors on ring initialization is done
|
||||||
|
in napi context via triggering an irq. In case of a failure to get
|
||||||
|
the minimum amount of descriptors, a timeout would occur, and
|
||||||
|
descriptors could be recovered by polling the EQ (Event Queue).
|
||||||
|
- rx completions with errors (reported by HW on interrupt context)
|
||||||
|
Report on rx completion error.
|
||||||
|
Recover (if needed) by flushing the related queue and resetting it.
|
||||||
|
|
||||||
|
rx reporter also supports an on-demand diagnose callback, through which it
|
||||||
|
provides real-time information about its receive queues' status.
|
||||||
|
|
||||||
|
- Diagnose rx queues' status and corresponding completion queue::
|
||||||
|
|
||||||
|
$ devlink health diagnose pci/0000:82:00.0 reporter rx
|
||||||
|
|
||||||
|
NOTE: This command has valid output only when interface is up. Otherwise, the command has empty output.
|
||||||
|
|
||||||
|
- Show the number of rx errors indicated, the number of recover flows that ended successfully,
|
||||||
|
whether autorecover is enabled, and the grace period since the last recover::
|
||||||
|
|
||||||
|
$ devlink health show pci/0000:82:00.0 reporter rx
|
||||||
|
|
||||||
|
fw reporter
|
||||||
|
-----------
|
||||||
|
The fw reporter implements `diagnose` and `dump` callbacks.
|
||||||
|
It follows symptoms of fw error such as fw syndrome by triggering
|
||||||
|
fw core dump and storing it into the dump buffer.
|
||||||
|
The fw reporter diagnose command can be triggered any time by the user to check
|
||||||
|
current fw status.
|
||||||
|
|
||||||
|
User command examples:
|
||||||
|
|
||||||
|
- Check fw health status::
|
||||||
|
|
||||||
|
$ devlink health diagnose pci/0000:82:00.0 reporter fw
|
||||||
|
|
||||||
|
- Read FW core dump if already stored or trigger a new one::
|
||||||
|
|
||||||
|
$ devlink health dump show pci/0000:82:00.0 reporter fw
|
||||||
|
|
||||||
|
NOTE: This command can run only on the PF which has fw tracer ownership,
|
||||||
|
running it on another PF or on any VF will return "Operation not permitted".
|
||||||
|
|
||||||
|
fw fatal reporter
|
||||||
|
-----------------
|
||||||
|
The fw fatal reporter implements `dump` and `recover` callbacks.
|
||||||
|
It follows fatal error indications with a CR-space dump and recover flow.
|
||||||
|
The CR-space dump uses vsc interface which is valid even if the FW command
|
||||||
|
interface is not functional, which is the case in most FW fatal errors.
|
||||||
|
The recover function runs recover flow which reloads the driver and triggers fw
|
||||||
|
reset if needed.
|
||||||
|
On firmware error, the health buffer is dumped into the dmesg. The log
|
||||||
|
level is derived from the error's severity (given in health buffer).
|
||||||
|
|
||||||
|
User command examples:
|
||||||
|
|
||||||
|
- Run fw recover flow manually::
|
||||||
|
|
||||||
|
$ devlink health recover pci/0000:82:00.0 reporter fw_fatal
|
||||||
|
|
||||||
|
- Read FW CR-space dump if already stored or trigger a new one::
|
||||||
|
|
||||||
|
$ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
|
||||||
|
|
||||||
|
NOTE: This command can run only on PF.
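The four reporters described in this section (tx, rx, fw, fw_fatal) can all be queried with `devlink health show`. As a sketch, the hypothetical helper below prints one such command per reporter without executing anything, which is convenient for building a periodic health check.

```shell
# Hypothetical dry-run helper: print one "devlink health show" command per
# reporter named in this section; nothing is executed.
mlx5_health_show_cmds() {
    # $1 = devlink device handle, for example pci/0000:82:00.0
    for reporter in tx rx fw fw_fatal; do
        printf 'devlink health show %s reporter %s\n' "$1" "$reporter"
    done
}

mlx5_health_show_cmds pci/0000:82:00.0
```

Only the `show` sub-command is generated because, according to the descriptions above, not every reporter implements every callback (fw_fatal, for instance, implements `dump` and `recover`).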
|
@ -0,0 +1,26 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
|
||||||
|
.. include:: <isonum.txt>
|
||||||
|
|
||||||
|
Mellanox ConnectX(R) mlx5 core VPI Network Driver
|
||||||
|
=================================================
|
||||||
|
|
||||||
|
:Copyright: |copy| 2019, Mellanox Technologies LTD.
|
||||||
|
:Copyright: |copy| 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||||
|
|
||||||
|
Contents:
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 2
|
||||||
|
|
||||||
|
kconfig
|
||||||
|
devlink
|
||||||
|
switchdev
|
||||||
|
tracepoints
|
||||||
|
counters
|
||||||
|
|
||||||
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
Indices
|
||||||
|
=======
|
||||||
|
|
||||||
|
* :ref:`genindex`
|
@ -0,0 +1,168 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
|
||||||
|
.. include:: <isonum.txt>
|
||||||
|
|
||||||
|
=======================================
|
||||||
|
Enabling the driver and kconfig options
|
||||||
|
=======================================
|
||||||
|
|
||||||
|
:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||||
|
|
||||||
|
| mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out)
|
||||||
|
| at build time via kernel Kconfig flags.
|
||||||
|
| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
|
||||||
|
| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
|
||||||
|
| For the list of advanced features, please see below.
|
||||||
|
|
||||||
|
**CONFIG_MLX5_BRIDGE=(y/n)**
|
||||||
|
|
||||||
|
| Enable :ref:`Ethernet Bridging (BRIDGE) offloading support <mlx5_bridge_offload>`.
|
||||||
|
| This will provide the ability to add representors of mlx5 uplink and VF
|
||||||
|
| ports to Bridge and offloading rules for traffic between such ports.
|
||||||
|
| Supports VLANs (trunk and access modes).
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)
|
||||||
|
|
||||||
|
| The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config.
|
||||||
|
| This will provide mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib).
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_CORE_EN=(y/n)**
|
||||||
|
|
||||||
|
| Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads.
|
||||||
|
| mlx5e is the mlx5 ulp driver which provides the netdevice kernel interface. When chosen, mlx5e will be
|
||||||
|
| built into mlx5_core.ko.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_CORE_EN_DCB=(y/n)**
|
||||||
|
|
||||||
|
| Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_CORE_IPOIB=(y/n)**
|
||||||
|
|
||||||
|
| IPoIB offloads & acceleration support.
|
||||||
|
| Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma
|
||||||
|
| IPoIB ulp netdevice.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_CLS_ACT=(y/n)**
|
||||||
|
|
||||||
|
| Enables offload support for TC classifier action (NET_CLS_ACT).
|
||||||
|
| Works in both native NIC mode and Switchdev SRIOV mode.
|
||||||
|
| Flow-based classifiers, such as those registered through
|
||||||
|
| `tc-flower(8)`, are processed by the device, rather than the
|
||||||
|
| host. Actions that would then overwrite matching classification
|
||||||
|
| results would then be instant due to the offload.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_EN_ARFS=(y/n)**
|
||||||
|
|
||||||
|
| Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering.
|
||||||
|
| https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_EN_IPSEC=(y/n)**
|
||||||
|
|
||||||
|
| Enables `IPSec XFRM cryptography-offload acceleration <https://support.mellanox.com/s/article/ConnectX-6DX-Bluefield-2-IPsec-HW-Full-Offload-Configuration-Guide>`_.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_EN_MACSEC=(y/n)**
|
||||||
|
|
||||||
|
| Build support for MACsec cryptography-offload acceleration in the NIC.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_EN_RXNFC=(y/n)**
|
||||||
|
|
||||||
|
| Enables ethtool receive network flow classification, which allows user defined
|
||||||
|
| flow rules to direct traffic into arbitrary rx queue via ethtool set/get_rxnfc API.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_EN_TLS=(y/n)**
|
||||||
|
|
||||||
|
| TLS cryptography-offload acceleration.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_ESWITCH=(y/n)**
|
||||||
|
|
||||||
|
| Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering
|
||||||
|
| and switching for the enabled VFs and PF in two available modes:
|
||||||
|
| 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_.
|
||||||
|
| 2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_FPGA=(y/n)**
|
||||||
|
|
||||||
|
| Build support for the Innova family of network cards by Mellanox Technologies.
|
||||||
|
| Innova network cards are comprised of a ConnectX chip and an FPGA chip on one board.
|
||||||
|
| If you select this option, the mlx5_core driver will include the Innova FPGA core and allow
|
||||||
|
| building sandbox-specific client drivers.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)
|
||||||
|
|
||||||
|
| Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_MPFS=(y/n)**
|
||||||
|
|
||||||
|
| Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC.
|
||||||
|
| MPFS is required when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled to allow passing
|
||||||
|
| user configured unicast MAC addresses to the requesting PF.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_SF=(y/n)**
|
||||||
|
|
||||||
|
| Build support for subfunction.
|
||||||
|
| Subfunctions are more lightweight than PCI SRIOV VFs. Choosing this option
|
||||||
|
| will enable support for creating subfunction devices.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_SF_MANAGER=(y/n)**
|
||||||
|
|
||||||
|
| Build support for subfunction port in the NIC. A Mellanox subfunction
|
||||||
|
| port is managed through devlink. A subfunction supports RDMA, netdevice
|
||||||
|
| and vdpa device. It is similar to a SRIOV VF but it doesn't require
|
||||||
|
| SRIOV support.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_SW_STEERING=(y/n)**
|
||||||
|
|
||||||
|
| Build support for software-managed steering in the NIC.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_TC_CT=(y/n)**
|
||||||
|
|
||||||
|
| Support offloading connection tracking rules via tc ct action.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_TC_SAMPLE=(y/n)**
|
||||||
|
|
||||||
|
| Support offloading sample rules via tc sample action.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_VDPA=(y/n)**
|
||||||
|
|
||||||
|
| Support library for Mellanox VDPA drivers. Provides code that is
|
||||||
|
| common for all types of VDPA drivers. The following drivers are planned:
|
||||||
|
| net, block.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_VDPA_NET=(y/n)**
|
||||||
|
|
||||||
|
| VDPA network driver for ConnectX6 and newer. Provides offloading
|
||||||
|
| of virtio net datapath such that descriptors put on the ring will
|
||||||
|
| be executed by the hardware. It also supports a variety of stateless
|
||||||
|
| offloads depending on the actual device used and firmware version.
|
||||||
|
|
||||||
|
|
||||||
|
**CONFIG_MLX5_VFIO_PCI=(y/n)**
|
||||||
|
|
||||||
|
| This provides migration support for MLX5 devices using the VFIO framework.
|
||||||
|
|
||||||
|
|
||||||
|
**External options** (Choose if the corresponding mlx5 feature is required)
|
||||||
|
|
||||||
|
- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).
|
||||||
|
- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled.
|
||||||
|
- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled.
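Whether a given kernel was built with the two basic flags named at the top of this page can be checked from its config file. This is a sketch only: `mlx5_basic_enabled` is a hypothetical helper, and the location of the config file (for example under `/boot`) is an assumption that varies by distribution.

```shell
# Hypothetical helper: succeed if a kernel config file enables the two basic
# mlx5 flags, CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
mlx5_basic_enabled() {
    # $1 = path to a kernel config file (location is an assumption)
    grep -Eq '^CONFIG_MLX5_CORE=[ym]$' "$1" &&
    grep -Eq '^CONFIG_MLX5_CORE_EN=y$' "$1"
}

# Usage, assuming the distribution installs the config under /boot:
#   mlx5_basic_enabled "/boot/config-$(uname -r)" && echo "mlx5 Ethernet enabled"
```

The same pattern extends to any of the advanced options listed above by adding another `grep` for the corresponding flag.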
|
@ -0,0 +1,239 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
|
||||||
|
.. include:: <isonum.txt>
|
||||||
|
|
||||||
|
=========
|
||||||
|
Switchdev
|
||||||
|
=========
|
||||||
|
|
||||||
|
:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||||
|
|
||||||
|
.. _mlx5_bridge_offload:
|
||||||
|
|
||||||
|
Bridge offload
|
||||||
|
==============
|
||||||
|
|
||||||
|
The mlx5 driver implements support for offloading bridge rules when in switchdev
|
||||||
|
mode. Linux bridge FDBs are automatically offloaded when an mlx5 switchdev
|
||||||
|
representor is attached to a bridge.
|
||||||
|
|
||||||
|
- Change device to switchdev mode::
|
||||||
|
|
||||||
|
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
|
||||||
|
|
||||||
|
- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::
|
||||||
|
|
||||||
|
$ ip link set enp8s0f0 master bridge1
|
||||||
|
|
||||||
|
VLANs
|
||||||
|
-----
|
||||||
|
|
||||||
|
The following bridge VLAN functions are supported by mlx5:
|
||||||
|
|
||||||
|
- VLAN filtering (including multiple VLANs per port)::
|
||||||
|
|
||||||
|
$ ip link set bridge1 type bridge vlan_filtering 1
|
||||||
|
$ bridge vlan add dev enp8s0f0 vid 2-3
|
||||||
|
|
||||||
|
- VLAN push on bridge ingress::
|
||||||
|
|
||||||
|
$ bridge vlan add dev enp8s0f0 vid 3 pvid
|
||||||
|
|
||||||
|
- VLAN pop on bridge egress::
|
||||||
|
|
||||||
|
$ bridge vlan add dev enp8s0f0 vid 3 untagged
|
||||||
|
|
||||||
|
Subfunction
|
||||||
|
===========
|
||||||
|
|
||||||
|
mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.
|
||||||
|
|
||||||
|
A subfunction has its own function capabilities and its own resources. This
|
||||||
|
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
|
||||||
|
queues are neither shared nor stolen from the parent PCI function.
|
||||||
|
|
||||||
|
When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
|
||||||
|
resources neither shared nor stolen from the parent PCI function.
|
||||||
|
|
||||||
|
A subfunction has a dedicated window in PCI BAR space that is not shared
|
||||||
|
with the other subfunctions or the parent PCI function. This ensures that all
|
||||||
|
devices (netdev, rdma, vdpa, etc.) of the subfunction access only the assigned
|
||||||
|
PCI BAR space.
|
||||||
|
|
||||||
|
A subfunction supports eswitch representation through which it supports tc
|
||||||
|
offloads. The user configures eswitch to send/receive packets from/to
|
||||||
|
the subfunction port.
|
||||||
|
|
||||||
|
Subfunctions share PCI level resources such as PCI MSI-X IRQs with
|
||||||
|
other subfunctions and/or with its parent PCI function.
|
||||||
|
|
||||||
|
Example mlx5 software, system, and device view::
|
||||||
|
|
||||||
|
_______
|
||||||
|
| admin |
|
||||||
|
| user |----------
|
||||||
|
|_______| |
|
||||||
|
| |
|
||||||
|
____|____ __|______ _________________
|
||||||
|
| | | | | |
|
||||||
|
| devlink | | tc tool | | user |
|
||||||
|
| tool | |_________| | applications |
|
||||||
|
|_________| | |_________________|
|
||||||
|
| | | |
|
||||||
|
| | | | Userspace
|
||||||
|
+---------|-------------|-------------------|----------|--------------------+
|
||||||
|
| | +----------+ +----------+ Kernel
|
||||||
|
| | | netdev | | rdma dev |
|
||||||
|
| | +----------+ +----------+
|
||||||
|
(devlink port add/del | ^ ^
|
||||||
|
port function set) | | |
|
||||||
|
| | +---------------|
|
||||||
|
_____|___ | | _______|_______
|
||||||
|
| | | | | mlx5 class |
|
||||||
|
| devlink | +------------+ | | drivers |
|
||||||
|
| kernel | | rep netdev | | |(mlx5_core,ib) |
|
||||||
|
|_________| +------------+ | |_______________|
|
||||||
|
| | | ^
|
||||||
|
(devlink ops) | | (probe/remove)
|
||||||
|
_________|________ | | ____|________
|
||||||
|
| subfunction | | +---------------+ | subfunction |
|
||||||
|
| management driver|----- | subfunction |---| driver |
|
||||||
|
| (mlx5_core) | | auxiliary dev | | (mlx5_core) |
|
||||||
|
|__________________| +---------------+ |_____________|
|
||||||
|
| ^
|
||||||
|
(sf add/del, vhca events) |
|
||||||
|
| (device add/del)
|
||||||
|
_____|____ ____|________
|
||||||
|
| | | subfunction |
|
||||||
|
| PCI NIC |--- activate/deactivate events--->| host driver |
|
||||||
|
|__________| | (mlx5_core) |
|
||||||
|
|_____________|
|
||||||
|
|
||||||
|
A subfunction is created using the devlink port interface.
|
||||||
|
|
||||||
|
- Change device to switchdev mode::
|
||||||
|
|
||||||
|
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
|
||||||
|
|
||||||
|
- Add a devlink port of subfunction flavour::
|
||||||
|
|
||||||
|
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
|
||||||
|
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:00:00 state inactive opstate detached
|
||||||
|
|
||||||
|
- Show a devlink port of the subfunction::
|
||||||
|
|
||||||
|
$ devlink port show pci/0000:06:00.0/32768
|
||||||
|
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:00:00 state inactive opstate detached
|
||||||
|
|
||||||
|
- Delete a devlink port of subfunction after use::
|
||||||
|
|
||||||
|
$ devlink port del pci/0000:06:00.0/32768
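The show and delete commands above need the port handle (`pci/0000:06:00.0/32768`) that `devlink port add` prints at the start of its output. As a sketch, assuming the output layout shown in the example, a script can capture it as follows; `sf_port_handle` is a hypothetical helper name.

```shell
# Hypothetical helper: read "devlink port add" output on stdin and print the
# devlink port handle found at the start of the first line.
sf_port_handle() {
    sed -n '1s/^\([^ ]*\): .*/\1/p'
}

# Sample first line copied from the "devlink port add" example above.
printf '%s\n' 'pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false' | sf_port_handle
```

On a live system the handle could be stored with `port=$(devlink port add ... | sf_port_handle)` and later passed to `devlink port del "$port"`.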
|
||||||
|
|
||||||
|
Function attributes
|
||||||
|
===================
|
||||||
|
|
||||||
|
The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in
|
||||||
|
a unified way for SmartNIC and non-SmartNIC.
|
||||||
|
|
||||||
|
This is supported only when the eswitch mode is set to switchdev. Port function
|
||||||
|
configuration of the PCI VF/SF is supported through devlink eswitch port.
|
||||||
|
|
||||||
|
Port function attributes should be set before PCI VF/SF is enumerated by the
|
||||||
|
driver.
|
||||||
|
|
||||||
|
MAC address setup
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
The mlx5 driver supports the devlink port function attr mechanism to set up the MAC
|
||||||
|
address. (refer to Documentation/networking/devlink/devlink-port.rst)
|
||||||
|
|
||||||
|
RoCE capability setup
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
Not all mlx5 PCI devices/SFs require RoCE capability.
|
||||||
|
|
||||||
|
When RoCE capability is disabled, it saves 1 MB of system memory per
|
||||||
|
PCI device/SF.
|
||||||
|
|
||||||
|
The mlx5 driver supports the devlink port function attr mechanism to set up the RoCE
|
||||||
|
capability. (refer to Documentation/networking/devlink/devlink-port.rst)
|
||||||
|
|
||||||
|
migratable capability setup
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
Users who want mlx5 PCI VFs to be able to perform live migration need to
|
||||||
|
explicitly enable the VF migratable capability.
|
||||||
|
|
||||||
|
The mlx5 driver supports the devlink port function attr mechanism to set up the migratable
|
||||||
|
capability. (refer to Documentation/networking/devlink/devlink-port.rst)
|
||||||
|
|
||||||
|
SF state setup
|
||||||
|
--------------
|
||||||
|
|
||||||
|
To use the SF, the user must activate the SF using the SF function state
|
||||||
|
attribute.
|
||||||
|
|
||||||
|
- Get the state of the SF identified by its unique devlink port index::
|
||||||
|
|
||||||
|
$ devlink port show ens2f0npf0sf88
|
||||||
|
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:88:88 state inactive opstate detached
|
||||||
|
|
||||||
|
- Activate the function and verify its state is active::
|
||||||
|
|
||||||
|
$ devlink port function set ens2f0npf0sf88 state active
|
||||||
|
|
||||||
|
$ devlink port show ens2f0npf0sf88
|
||||||
|
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:88:88 state active opstate detached
|
||||||
|
|
||||||
|
Upon function activation, the PF driver instance gets the event from the device
that a particular SF was activated. This is the cue to put the device on the bus, probe
it, and instantiate the devlink instance and class-specific auxiliary devices
for it.
|
||||||
|
|
||||||
|
- Show the auxiliary device and port of the subfunction::
|
||||||
|
|
||||||
|
$ devlink dev show
|
||||||
|
devlink dev show auxiliary/mlx5_core.sf.4
|
||||||
|
|
||||||
|
$ devlink port show auxiliary/mlx5_core.sf.4/1
|
||||||
|
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false
|
||||||
|
|
||||||
|
$ rdma link show mlx5_0/1
|
||||||
|
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88
|
||||||
|
|
||||||
|
$ rdma dev show
|
||||||
|
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
|
||||||
|
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
|
||||||
|
|
||||||
|
- Subfunction auxiliary device and class device hierarchy::
|
||||||
|
|
||||||
|
mlx5_core.sf.4
|
||||||
|
(subfunction auxiliary device)
|
||||||
|
/\
|
||||||
|
/ \
|
||||||
|
/ \
|
||||||
|
/ \
|
||||||
|
/ \
|
||||||
|
mlx5_core.eth.4 mlx5_core.rdma.4
|
||||||
|
(sf eth aux dev) (sf rdma aux dev)
|
||||||
|
| |
|
||||||
|
| |
|
||||||
|
p0sf88 mlx5_0
|
||||||
|
(sf netdev) (sf rdma device)
|
||||||
|
|
||||||
|
Additionally, the SF port also gets the event when the driver attaches to the
auxiliary device of the subfunction. This changes the operational
state of the function, which lets the user decide when it is
safe to delete the SF port for graceful termination of the subfunction.
|
||||||
|
|
||||||
|
- Show the SF port operational state::
|
||||||
|
|
||||||
|
$ devlink port show ens2f0npf0sf88
|
||||||
|
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:88:88 state active opstate attached
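The ``state`` and ``opstate`` values shown above can also be read by a script. The following is a minimal sketch that assumes the output layout shown in this section (whitespace-separated ``key value`` pairs); the helper name ``port_function_field`` is hypothetical.

```shell
#!/bin/sh
# Sketch: extract the value that follows a keyword (e.g. "state" or
# "opstate") from `devlink port show` text read on stdin.
# The helper name is hypothetical; the parsing assumes the layout shown above.
port_function_field() {
    # $1 = field name to look up
    awk -v key="$1" '{
        # Scan each line for an exact keyword match and print the next token.
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}
```

A possible use is ``devlink port show ens2f0npf0sf88 | port_function_field opstate``, which lets a script check the operational state before it deletes the SF port.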
|
@ -0,0 +1,229 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
|
||||||
|
.. include:: <isonum.txt>
|
||||||
|
|
||||||
|
===========
|
||||||
|
Tracepoints
|
||||||
|
===========
|
||||||
|
|
||||||
|
:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
||||||
|
|
||||||
|
The mlx5 driver provides internal tracepoints for tracking and debugging using the
kernel tracepoint interfaces (refer to Documentation/trace/ftrace.rst).
|
||||||
|
|
||||||
|
For the list of supported mlx5 events, check `/sys/kernel/debug/tracing/events/mlx5/`.
|
||||||
|
|
||||||
|
tc and eswitch offloads tracepoints:
|
||||||
|
|
||||||
|
- mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT
|
||||||
|
|
||||||
|
- mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL
|
||||||
|
|
||||||
|
- mlx5e_stats_flower: trace flower stats request::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217
|
||||||
|
|
||||||
|
- mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1
|
||||||
|
|
||||||
|
- mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
|
||||||
|
|
||||||
|
Bridge offloads tracepoints:
|
||||||
|
|
||||||
|
- mlx5_esw_bridge_fdb_entry_init: trace bridge FDB entry offloaded to mlx5::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_bridge_fdb_entry_init >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
kworker/u20:9-2217 [003] ...1 318.582243: mlx5_esw_bridge_fdb_entry_init: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=0 flags=0 used=0
|
||||||
|
|
||||||
|
- mlx5_esw_bridge_fdb_entry_cleanup: trace bridge FDB entry deleted from mlx5::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_bridge_fdb_entry_cleanup >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
ip-2581 [005] ...1 318.629871: mlx5_esw_bridge_fdb_entry_cleanup: net_device=enp8s0f0_1 addr=e4:fd:05:08:00:03 vid=0 flags=0 used=16
|
||||||
|
|
||||||
|
- mlx5_esw_bridge_fdb_entry_refresh: trace bridge FDB entry offload refreshed in
|
||||||
|
mlx5::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_bridge_fdb_entry_refresh >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
kworker/u20:8-3849 [003] ...1 466716: mlx5_esw_bridge_fdb_entry_refresh: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=3 flags=0 used=0
|
||||||
|
|
||||||
|
- mlx5_esw_bridge_vlan_create: trace bridge VLAN object add on mlx5
|
||||||
|
representor::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_bridge_vlan_create >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
ip-2560 [007] ...1 318.460258: mlx5_esw_bridge_vlan_create: vid=1 flags=6
|
||||||
|
|
||||||
|
- mlx5_esw_bridge_vlan_cleanup: trace bridge VLAN object delete from mlx5
|
||||||
|
representor::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_bridge_vlan_cleanup >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
bridge-2582 [007] ...1 318.653496: mlx5_esw_bridge_vlan_cleanup: vid=2 flags=8
|
||||||
|
|
||||||
|
- mlx5_esw_bridge_vport_init: trace mlx5 vport assigned with bridge upper
|
||||||
|
device::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_bridge_vport_init >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
ip-2560 [007] ...1 318.458915: mlx5_esw_bridge_vport_init: vport_num=1
|
||||||
|
|
||||||
|
- mlx5_esw_bridge_vport_cleanup: trace mlx5 vport removed from bridge upper
|
||||||
|
device::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_bridge_vport_cleanup >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
ip-5387 [000] ...1 573713: mlx5_esw_bridge_vport_cleanup: vport_num=1
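The enable step repeated in the examples above can be wrapped in a small helper. The following is a minimal sketch, not part of the driver documentation: the function name ``enable_mlx5_event`` and the ``TRACING_DIR`` variable are assumptions, and the default path is the tracing directory used elsewhere in this document.

```shell
#!/bin/sh
# Sketch: append an mlx5 trace event to set_event after checking that the
# event exists. TRACING_DIR and the helper name are hypothetical.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

enable_mlx5_event() {
    # $1 = event name without the "mlx5:" prefix
    if [ ! -e "$TRACING_DIR/events/mlx5/$1" ]; then
        echo "unknown mlx5 event: $1" >&2
        return 1
    fi
    # Appending (>>) keeps any events that are already enabled.
    echo "mlx5:$1" >> "$TRACING_DIR/set_event"
}
```

For example, ``enable_mlx5_event mlx5_esw_bridge_vport_init`` followed by ``cat "$TRACING_DIR/trace"`` corresponds to the manual steps shown for that tracepoint. Checking the ``events/mlx5/`` directory first gives a clear error message for a mistyped event name.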
|
||||||
|
|
||||||
|
Eswitch QoS tracepoints:
|
||||||
|
|
||||||
|
- mlx5_esw_vport_qos_create: trace creation of transmit scheduler arbiter for vport::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_vport_qos_create >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
<...>-23496 [018] .... 73136.838831: mlx5_esw_vport_qos_create: (0000:82:00.0) vport=2 tsar_ix=4 bw_share=0, max_rate=0 group=000000007b576bb3
|
||||||
|
|
||||||
|
- mlx5_esw_vport_qos_config: trace configuration of transmit scheduler arbiter for vport::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_vport_qos_config >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
<...>-26548 [023] .... 75754.223823: mlx5_esw_vport_qos_config: (0000:82:00.0) vport=1 tsar_ix=3 bw_share=34, max_rate=10000 group=000000007b576bb3
|
||||||
|
|
||||||
|
- mlx5_esw_vport_qos_destroy: trace deletion of transmit scheduler arbiter for vport::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_vport_qos_destroy >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
<...>-27418 [004] .... 76546.680901: mlx5_esw_vport_qos_destroy: (0000:82:00.0) vport=1 tsar_ix=3
|
||||||
|
|
||||||
|
- mlx5_esw_group_qos_create: trace creation of transmit scheduler arbiter for rate group::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_group_qos_create >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
<...>-26578 [008] .... 75776.022112: mlx5_esw_group_qos_create: (0000:82:00.0) group=000000008dac63ea tsar_ix=5
|
||||||
|
|
||||||
|
- mlx5_esw_group_qos_config: trace configuration of transmit scheduler arbiter for rate group::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_group_qos_config >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
<...>-27303 [020] .... 76461.455356: mlx5_esw_group_qos_config: (0000:82:00.0) group=000000008dac63ea tsar_ix=5 bw_share=100 max_rate=20000
|
||||||
|
|
||||||
|
- mlx5_esw_group_qos_destroy: trace deletion of transmit scheduler arbiter for group::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_esw_group_qos_destroy >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
<...>-27418 [006] .... 76547.187258: mlx5_esw_group_qos_destroy: (0000:82:00.0) group=000000007b576bb3 tsar_ix=1
|
||||||
|
|
||||||
|
SF tracepoints:
|
||||||
|
|
||||||
|
- mlx5_sf_add: trace addition of the SF port::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
devlink-9363 [031] ..... 24610.188722: mlx5_sf_add: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000 sfnum=88
|
||||||
|
|
||||||
|
- mlx5_sf_free: trace freeing of the SF port::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
devlink-9830 [038] ..... 26300.404749: mlx5_sf_free: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000
|
||||||
|
|
||||||
|
- mlx5_sf_activate: trace activation of the SF port::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_activate >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
devlink-29841 [008] ..... 3669.635095: mlx5_sf_activate: (0000:08:00.0) port_index=32768 controller=0 hw_id=0x8000
|
||||||
|
|
||||||
|
- mlx5_sf_deactivate: trace deactivation of the SF port::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_deactivate >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
devlink-29994 [008] ..... 4015.969467: mlx5_sf_deactivate: (0000:08:00.0) port_index=32768 controller=0 hw_id=0x8000
|
||||||
|
|
||||||
|
- mlx5_sf_hwc_alloc: trace allocation of the hardware SF context::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
devlink-9775 [031] ..... 26296.385259: mlx5_sf_hwc_alloc: (0000:06:00.0) controller=0 hw_id=0x8000 sfnum=88
|
||||||
|
|
||||||
|
- mlx5_sf_hwc_free: trace freeing of the hardware SF context::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
kworker/u128:3-9093 [046] ..... 24625.365771: mlx5_sf_hwc_free: (0000:06:00.0) hw_id=0x8000
|
||||||
|
|
||||||
|
- mlx5_sf_hwc_deferred_free: trace deferred freeing of the hardware SF context::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
devlink-9519 [046] ..... 24624.400271: mlx5_sf_hwc_deferred_free: (0000:06:00.0) hw_id=0x8000
|
||||||
|
|
||||||
|
- mlx5_sf_update_state: trace state updates for SF contexts::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_update_state >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
kworker/u20:3-29490 [009] ..... 4141.453530: mlx5_sf_update_state: (0000:08:00.0) port_index=32768 controller=0 hw_id=0x8000 state=2
|
||||||
|
|
||||||
|
- mlx5_sf_vhca_event: trace SF vhca event and state::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
kworker/u128:3-9093 [046] ..... 24625.365525: mlx5_sf_vhca_event: (0000:06:00.0) hw_id=0x8000 sfnum=88 vhca_state=1
|
||||||
|
|
||||||
|
- mlx5_sf_dev_add: trace SF device add event::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_dev_add >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
kworker/u128:3-9093 [000] ..... 24616.524495: mlx5_sf_dev_add: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88
|
||||||
|
|
||||||
|
- mlx5_sf_dev_del: trace SF device delete event::
|
||||||
|
|
||||||
|
$ echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event
|
||||||
|
$ cat /sys/kernel/debug/tracing/trace
|
||||||
|
...
|
||||||
|
kworker/u128:3-9093 [044] ..... 24624.400749: mlx5_sf_dev_del: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88
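Most of the trace lines shown above carry their payload as ``key=value`` fields. The following is a minimal sketch for pulling one field out of trace output in a script. The helper name ``trace_field`` is hypothetical, and the parsing assumes the line layout shown in the examples above.

```shell
#!/bin/sh
# Sketch: print the value of every "key=value" field named $1 found in
# trace lines read on stdin. The helper name is hypothetical.
trace_field() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            n = index($i, "=")
            if (n > 0 && substr($i, 1, n - 1) == key) {
                v = substr($i, n + 1)
                sub(/,$/, "", v)   # some events print a trailing comma
                print v
            }
        }
    }'
}
```

For example, ``grep mlx5_sf_add /sys/kernel/debug/tracing/trace | trace_field sfnum`` would list the SF numbers of the ports added so far. Fields that are not printed as ``key=value``, such as the ``netdev: ens1f0`` style used by the neigh tracepoints, are not handled by this sketch.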
|
@ -83,7 +83,7 @@ Configuring the Driver
|
|||||||
MTU
|
MTU
|
||||||
---
|
---
|
||||||
|
|
||||||
Jumbo frame support is available with a maximim size of 9194 bytes.
|
Jumbo frame support is available with a maximum size of 9194 bytes.
|
||||||
|
|
||||||
Interrupt coalescing
|
Interrupt coalescing
|
||||||
--------------------
|
--------------------
|
||||||
|
@ -124,7 +124,7 @@ Multicast flooding
|
|||||||
==================
|
==================
|
||||||
CPU port mcast_flooding is always on
|
CPU port mcast_flooding is always on
|
||||||
|
|
||||||
Turning flooding on/off on swithch ports:
|
Turning flooding on/off on switch ports:
|
||||||
bridge link set dev sw0p1 mcast_flood on/off
|
bridge link set dev sw0p1 mcast_flood on/off
|
||||||
|
|
||||||
Access and Trunk port
|
Access and Trunk port
|
||||||
|
@ -174,7 +174,7 @@ Multicast flooding
|
|||||||
==================
|
==================
|
||||||
CPU port mcast_flooding is always on
|
CPU port mcast_flooding is always on
|
||||||
|
|
||||||
Turning flooding on/off on swithch ports:
|
Turning flooding on/off on switch ports:
|
||||||
bridge link set dev sw0p1 mcast_flood on/off
|
bridge link set dev sw0p1 mcast_flood on/off
|
||||||
|
|
||||||
Access and Trunk port
|
Access and Trunk port
|
||||||
|
@ -69,7 +69,7 @@ wwan0-X network device
|
|||||||
The IOSM driver exposes IP link interface "wwan0-X" of type "wwan" for IP
|
The IOSM driver exposes IP link interface "wwan0-X" of type "wwan" for IP
|
||||||
traffic. Iproute network utility is used for creating "wwan0-X" network
|
traffic. Iproute network utility is used for creating "wwan0-X" network
|
||||||
interface and for associating it with MBIM IP session. The Driver supports
|
interface and for associating it with MBIM IP session. The Driver supports
|
||||||
upto 8 IP sessions for simultaneous IP communication.
|
up to 8 IP sessions for simultaneous IP communication.
|
||||||
|
|
||||||
The userspace management application is responsible for creating new IP link
|
The userspace management application is responsible for creating new IP link
|
||||||
prior to establishing MBIM IP session where the SessionId is greater than 0.
|
prior to establishing MBIM IP session where the SessionId is greater than 0.
|
||||||
|
@ -33,7 +33,7 @@ Device driver can provide specific callbacks for each "health reporter", e.g.:
|
|||||||
* Recovery procedures
|
* Recovery procedures
|
||||||
* Diagnostics procedures
|
* Diagnostics procedures
|
||||||
* Object dump procedures
|
* Object dump procedures
|
||||||
* OOB initial parameters
|
* Out Of Box initial parameters
|
||||||
|
|
||||||
Different parts of the driver can register different types of health reporters
|
Different parts of the driver can register different types of health reporters
|
||||||
with different handlers.
|
with different handlers.
|
||||||
@ -46,12 +46,31 @@ Once an error is reported, devlink health will perform the following actions:
|
|||||||
* A log is being send to the kernel trace events buffer
|
* A log is being send to the kernel trace events buffer
|
||||||
* Health status and statistics are being updated for the reporter instance
|
* Health status and statistics are being updated for the reporter instance
|
||||||
* Object dump is being taken and saved at the reporter instance (as long as
|
* Object dump is being taken and saved at the reporter instance (as long as
|
||||||
there is no other dump which is already stored)
|
auto-dump is set and there is no other dump which is already stored)
|
||||||
* Auto recovery attempt is being done. Depends on:
|
* Auto recovery attempt is being done. Depends on:
|
||||||
|
|
||||||
- Auto-recovery configuration
|
- Auto-recovery configuration
|
||||||
- Grace period vs. time passed since last recover
|
- Grace period vs. time passed since last recover
|
||||||
|
|
||||||
|
Devlink formatted message
|
||||||
|
=========================
|
||||||
|
|
||||||
|
To handle devlink health diagnose and health dump requests, devlink creates a
formatted message structure ``devlink_fmsg`` and sends it to the driver's callback
to fill the data in using the devlink fmsg API.
|
||||||
|
|
||||||
|
Devlink fmsg is a mechanism to pass descriptors between drivers and devlink, in
|
||||||
|
json-like format. The API allows the driver to add nested attributes such as
|
||||||
|
object, object pair and value array, in addition to attributes such as name and
|
||||||
|
value.
|
||||||
|
|
||||||
|
The driver should use this API to fill the fmsg context in a format that
devlink later translates into a netlink message. When devlink needs to send
the data to the netlink layer using SKBs, it fragments the data between
different SKBs. To make this fragmentation possible, it uses virtual nest
attributes instead of actual nesting, which cannot be divided between
different SKBs.
|
||||||
|
|
||||||
User Interface
|
User Interface
|
||||||
==============
|
==============
|
||||||
|
|
||||||
|
@ -285,7 +285,7 @@ features are enabled after the hierarchy is exported, but before any
|
|||||||
changes are made.
|
changes are made.
|
||||||
|
|
||||||
This feature is also dependent on switchdev being enabled in the system.
|
This feature is also dependent on switchdev being enabled in the system.
|
||||||
It's required bacause devlink-rate requires devlink-port objects to be
|
It's required because devlink-rate requires devlink-port objects to be
|
||||||
present, and those objects are only created in switchdev mode.
|
present, and those objects are only created in switchdev mode.
|
||||||
|
|
||||||
If the driver is set to the switchdev mode, it will export internal
|
If the driver is set to the switchdev mode, it will export internal
|
||||||
@ -320,7 +320,7 @@ nodes and nodes with children also can't be deleted.
|
|||||||
* - ``tx_weight``
|
* - ``tx_weight``
|
||||||
- allows for usage of Weighted Fair Queuing arbitration scheme among
|
- allows for usage of Weighted Fair Queuing arbitration scheme among
|
||||||
siblings. This arbitration scheme can be used simultaneously with
|
siblings. This arbitration scheme can be used simultaneously with
|
||||||
the strict priority. Range 1-200. Only relative values mater for
|
the strict priority. Range 1-200. Only relative values matter for
|
||||||
arbitration.
|
arbitration.
|
||||||
|
|
||||||
``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
|
``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
|
||||||
|
@ -66,3 +66,4 @@ parameters, info versions, and other features it supports.
|
|||||||
prestera
|
prestera
|
||||||
iosm
|
iosm
|
||||||
octeontx2
|
octeontx2
|
||||||
|
sfc
|
||||||
|
@ -54,6 +54,24 @@ parameters.
|
|||||||
- Control the number of large groups (size > 1) in the FDB table.
|
- Control the number of large groups (size > 1) in the FDB table.
|
||||||
|
|
||||||
* The default value is 15, and the range is between 1 and 1024.
|
* The default value is 15, and the range is between 1 and 1024.
|
||||||
|
* - ``esw_multiport``
|
||||||
|
- Boolean
|
||||||
|
- runtime
|
||||||
|
- Control MultiPort E-Switch shared fdb mode.
|
||||||
|
|
||||||
|
An experimental mode where a single E-Switch is used and all the vports
|
||||||
|
and physical ports on the NIC are connected to it.
|
||||||
|
|
||||||
|
An example is to send traffic from a VF that is created on PF0 to an
|
||||||
|
uplink that is natively associated with the uplink of PF1
|
||||||
|
|
||||||
|
Note: Future devices, ConnectX-8 and onward, will eventually have this
|
||||||
|
as the default to allow forwarding between all NIC ports in a single
|
||||||
|
E-switch environment and the dual E-switch mode will likely get
|
||||||
|
deprecated.
|
||||||
|
|
||||||
|
Default: disabled
|
||||||
|
|
||||||
|
|
||||||
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``
|
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``
|
||||||
|
|
||||||
|
@ -95,5 +95,5 @@ Driver-specific Traps
|
|||||||
* - ``fid_miss``
|
* - ``fid_miss``
|
||||||
- ``exception``
|
- ``exception``
|
||||||
- When a packet enters the device it is classified to a filtering
|
- When a packet enters the device it is classified to a filtering
|
||||||
indentifier (FID) based on the ingress port and VLAN. This trap is used
|
identifier (FID) based on the ingress port and VLAN. This trap is used
|
||||||
to trap packets for which a FID could not be found
|
to trap packets for which a FID could not be found
|
||||||
|
@ -138,4 +138,4 @@ Driver-specific Traps
|
|||||||
- Drops packets with zero (0) IPV4 source address.
|
- Drops packets with zero (0) IPV4 source address.
|
||||||
* - ``met_red``
|
* - ``met_red``
|
||||||
- ``drop``
|
- ``drop``
|
||||||
- Drops non-conforming packets (dropped by Ingress policer, metering drop), e.g. packet rate exceeded configured bandwith.
|
- Drops non-conforming packets (dropped by Ingress policer, metering drop), e.g. packet rate exceeded configured bandwidth.
|
||||||
|
Some files were not shown because too many files have changed in this diff.