1017523 Commits

Author SHA1 Message Date
Bailey Forrest
a57e5de476 gve: DQO: Add TX path
TX SKBs will have their buffers DMA mapped with the device. Each buffer
will have at least one TX descriptor associated. Each SKB will also have
a metadata descriptor.

Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects.
Every TX SKB will have an associated pending_packet object. A TX SKB's
descriptors will use its pending_packet's index as the completion tag,
which will be returned on the TX completion queue.

The device implements a "flow-miss model". Most packets will simply
receive a packet completion. The flow-miss system may choose to process
a packet based on its contents. A TX packet which experiences a flow
miss would receive a miss completion followed by a later reinjection
completion. The miss-completion is received when the packet starts to be
processed by the flow-miss system and the reinjection completion is
received when the flow-miss system completes processing the packet and
sends it on the wire.

Notable mentions:

- Buffers may be freed after receiving the miss-completion, but in order
  to avoid packet reordering, we do not complete the SKB until receiving
  the reinjection completion.

- The driver must robustly handle the unlikely scenario where a miss
  completion does not have an associated reinjection completion. This is
  accomplished by maintaining a list of packets which have a pending
  reinjection completion. After a short timeout (5 seconds), the
  SKB and buffers are released and the pending_packet is moved to a
  second list which has a longer timeout (60 seconds), where the
  pending_packet will not be reused. When the longer timeout elapses,
  the driver may assume the reinjection completion would never be
  received and the pending_packet may be reused.

- Completion handling is triggered by an interrupt and is done in the
  NAPI poll function. Because the TX path and completion exist in
  different threading contexts they maintain their own lists for free
  pending_packet objects. The TX path uses a lock-free approach to steal
  the list from the completion path.

- Both the TSO context and general context descriptors have metadata
  bytes. The device requires that if multiple descriptors contain the
  same field, each descriptor must have the same value set for that
  field.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
0dcc144a79 gve: DQO: Configure interrupts on device up
When interrupts are first enabled, we also set the ratelimits, which
will be static for the entire usage of the device.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
9c1a59a2f4 gve: DQO: Add ring allocation and initialization
Allocate the buffer and completion ring structures. Do not populate the
rings yet. That will happen in the respective rx and tx datapath
follow-on patches

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
5e8c5adf95 gve: DQO: Add core netdev features
Add napi netdev device registration, interrupt handling and initial tx
and rx polling stubs. The stubs will be filled in follow-on patches.

Also:
- LRO feature advertisement and handling
- Also update ethtool logic

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
1f6228e459 gve: Update adminq commands to support DQO queues
DQO queue creation requires additional parameters:
- TX completion/RX buffer queue size
- TX completion/RX buffer queue address
- TX/RX queue size
- RX buffer size

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
a4aa1f1e69 gve: Add DQO fields for core data structures
- Add new DQO datapath structures:
  - `gve_rx_buf_queue_dqo`
  - `gve_rx_compl_queue_dqo`
  - `gve_rx_buf_state_dqo`
  - `gve_tx_desc_dqo`
  - `gve_tx_pending_packet_dqo`

- Incorporate these into the existing ring data structures:
  - `gve_rx_ring`
  - `gve_tx_ring`

Noteworthy mentions:

- `gve_rx_buf_state` represents an RX buffer which was posted to HW.
  Each RX queue has an array of these objects and the index into the
  array is used as the buffer_id when posted to HW.

- `gve_tx_pending_packet_dqo` is treated similarly for TX queues. The
  completion_tag is the index into the array.

- These two structures have links for linked lists which are represented
  by 16b indexes into a contiguous array of these structures.
  This reduces memory footprint compared to 64b pointers.

- We use unions for the writeable datapath structures to reduce cache
  footprint. GQI specific members will renamed like DQO members in a
  future patch.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
223198183f gve: Add dqo descriptors
General description of rings and descriptors:

TX ring is used for sending TX packet buffers to the NIC. It has the
following descriptors:
- `gve_tx_pkt_desc_dqo` - Data buffer descriptor
- `gve_tx_tso_context_desc_dqo` - TSO context descriptor
- `gve_tx_general_context_desc_dqo` - Generic metadata descriptor

Metadata is a collection of 12 bytes. We define `gve_tx_metadata_dqo`
which represents the logical interpetation of the metadata bytes. It's
helpful to define this structure because the metadata bytes exist in
multiple descriptor types (including `gve_tx_tso_context_desc_dqo`),
and the device requires same field has the same value in all
descriptors.

The TX completion ring is used to receive completions from the NIC.
Having a separate ring allows for completions to be out of order. The
completion descriptor `gve_tx_compl_desc` has several different types,
most important are packet and descriptor completions. Descriptor
completions are used to notify the driver when descriptors sent on the
TX ring are done being consumed. The descriptor completion is only used
to signal that space is cleared in the TX ring. A packet completion will
be received when a packet transmitted on the TX queue is done being
transmitted.

In addition there are "miss" and "reinjection" completions. The device
implements a "flow-miss model". Most packets will simply receive a
packet completion. The flow-miss system may choose to process a packet
based on its contents. A TX packet which experiences a flow miss would
receive a miss completion followed by a later reinjection completion.
The miss-completion is received when the packet starts to be processed
by the flow-miss system and the reinjection completion is received when
the flow-miss system completes processing the packet and sends it on the
wire.

The RX buffer ring is used to send buffers to HW via the
`gve_rx_desc_dqo` descriptor.

Received packets are put into the RX queue by the device, which
populates the `gve_rx_compl_desc_dqo` descriptor. The RX descriptors
refer to buffers posted by the buffer queue. Received buffers may be
returned out of order, such as when HW LRO is enabled.

Important concepts:
- "TX" and "RX buffer" queues, which send descriptors to the device, use
  MMIO doorbells to notify the device of new descriptors.

- "RX" and "TX completion" queues, which receive descriptors from the
  device, use a "generation bit" to know when a descriptor was populated
  by the device. The driver initializes all bits with the "current
  generation". The device will populate received descriptors with the
  "next generation" which is inverted from the current generation. When
  the ring wraps, the current/next generation are swapped.

- It's the driver's responsibility to ensure that the RX and TX
  completion queues are not overrun. This can be accomplished by
  limiting the number of descriptors posted to HW.

- TX packets have a 16 bit completion_tag and RX buffers have a 16 bit
  buffer_id. These will be returned on the TX completion and RX queues
  respectively to let the driver know which packet/buffer was completed.

Bitfields are used to describe descriptor fields. This notation is more
concise and readable than shift-and-mask. It is possible because the
driver is restricted to little endian platforms.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
c4b87ac876 gve: Add support for DQO RX PTYPE map
Unlike GQI, DQO RX descriptors do not contain the L3 and L4 type of the
packet. L3 and L4 types are necessary in order to set the hash and csum
on RX SKBs correctly.

DQO RX descriptors instead contain a 10 bit PTYPE index. The PTYPE map
enables the device to tell the driver how to map from PTYPE index to
L3/L4 type.

The device doesn't provide any guarantees about the range of possible
PTYPEs, so we just use a 1024 entry array to implement a fast mapping
structure.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
5ca2265eef gve: adminq: DQO specific device descriptor logic
- In addition to TX and RX queues, DQO has TX completion and RX buffer
  queues.
  - TX completions are received when the device has completed sending a
    packet on the wire.
  - RX buffers are posted on a separate queue form the RX completions.
- DQO descriptor rings are allowed to be smaller than PAGE_SIZE.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
a5886ef4f4 gve: Introduce per netdev enum gve_queue_format
The currently supported queue formats are:
- GQI_RDA - GQI with raw DMA addressing
- GQI_QPL - GQI with queue page list
- DQO_RDA - DQO with raw DMA addressing

The old `gve_priv.raw_addressing` value is only used for GQI_RDA, so we
remove it in favor of just checking against GQI_RDA

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
8a39d3e0da gve: Introduce a new model for device options
The current model uses an integer ID and a fixed size struct for the
parameters of each device option.

The new model allows the device option structs to grow in size over
time. A driver may assume that changes to device option structs will
always be appended.

New device options will also generally have a
`supported_features_mask` so that the driver knows which fields within a
particular device option are enabled.

`gve_device_option.feat_mask` is changed to `required_features_mask`,
and it is a bitmask which must match the value expected by the driver.
This gives the device the ability to break backwards compatibility with
old drivers for certain features by blocking the old drivers from trying
to use the feature.

We maintain ABI compatibility with the old model for
GVE_DEV_OPT_ID_RAW_ADDRESSING in case a driver is using a device which
does not support the new model.

This patch introduces some new terminology:

RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
      mapped and read/updated by the device.

QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped
      with the device for read/write and data is copied from/to SKBs.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
920fb45193 gve: Make gve_rx_slot_page_info.page_offset an absolute offset
Using `page_offset` like a boolean means a page may only be split into
two sections. With page sizes larger than 4k, this can be very wasteful.
Future commits in this patchset use `struct gve_rx_slot_page_info` in a
way which supports a fixed buffer size and a variable page size.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
35f9b2f43f gve: gve_rx_copy: Move padding to an argument
Future use cases will have a different padding value.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
dbdaa67540 gve: Move some static functions to a common file
These functions will be shared by the GQI and DQO variants of the GVNIC
driver as of follow-up patches in this series.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
c6a7ed77ee gve: Update GVE documentation to describe DQO
DQO is a new descriptor format for our next generation virtual NIC.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Yajun Deng
478890682f usbnet: add usbnet_event_names[] for kevent
Modify the netdev_dbg content from int to char * in usbnet_defer_kevent(),
this looks more readable.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:34:45 -07:00
David S. Miller
67faf76d26 Merge branch 'add-sparx5i-driver'
Steen Hegelund says:

====================
Adding the Sparx5i Switch Driver

This series provides the Microchip Sparx5i Switch Driver

The SparX-5 Enterprise Ethernet switch family provides a rich set of
Enterprise switching features such as advanced TCAM-based VLAN and QoS
processing enabling delivery of differentiated services, and security
through TCAMbased frame processing using versatile content aware processor
(VCAP). IPv4/IPv6 Layer 3 (L3) unicast and multicast routing is supported
with up to 18K IPv4/9K IPv6 unicast LPM entries and up to 9K IPv4/3K IPv6
(S,G) multicast groups. L3 security features include source guard and
reverse path forwarding (uRPF) tasks. Additional L3 features include
VRF-Lite and IP tunnels (IP over GRE/IP).

The SparX-5 switch family features a highly flexible set of Ethernet ports
with support for 10G and 25G aggregation links, QSGMII, USGMII, and
USXGMII.  The device integrates a powerful 1 GHz dual-core ARM® Cortex®-A53
CPU enabling full management of the switch and advanced Enterprise
applications.

The SparX-5 switch family targets managed Layer 2 and Layer 3 equipment in
SMB, SME, and Enterprise where high port count 1G/2.5G/5G/10G switching
with 10G/25G aggregation links is required.

The SparX-5 switch family consists of following SKUs:

  VSC7546 SparX-5-64 supports up to 64 Gbps of bandwidth with the following
  primary port configurations.
   - 6 ×10G
   - 16 × 2.5G + 2 × 10G
   - 24 × 1G + 4 × 10G

  VSC7549 SparX-5-90 supports up to 90 Gbps of bandwidth with the following
  primary port configurations.
   - 9 × 10G
   - 16 × 2.5G + 4 × 10G
   - 48 × 1G + 4 × 10G

  VSC7552 SparX-5-128 supports up to 128 Gbps of bandwidth with the
  following primary port configurations.
   - 12 × 10G
   - 6 x 10G + 2 x 25G
   - 16 × 2.5G + 8 × 10G
   - 48 × 1G + 8 × 10G

  VSC7556 SparX-5-160 supports up to 160 Gbps of bandwidth with the
  following primary port configurations.
   - 16 × 10G
   - 10 × 10G + 2 × 25G
   - 16 × 2.5G + 10 × 10G
   - 48 × 1G + 10 × 10G

  VSC7558 SparX-5-200 supports up to 200 Gbps of bandwidth with the
  following primary port configurations.
   - 20 × 10G
   - 8 × 25G

In addition, the device supports one 10/100/1000/2500/5000 Mbps
SGMII/SerDes node processor interface (NPI) Ethernet port.

Time sensitive networking (TSN) is supported through a comprehensive set of
features including frame preemption, cut-through, frame replication and
elimination for reliability, enhanced scheduling: credit-based shaping,
time-aware shaping, cyclic queuing, and forwarding, and per-stream policing
and filtering.

Together with IEEE 1588 and IEEE 802.1AS support, this guarantees
low-latency deterministic networking for Industrial Ethernet.

The Sparx5i support is developed on the PCB134 and PCB135 evaluation boards.

- PCB134 main networking features:
  - 12x SFP+ front 10G module slots (connected to Sparx5i through SFI).
  - 8x SFP28 front 25G module slots (connected to Sparx5i through SFI high
    speed).
  - Optional, one additional 10/100/1000BASE-T (RJ45) Ethernet port
    (on-board VSC8211 PHY connected to Sparx5i through SGMII).

- PCB135 main networking features:
  - 48x1G (10/100/1000M) RJ45 front ports using 12xVSC8514 QuadPHY’s each
    connected to VSC7558 through QSGMII.
  - 4x10G (1G/2.5G/5G/10G) RJ45 front ports using the AQR407 10G QuadPHY
    each port connects to VSC7558 through SFI.
  - 4x SFP28 25G module slots on back connected to VSC7558 through SFI high
    speed.
  - Optional, one additional 1G (10/100/1000M) RJ45 port using an on-board
    VSC8211 PHY, which can be connected to VSC7558 NPI port through SGMII
    using a loopback add-on PCB)

This series provides support for:
  - SFPs and DAC cables via PHYLINK with a number of 5G, 10G and 25G
    devices and media types.
  - Port module configuration for 10M to 25G speeds with SGMII, QSGMII,
    1000BASEX, 2500BASEX and 10GBASER as appropriate for these modes.
  - SerDes configuration via the Sparx5i SerDes driver (see below).
  - Host mode providing register based injection and extraction.
  - Switch mode providing MAC/VLAN table learning and Layer2 switching
    offloaded to the Sparx5i switch.
  - STP state, VLAN support, host/bridge port mode, Forwarding DB, and
    configuration and statistics via ethtool.

More support will be added at a later stage.

The Sparx5i Chip Register Model can be browsed at this location:
https://github.com/microchip-ung/sparx-5_reginfo
and the datasheet is available here:
https://ww1.microchip.com/downloads/en/DeviceDoc/SparX-5_Family_L2L3_Enterprise_10G_Ethernet_Switches_Datasheet_00003822B.pdf

The series depends on the following series currently on their way
into the kernel:

- 25G Base-R phy mode
  Link: https://lore.kernel.org/r/20210611125453.313308-1-steen.hegelund@microchip.com/
- Sparx5 Reset Driver
  Link: https://lore.kernel.org/r/20210416084054.2922327-1-steen.hegelund@microchip.com/

ChangeLog:
v5:
    - cover letter
        - updated the description to match the latest data sheets
    - basic driver
        - added error message in case of reset controller error
        - port struct: replacing has_sfp with inband, adding pause_adv
    - host mode
        - port cleanup: unregisters netdevs and then removes phylink etc
        - checking for pause_adv when comparing port config changes
        - getting duplex and pause state in the link_up callback.
        - getting inband, autoneg and pause_adv config in the pcs_config
          callback.
    - port
        - use only the pause_adv bits when getting aneg status
        - use the inband state when updating the PCS and port config
v4:
    - basic driver:
        Using devm_reset_control_get_optional_shared to get the reset
        control, and let the reset framework check if it is valid.
    - host mode (phylink):
        Use the PCS operations to get state and update configuration.
        Removed the setting of interface modes.  Let phylink control this.
        Using the new 5gbase-r and 25gbase-r modes.
        Using a helper function to check if one of the 3 base-r modes has
        been selected.
        Currently it will not be possible to change the interface mode by
        changing the speed (e.g via ethtool).  This will be added later.
v3:
    - basic driver:
        - removed unneeded braces
        - release reference to ports node after use
        - use dev_err_probe to handle DEFER
        - update error value when bailing out (a few cases)
        - updated formatting of port struct and grouping of bool values
        - simplified the spx5_rmw and spx5_inst_rmw inline functions
    - host mode (netdev):
        - removed lockless flag
        - added port timer init
    - host mode (packet - manual injection):
        - updated error counters in error situations
        - implemented timer handling of watermark threshold: stop and
          restart netif queues.
        - fixed error message handling (rate limited)
        - fixed comment style error
        - used DIV_ROUND_UP macro
        - removed a debug message for open ports

v2:
    - Updated bindings:
        - drop minItems for the reg property
    - Statistics implementation:
        - Reorganized statistics into ethtool groups:
            eth-phy, eth-mac, eth-ctrl, rmon
          as defined by the IEEE 802.3 categories and RFC 2819.
        - The remaining statistics are provided by the classic ethtool
          statistics command.
    - Hostmode support:
        - Removed netdev renaming
        - Validate ethernet address in sparx5_set_mac_address()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
d0f482bb06 arm64: dts: sparx5: Add the Sparx5 switch node
This provides the configuration for the currently available evaluation
boards PCB134 and PCB135.

The series depends on the following series currently on its way
into the kernel:

- Sparx5 Reset Driver
  Link: https://lore.kernel.org/r/20210416084054.2922327-1-steen.hegelund@microchip.com/

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
af4b11022e net: sparx5: add ethtool configuration and statistics support
This adds statistic counters for the network interfaces provided
by the driver.  It also adds CPU port counters (which are not
exposed by ethtool).
This also adds support for configuring the network interface
parameters via ethtool: speed, duplex, aneg etc.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
0a9d48ad0d net: sparx5: add calendar bandwidth allocation support
This configures the Sparx5 calendars according to the bandwidth
requested in the Device Tree nodes.
It also checks if the total requested bandwidth is within the
specs of the detected Sparx5 models limits.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
d6fce51419 net: sparx5: add switching support
This adds SwitchDev support by hardware offloading the
software bridge.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
78eab33bb6 net: sparx5: add vlan support
This adds Sparx5 VLAN support.

Sparx5 has more VLAN features than provided here, but these will be added
in later series. For now we only add the basic L2 features.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
b37a1bae74 net: sparx5: add mactable support
This adds the Sparx5 MAC tables: listening for MAC table updates and
updating on request.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
946e7fd505 net: sparx5: add port module support
This add configuration of the Sparx5 port module instances.

Sparx5 has in total 65 logical ports (denoted D0 to D64) and 33
physical SerDes connections (S0 to S32).  The 65th port (D64) is fixed
allocated to SerDes0 (S0). The remaining 64 ports can in various
multiplexing scenarios be connected to the remaining 32 SerDes using
QSGMII, or USGMII or USXGMII extenders. 32 of the ports can have a 1:1
mapping to the 32 SerDes.

Some additional ports (D65 to D69) are internal to the device and do not
connect to port modules or SerDes macros. For example, internal ports are
used for frame injection and extraction to the CPU queues.

The 65 logical ports are split up into the following blocks.

- 13 x 5G ports (D0-D11, D64)
- 32 x 2G5 ports (D16-D47)
- 12 x 10G ports (D12-D15, D48-D55)
- 8 x 25G ports (D56-D63)

Each logical port supports different line speeds, and depending on the
speeds supported, different port modules (MAC+PCS) are needed. A port
supporting 5 Gbps, 10 Gbps, or 25 Gbps as maximum line speed, will have a
DEV5G, DEV10G, or DEV25G module to support the 5 Gbps, 10 Gbps (incl 5
Gbps), or 25 Gbps (including 10 Gbps and 5 Gbps) speeds. As well as, it
will have a shadow DEV2G5 port module to support the lower speeds
(10/100/1000/2500Mbps). When a port needs to operate at lower speed and the
shadow DEV2G5 needs to be connected to its corresponding SerDes

Not all interface modes are supported in this series, but will be added at
a later stage.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
Steen Hegelund
f3cad2611a net: sparx5: add hostmode with phylink support
This patch adds netdevs and phylink support for the ports in the switch.
It also adds register based injection and extraction for these ports.

Frame DMA support for injection and extraction will be added in a later
series.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
Steen Hegelund
3cfa11bac9 net: sparx5: add the basic sparx5 driver
This adds the Sparx5 basic SwitchDev driver framework with IO range
mapping, switch device detection and core clock configuration.

Support for ports, phylink, netdev, mactable etc. are in the following
patches.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
Steen Hegelund
f8c63088a9 dt-bindings: net: sparx5: Add sparx5-switch bindings
Document the Sparx5 switch device driver bindings

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
Marcin Wojtas
c88c192dc3 net: mdiobus: fix fwnode_mdbiobus_register() fallback case
The fallback case of fwnode_mdbiobus_register()
(relevant for !CONFIG_FWNODE_MDIO) was defined with wrong
argument name, causing a compilation error. Fix that.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:19:20 -07:00
Jakub Kicinski
6d123b81ac net: ip: avoid OOM kills with large UDP sends over loopback
Dave observed number of machines hitting OOM on the UDP send
path. The workload seems to be sending large UDP packets over
loopback. Since loopback has MTU of 64k kernel will try to
allocate an skb with up to 64k of head space. This has a good
chance of failing under memory pressure. What's worse if
the message length is <32k the allocation may trigger an
OOM killer.

This is entirely avoidable, we can use an skb with page frags.

af_unix solves a similar problem by limiting the head
length to SKB_MAX_ALLOC. This seems like a good and simple
approach. It means that UDP messages > 16kB will now
use fragments if underlying device supports SG, if extra
allocator pressure causes regressions in real workloads
we can switch to trying the large allocation first and
falling back.

v4: pre-calculate all the additions to alloclen so
    we can be sure it won't go over order-2

Reported-by: Dave Jones <dsj@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:17:21 -07:00
Lorenz Bauer
ae24bab257 tools/testing: add a selftest for SO_NETNS_COOKIE
Make sure that SO_NETNS_COOKIE returns a non-zero value, and
that sockets from different namespaces have a distinct cookie
value.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:13:05 -07:00
Martynas Pumputis
e8b9eab992 net: retrieve netns cookie via getsocketopt
It's getting more common to run nested container environments for
testing cloud software. One of such examples is Kind [1] which runs a
Kubernetes cluster in Docker containers on a single host. Each container
acts as a Kubernetes node, and thus can run any Pod (aka container)
inside the former. This approach simplifies testing a lot, as it
eliminates complicated VM setups.

Unfortunately, such a setup breaks some functionality when cgroupv2 BPF
programs are used for load-balancing. The load-balancer BPF program
needs to detect whether a request originates from the host netns or a
container netns in order to allow some access, e.g. to a service via a
loopback IP address. Typically, the programs detect this by comparing
netns cookies with the one of the init ns via a call to
bpf_get_netns_cookie(NULL). However, in nested environments the latter
cannot be used given the Kubernetes node's netns is outside the init ns.
To fix this, we need to pass the Kubernetes node netns cookie to the
program in a different way: by extending getsockopt() with a
SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes
node netns can retrieve the cookie and pass it to the program instead.

Thus, this is following up on Eric's commit 3d368ab87cf6 ("net:
initialize net->net_cookie at netns setup") to allow retrieval via
SO_NETNS_COOKIE.  This is also in line in how we retrieve socket cookie
via SO_COOKIE.

  [1] https://kind.sigs.k8s.io/

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:13:05 -07:00
Kalle Valo
c2a3823dad iwlwifi: acpi: remove unused function iwl_acpi_eval_dsm_func()
Stephen reported a warning:

drivers/net/wireless/intel/iwlwifi/fw/acpi.c:720:12: warning: 'iwl_acpi_eval_dsm_func' defined but not used [-Wunused-function]

The warning is correct and the function is not used anywhere, so let's
just remove it.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 7119f02b5d34 ("iwlwifi: mvm: support BIOS enable/disable for 11ax in Russia")
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210624052918.4946-1-kvalo@codeaurora.org
2021-06-24 19:21:57 +03:00
Po-Hao Huang
1d8820d546 rtw88: fix c2h memory leak
Fix erroneous code that leads to unreferenced objects. During H2C
operations, some functions returned without freeing the memory that only
the function have access to. Release these objects when they're no longer
needed to avoid potentially memory leaks.

Signed-off-by: Po-Hao Huang <phhuang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210624023459.10294-1-pkshih@realtek.com
2021-06-24 19:21:15 +03:00
Shawn Guo
1a3ac5c651 brcmfmac: support parse country code map from DT
With any regulatory domain requests coming from either user space or
802.11 IE (Information Element), the country is coded in ISO3166
standard.  It needs to be translated to firmware country code and
revision with the mapping info in settings->country_codes table.
Support populate country_codes table by parsing the mapping from DT.

The BRCMF_BUSTYPE_SDIO bus_type check gets separated from general DT
validation, so that country code can be handled as general part rather
than SDIO bus specific one.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210417075428.2671-1-shawn.guo@linaro.org
2021-06-24 19:20:31 +03:00
David S. Miller
35713d9b8f Merge branch 'devlink-rate-limit-fixes'
Dmytro Linkin says:

====================
Fixes for devlink rate objects API

Patch #1 fixes not decreased refcount of parent node for destroyed leaf
object.

Patch #2 fixes incorect eswitch mode check.

Patch #3 protects list traversing with a lock.

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 15:46:25 -07:00
Dmytro Linkin
a3e5e5797f devlink: Protect rate list with lock while switching modes
Devlink eswitch set command doesn't hold devlink->lock, which makes
possible race condition between rate list traversing and others devlink
rate KAPI calls, like devlink_rate_nodes_destroy().
Hold devlink lock while traversing the list.

Fixes: a8ecb93ef03d ("devlink: Introduce rate nodes")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 15:46:25 -07:00
Dmytro Linkin
ff99324ded devlink: Remove eswitch mode check for mode set call
When eswitch is disabled, querying its current mode results in error.
Due to this when trying to set the eswitch mode for mlx5 devices, it
fails to set the eswitch switchdev mode.
Hence remove such check.

Fixes: a8ecb93ef03d ("devlink: Introduce rate nodes")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 15:46:25 -07:00
Dmytro Linkin
1321ed5e76 devlink: Decrease refcnt of parent rate object on leaf destroy
Port functions, like SFs, can be deleted by the user when its leaf rate
object has parent node. In such case node refcnt won't be decreased
which blocks the node from deletion later.
Do simple refcnt decrease, since driver in cleanup stage. This:
1) assumes that driver took proper internal parent unset action;
2) allows to avoid nested callbacks call and deadlock.

Fixes: d75559845078 ("devlink: Allow setting parent node of rate objects")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 15:46:25 -07:00
Xianting Tian
a2f7dc00ea virtio_net: Use virtio_find_vqs_ctx() helper
virtio_find_vqs_ctx() is defined but never be called currently,
it is the right place to use it.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 13:52:22 -07:00
Kuniyuki Iwashima
10ed7ce42b net/tls: Remove the __TLS_DEC_STATS() macro.
The commit d26b698dd3cd ("net/tls: add skeleton of MIB statistics")
introduced __TLS_DEC_STATS(), but it is not used and __SNMP_DEC_STATS() is
not defined also. Let's remove it.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 13:47:55 -07:00
Kuniyuki Iwashima
55d444b310 tcp: Add stats for socket migration.
This commit adds two stats for the socket migration feature to evaluate the
effectiveness: LINUX_MIB_TCPMIGRATEREQ(SUCCESS|FAILURE).

If the migration fails because of the own_req race in receiving ACK and
sending SYN+ACK paths, we do not increment the failure stat. Then another
CPU is responsible for the req.

Link: https://lore.kernel.org/bpf/CAK6E8=cgFKuGecTzSCSQ8z3YJ_163C0uwO9yRvfDSE7vOe9mJA@mail.gmail.com/
Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:56:08 -07:00
David Wilder
7525de2516 ibmveth: Set CHECKSUM_PARTIAL if NULL TCP CSUM.
TCP checksums on received packets may be set to NULL by the sender if CSO
is enabled. The hypervisor flags these packets as check-sum-ok and the
skb is then flagged CHECKSUM_UNNECESSARY. If these packets are then
forwarded the sender will not request CSO due to the CHECKSUM_UNNECESSARY
flag. The result is a TCP packet sent with a bad checksum. This change
sets up CHECKSUM_PARTIAL on these packets causing the sender to correctly
request CSUM offload.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Tested-by: Cristobal Forno <cforno12@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:54:11 -07:00
David S. Miller
fe87797bf2 mlx5-net-next-2021-06-22
1) Various minor cleanups and fixes from net-next branch
 2) Optimize mlx5 feature check on tx and
    a fix to allow Vxlan with Ipsec offloads
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmDSYyYACgkQSD+KveBX
 +j45Bgf+M7Vg3OK4fA7ZURrbxu2EiCQB4XprO/it7YSzpiZx634DNNRzWXQ2mLJD
 jx5TcmAFVUKkGmx/qrPYe/Y9c9l5s6JMjwACL5aEawXtvPzI/q1KBx/n5L+CoFYw
 lO1IpBPpkwqIbeIl9cwata7IJ6aTeOGjqfQ//Fodfwga063Aaggig6sEcPkr0Ewe
 wcuJmblnw/qOkSI2BlSOyixYiVjPRDF7cAVRTBK4/DCDFiGJTiaj8w0JgfdS2zVs
 3xMrYajdz7qArMcqbuQe59KojeYj4hxALUSs+s9ks8qeIrbM+hZ9sH5m0dJgM4P6
 7Pg87LD6PFDV31DWF0XM3KgdqfiZxw==
 =aszX
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-net-next-2021-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-net-next-2021-06-22

1) Various minor cleanups and fixes from net-next branch
2) Optimize mlx5 feature check on tx and
   a fix to allow Vxlan with Ipsec offloads
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:48:07 -07:00
David S. Miller
a7b62112f0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Skip non-SCTP packets in the new SCTP chunk support for nft_exthdr,
   from Phil Sutter.

2) Simplify TCP option sanity check for TCP packets, also from Phil.

3) Add a new expression to store when the rule has been used last time.

4) Pass the hook state object to log function, from Florian Westphal.

5) Document the new sysctl knobs to tune the flowtable timeouts,
   from Oz Shlomo.

6) Fix snprintf error check in the new nfnetlink_hook infrastructure,
   from Dan Carpenter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:31:28 -07:00
Andrea Righi
0a36a75c68 selftests: icmp_redirect: support expected failures
According to a comment in commit 99513cfa16c6 ("selftest: Fixes for
icmp_redirect test") the test "IPv6: mtu exception plus redirect" is
expected to fail, because of a bug in the IPv6 logic that hasn't been
fixed yet apparently.

We should probably consider this failure as an "expected failure",
therefore change the script to return XFAIL for that particular test and
also report the total amount of expected failures at the end of the run.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:22:30 -07:00
David S. Miller
e940eb3c1b Merge branch 'lockless-qdisc-opts'
Yunsheng Lin says:

====================
Some optimization for lockless qdisc

Patch 1: remove unnecessary seqcount operation.
Patch 2: implement TCQ_F_CAN_BYPASS.
Patch 3: remove qdisc->empty.

Performance data for pktgen in queue_xmit mode + dummy netdev
with pfifo_fast:

 threads    unpatched           patched             delta
    1       2.60Mpps            3.21Mpps             +23%
    2       3.84Mpps            5.56Mpps             +44%
    4       5.52Mpps            5.58Mpps             +1%
    8       2.77Mpps            2.76Mpps             -0.3%
   16       2.24Mpps            2.23Mpps             -0.4%

Performance for IP forward testing: 1.05Mpps increases to
1.16Mpps, about 10% improvement.

V3: Add 'Acked-by' from Jakub and 'Tested-by' from Vladimir,
    and resend based on latest net-next.
V2: Adjust the comment and commit log according to discussion
    in V1.
V1: Drop RFC tag, add nolock_qdisc_is_empty() and do the qdisc
    empty checking without the protection of qdisc->seqlock to
    aviod doing unnecessary spin_trylock() for contention case.
RFC v4: Use STATE_MISSED and STATE_DRAINING to indicate non-empty
        qdisc, and add patch 1 and 3.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:17:35 -07:00
Yunsheng Lin
d3e0f57501 net: sched: remove qdisc->empty for lockless qdisc
As MISSED and DRAINING state are used to indicate a non-empty
qdisc, qdisc->empty is not longer needed, so remove it.

Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:17:35 -07:00
Yunsheng Lin
c4fef01ba4 net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc
Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
flag set, but queue discipline by-pass does not work for lockless
qdisc because skb is always enqueued to qdisc even when the qdisc
is empty, see __dev_xmit_skb().

This patch calls sch_direct_xmit() to transmit the skb directly
to the driver for empty lockless qdisc, which aviod enqueuing
and dequeuing operation.

As qdisc->empty is not reliable to indicate a empty qdisc because
there is a time window between enqueuing and setting qdisc->empty.
So we use the MISSED state added in commit a90c57f2cedd ("net:
sched: fix packet stuck problem for lockless qdisc"), which
indicate there is lock contention, suggesting that it is better
not to do the qdisc bypass in order to avoid packet out of order
problem.

In order to make MISSED state reliable to indicate a empty qdisc,
we need to ensure that testing and clearing of MISSED state is
within the protection of qdisc->seqlock, only setting MISSED state
can be done without the protection of qdisc->seqlock. A MISSED
state testing is added without the protection of qdisc->seqlock to
aviod doing unnecessary spin_trylock() for contention case.

As the enqueuing is not within the protection of qdisc->seqlock,
there is still a potential data race as mentioned by Jakub [1]:

      thread1               thread2             thread3
qdisc_run_begin() # true
                        qdisc_run_begin(q)
                             set(MISSED)
pfifo_fast_dequeue
  clear(MISSED)
  # recheck the queue
qdisc_run_end()
                            enqueue skb1
                                             qdisc empty # true
                                          qdisc_run_begin() # true
                                          sch_direct_xmit() # skb2
                         qdisc_run_begin()
                            set(MISSED)

When above happens, skb1 enqueued by thread2 is transmited after
skb2 is transmited by thread3 because MISSED state setting and
enqueuing is not under the qdisc->seqlock. If qdisc bypass is
disabled, skb1 has better chance to be transmited quicker than
skb2.

This patch does not take care of the above data race, because we
view this as similar as below:
Even at the same time CPU1 and CPU2 write the skb to two socket
which both heading to the same qdisc, there is no guarantee that
which skb will hit the qdisc first, because there is a lot of
factor like interrupt/softirq/cache miss/scheduling afffecting
that.

There are below cases that need special handling:
1. When MISSED state is cleared before another round of dequeuing
   in pfifo_fast_dequeue(), and __qdisc_run() might not be able to
   dequeue all skb in one round and call __netif_schedule(), which
   might result in a non-empty qdisc without MISSED set. In order
   to avoid this, the MISSED state is set for lockless qdisc and
   __netif_schedule() will be called at the end of qdisc_run_end.

2. The MISSED state also need to be set for lockless qdisc instead
   of calling __netif_schedule() directly when requeuing a skb for
   a similar reason.

3. For netdev queue stopped case, the MISSED case need clearing
   while the netdev queue is stopped, otherwise there may be
   unnecessary __netif_schedule() calling. So a new DRAINING state
   is added to indicate this case, which also indicate a non-empty
   qdisc.

4. As there is already netif_xmit_frozen_or_stopped() checking in
   dequeue_skb() and sch_direct_xmit(), which are both within the
   protection of qdisc->seqlock, but the same checking in
   __dev_xmit_skb() is without the protection, which might cause
   empty indication of a lockless qdisc to be not reliable. So
   remove the checking in __dev_xmit_skb(), and the checking in
   the protection of qdisc->seqlock seems enough to avoid the cpu
   consumption problem for netdev queue stopped case.

1. https://lkml.org/lkml/2021/5/29/215

Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:17:35 -07:00
Yunsheng Lin
dd25296afa net: sched: avoid unnecessary seqcount operation for lockless qdisc
qdisc->running seqcount operation is mainly used to do heuristic
locking on q->busylock for locked qdisc, see qdisc_is_running()
and __dev_xmit_skb().

So avoid doing seqcount operation for qdisc with TCQ_F_NOLOCK
flag.

Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:17:35 -07:00
Kalle Valo
559c664751 iwlwifi patches for v5.14
* Some robustness improvements in the PCI code;
 * Remove some duplicate and unused declarations;
 * Improve PNVM load robustness by increasing the timeout a bit;
 * Support for a new HW;
 * Suport for BIOS control of 11ax enablement in Russia;
 * Support UNII4 enablement from BIOS;
 * Support LMR feedback;
 * Fix in TWT;
 * Some fixes in IML (image loader) DMA handling;
 * Fixes in WoWLAN;
 * Updates in the WoWLAN FW commands;
 * Add one new device to the PCI ID lists;
 * Support reading PNVM from a UEFI variable;
 * Bump the supported FW API version;
 * Some other small fixes, clean-ups and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEF3LNfgb2BPWm68smoUecoho8xfoFAmDR8WcACgkQoUecoho8
 xfpF2BAAm4rm0cWL68nKh8J6AsZbPZeRrDyHVAPV8ThOqjXQ/gkx/tDf2Tnbcwes
 AsDEY9eIxti7xbWTzFVd7dj8/h05dcrdvrYx/HMXIKfGBMf6162mwur+uXx3TJ1v
 7F3R+CpmdnrkhcWfLgNRoYP4hCBcw4N3rtSJuJODChTfe6zr1BnnnIAws+GcoPzG
 wjg6/xUCHoB08Eo6ErzdGJR2Xe/kxyYKE87BOgvuFBV4rOVviRkTdO4Cst/0C+c1
 NmhNeqVjMGxmg842jRD/QTDu6z1ACb03NKR76wKnXRGP1I5CuyG7auU5apFM1B61
 GHlM5JnwNxvoCeOQ5OrPF/G/dIXugNFnInP7lQg2mbi2bs2BFJiDdyqCt53Y7lij
 pNXUlF1r6fwQxLaf26bX4+w6cnr8fjMp5iCrQDNu+2NBrv7IdntiUIB0BfPfHLqo
 mqKEagFju9UxhPBNfFrUIfv7ZMty00hNlJlikLUSRH2fqpHsKHNZzsE8fHZbGhys
 Gmcn3k9HTWqlkV+lNOSJKyn08t9fotc7GcBNlBkC8wH94Iv5ram4CNzWucc+nTo6
 MV8aLylyhEOUzJ/+kd3msXfyW6yia+Itr4Rbaw0qawZ6taIu+IXo0zEeAYILNAS5
 gkuF3eXakzChAlUIo/GNYNDcpB+dd+h38P9gkzN2MoKuWfUPitE=
 =HjQs
 -----END PGP SIGNATURE-----

Merge tag 'iwlwifi-next-for-kalle-2021-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next

iwlwifi patches for v5.14

* Some robustness improvements in the PCI code;
* Remove some duplicate and unused declarations;
* Improve PNVM load robustness by increasing the timeout a bit;
* Support for a new HW;
* Suport for BIOS control of 11ax enablement in Russia;
* Support UNII4 enablement from BIOS;
* Support LMR feedback;
* Fix in TWT;
* Some fixes in IML (image loader) DMA handling;
* Fixes in WoWLAN;
* Updates in the WoWLAN FW commands;
* Add one new device to the PCI ID lists;
* Support reading PNVM from a UEFI variable;
* Bump the supported FW API version;
* Some other small fixes, clean-ups and improvements.

# gpg: Signature made Tue 22 Jun 2021 05:19:19 PM EEST
# gpg:                using RSA key 1772CD7E06F604F5A6EBCB26A1479CA21A3CC5FA
# gpg: Good signature from "Luciano Roth Coelho (Luca) <luca@coelho.fi>" [full]
# gpg:                 aka "Luciano Roth Coelho (Intel) <luciano.coelho@intel.com>" [full]
2021-06-23 20:48:56 +03:00