1309117 Commits

Author SHA1 Message Date
Paolo Abeni
b3ea416419 testing: net-drv: add basic shaper test
Leverage a basic/dummy netdevsim implementation to do functional
coverage for NL interface.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/43092afbf38365c796088bf8fc155e523ab434ae.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:23 -07:00
Paolo Abeni
ecd82cfee3 net-shapers: implement cap validation in the core
Use the device capabilities to reject invalid attribute values before
pushing them to the H/W.

Note that validating the metric explicitly avoids NL_SET_BAD_ATTR()
usage, to provide unambiguous error messages to the user.

Validating the nesting requires the knowledge of the new parent for
the given shaper; as such is a chicken-egg problem: to validate the
leaf nesting we need to know the node scope, to validate the node
nesting we need to know the leafs parent scope.

To break the circular dependency, place the leafs nesting validation
after the parsing.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/54667601813e4c0348f39bf8ad2446ffc9fcd383.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:23 -07:00
Paolo Abeni
553ea9f1ef net: shaper: implement introspection support
The netlink op is a simple wrapper around the device callback.

Extend the existing fetch_dev() helper adding an attribute argument
for the requested device. Reuse such helper in the newly implemented
operation.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/66eb62f22b3a5ba06ca91d01ae77515e5f447e15.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:22 -07:00
Paolo Abeni
14bba9285a netlink: spec: add shaper introspection support
Allow the user-space to fine-grain query the shaping features
supported by the NIC on each domain.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/3ddd10e450e3fe7d4b944c0d0b886d4483529ee6.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:22 -07:00
Paolo Abeni
ff7d4deb1f net-shapers: implement shaper cleanup on queue deletion
hook into netif_set_real_num_tx_queues() to cleanup any shaper
configured on top of the to-be-destroyed TX queues.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/6da4ee03cae2b2a757d7b59e88baf09cc94c5ef1.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:22 -07:00
Paolo Abeni
bf230c497d net-shapers: implement delete support for NODE scope shaper
Leverage the previously introduced group operation to implement
the removal of NODE scope shaper, re-linking its leaves under the
the parent node before actually deleting the specified NODE scope
shaper.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/763d484b5b69e365acccfd8031b183c647a367a4.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:22 -07:00
Paolo Abeni
5d5d4700e7 net-shapers: implement NL group operation
Allow grouping multiple leaves shaper under the given root.
The node and the leaves shapers are created, if needed, otherwise
the existing shapers are re-linked as requested.

Try hard to pre-allocated the needed resources, to avoid non
trivial H/W configuration rollbacks in case of any failure.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/8a721274fde18b872d1e3a61aaa916bb7b7996d3.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:22 -07:00
Paolo Abeni
93954b40f6 net-shapers: implement NL set and delete operations
Both NL operations directly map on the homonymous device shaper
callbacks, update accordingly the shapers cache and are serialized
via a per device lock.
Implement the cache modification helpers to additionally deal with
NODE scope shaper. That will be needed by the group() operation
implemented in the next patch.
The delete implementation is partial: does not handle NODE scope
shaper yet. Such support will require infrastructure from
the next patch and will be implemented later in the series.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/1e6a34a4095b35d773d2b9c476164671bbcf8397.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:22 -07:00
Paolo Abeni
4b623f9f0f net-shapers: implement NL get operation
Introduce the basic infrastructure to implement the net-shaper
core functionality. Each network devices carries a net-shaper cache,
the NL get() operation fetches the data from such cache.

The cache is initially empty, will be fill by the set()/group()
operation implemented later and is destroyed at device cleanup time.

The net_shaper_fill_handle(), net_shaper_ctx_init(), and
net_shaper_generic_pre() implementations handle generic index type
attributes, despite the current caller always pass a constant value
to avoid more noise in later patches using them with different
attributes.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/ddd10fd645a9367803ad02fca4a5664ea5ace170.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:22 -07:00
Paolo Abeni
04e65df94b netlink: spec: add shaper YAML spec
Define the user-space visible interface to query, configure and delete
network shapers via yaml definition.

Add dummy implementations for the relevant NL callbacks.

set() and delete() operations touch a single shaper creating/updating or
deleting it.
The group() operation creates a shaper's group, nesting multiple input
shapers under the specified output shaper.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/7a33a1ff370bdbcd0cd3f909575c912cd56f41da.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:21 -07:00
Paolo Abeni
13d68a1643 genetlink: extend info user-storage to match NL cb ctx
This allows a more uniform implementation of non-dump and dump
operations, and will be used later in the series to avoid some
per-operation allocation.

Additionally rename the NL_ASSERT_DUMP_CTX_FITS macro, to
fit a more extended usage.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/1130cc2896626b84587a2a5f96a5c6829638f4da.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10 08:30:21 -07:00
Alexander Zubkov
80c549cd1a Fix misspelling of "accept*" in net
Several files have "accept*" misspelled as "accpet*" in the comments.
Fix all such occurrences.

Signed-off-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241008162756.22618-2-green@qrator.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 19:55:40 -07:00
Eric Dumazet
e4650d7ae4 net_sched: sch_sfq: handle bigger packets
SFQ has an assumption on dealing with packets smaller than 64KB.

Even before BIG TCP, TCA_STAB can provide arbitrary big values
in qdisc_pkt_len(skb)

It is time to switch (struct sfq_slot)->allot to a 32bit field.

sizeof(struct sfq_slot) is now 64 bytes, giving better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241008111603.653140-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 19:50:31 -07:00
Minda Chen
0a316b16a6 net: stmmac: Add DW QoS Eth v4/v5 ip payload error statistics
Add DW QoS Eth v4/v5 ip payload error statistics, and rename descriptor
bit macro because v4/v5 descriptor IPCE bit claims ip checksum
error or TCP/UDP/ICMP segment length error.

Here is bit description from DW QoS Eth data book(Part 19.6.2.2)

bit7 IPCE: IP Payload Error
When this bit is programmed, it indicates either of the following:
1).The 16-bit IP payload checksum (that is, the TCP, UDP, or ICMP
   checksum) calculated by the MAC does not match the corresponding
   checksum field in the received segment.
2).The TCP, UDP, or ICMP segment length does not match the payload
   length value in the IP  Header field.
3).The TCP, UDP, or ICMP segment length is less than minimum allowed
   segment length for TCP, UDP, or ICMP.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://patch.msgid.link/20241008111443.81467-1-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 19:48:58 -07:00
Tobias Klauser
3a1beabe11 ipv6: Remove redundant unlikely()
IS_ERR_OR_NULL() already implies unlikely().

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241008085454.8087-1-tklauser@distanz.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 19:40:46 -07:00
Eric Dumazet
4daf4dc275 ipv6: switch inet6_acaddr_hash() to less predictable hash
commit 2384d02520ff ("net/ipv6: Add anycast addresses to a global hashtable")
added inet6_acaddr_hash(), using ipv6_addr_hash() and net_hash_mix()
to get hash spreading for typical users.

However ipv6_addr_hash() is highly predictable and a malicious user
could abuse a specific hash bucket.

Switch to __ipv6_addr_jhash(). We could use a dedicated
secret, or reuse net_hash_mix() as I did in this patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241008121307.800040-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 19:33:57 -07:00
Eric Dumazet
4a0ec2aa07 ipv6: switch inet6_addr_hash() to less predictable hash
In commit 3f27fb23219e ("ipv6: addrconf: add per netns perturbation
in inet6_addr_hash()"), I added net_hash_mix() in inet6_addr_hash()
to get better hash dispersion, at a time all netns were sharing the
hash table.

Since then, commit 21a216a8fc63 ("ipv6/addrconf: allocate a per
netns hash table") made the hash table per netns.

We could remove the net_hash_mix() from inet6_addr_hash(), but
there is still an issue with ipv6_addr_hash().

It is highly predictable and a malicious user can easily create
thousands of IPv6 addresses all stored in the same hash bucket.

Switch to __ipv6_addr_jhash(). We could use a dedicated
secret, or reuse net_hash_mix() as I did in this patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241008120101.734521-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 19:33:46 -07:00
Lorenzo Bianconi
2518b11963 net: airoha: Fix EGRESS_RATE_METER_EN_MASK definition
Fix typo in EGRESS_RATE_METER_EN_MASK mask definition. This bus in not
introducing any user visible problem since, even if we are setting
EGRESS_RATE_METER_EN_MASK bit in REG_EGRESS_RATE_METER_CFG register,
egress QoS metering is not supported yet since we are missing some other
hw configurations (e.g token bucket rate, token bucket size).

Introduced by commit 23020f049327 ("net: airoha: Introduce ethernet support
for EN7581 SoC")

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-airoha-fixes-v2-1-18af63ec19bf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 19:29:11 -07:00
Dr. David Alan Gilbert
3325964e99 net: liquidio: Remove unused cn23xx_dump_pf_initialized_regs
cn23xx_dump_pf_initialized_regs() was added in 2016's commit
72c0091293c0 ("liquidio: CN23XX device init and sriov config")

but hasn't been used.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009003841.254853-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 19:25:34 -07:00
Jakub Kicinski
652c5017e2 Merge branch 'qca_spi-improvements-to-qca7000-sync'
Stefan Wahren says:

====================
qca_spi: Improvements to QCA7000 sync

This series contains patches which improve the QCA7000 sync behavior.
====================

Link: https://patch.msgid.link/20241007113312.38728-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 18:00:29 -07:00
Stefan Wahren
c81cdba640 qca_spi: Improve reset mechanism
The commit 92717c2356cb ("net: qca_spi: Avoid high load if QCA7000 is not
available") fixed the high load in case the QCA7000 is not available
but introduced sync delays for some corner cases like buffer errors.

So add the reset requests to the atomics flags, which are polled by
the SPI thread. As a result reset requests and sync state are now
separated. This has the nice benefit to make the code easier to
understand.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://patch.msgid.link/20241007113312.38728-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 18:00:27 -07:00
Stefan Wahren
234b526896 qca_spi: Count unexpected WRBUF_SPC_AVA after reset
After a reset of the QCA7000, the amount of available write buffer
space should match QCASPI_HW_BUF_LEN. If this is not the case
this error should be counted as such.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://patch.msgid.link/20241007113312.38728-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 18:00:26 -07:00
xin.guo
d35bd24cea tcp: remove unnecessary update for tp->write_seq in tcp_connect()
Commit 783237e8daf13 ("net-tcp: Fast Open client - sending SYN-data")
introduces tcp_connect_queue_skb() and it would overwrite tcp->write_seq,
so it is no need to update tp->write_seq before invoking
tcp_connect_queue_skb().

Signed-off-by: xin.guo <guoxin0309@gmail.com>
Link: https://patch.msgid.link/1728289544-4611-1-git-send-email-guoxin0309@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:58:49 -07:00
Donald Hunter
54b771e6c6 doc: net: Fix .rst rendering of net_cachelines pages
The doc pages under /networking/net_cachelines are unreadable because
they lack .rst formatting for the tabular text.

Add simple table markup and tidy up the table contents:

- remove dashes that represent empty cells because they render
  as bullets and are not needed
- replace 'struct_*' with 'struct *' in the first column so that
  sphinx can render links for any structs that appear in the docs

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241008165329.45647-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:34:49 -07:00
Jakub Kicinski
c786a2a8bc Merge branch 'ipv4-convert-__fib_validate_source-and-its-callers-to-dscp_t'
Guillaume Nault says:

====================
ipv4: Convert __fib_validate_source() and its callers to dscp_t.

This patch series continues to prepare users of ->flowi4_tos to a
future conversion of this field (__u8 to dscp_t). This time, we convert
__fib_validate_source() and its call chain.

The objective is to eventually make all users of ->flowi4_tos use a
dscp_t value. Making ->flowi4_tos a dscp_t field will help avoiding
regressions where ECN bits are erroneously interpreted as DSCP bits.
====================

Link: https://patch.msgid.link/cover.1728302212.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:31:40 -07:00
Guillaume Nault
3768b40273 ipv4: Convert __fib_validate_source() to dscp_t.
Pass a dscp_t variable to __fib_validate_source(), instead of a plain
u8, to prevent accidental setting of ECN bits in ->flowi4_tos.

Only fib_validate_source() actually calls __fib_validate_source().
Since it already has a dscp_t variable to pass as parameter, we only
need to remove the inet_dscp_to_dsfield() conversion.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/8206b0a64a21a208ed94774e261a251c8d7bc251.1728302212.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:31:40 -07:00
Guillaume Nault
d36236ab52 ipv4: Convert fib_validate_source() to dscp_t.
Pass a dscp_t variable to fib_validate_source(), instead of a plain u8,
to prevent accidental setting of ECN bits in ->flowi4_tos.

All callers of fib_validate_source() already have a dscp_t variable to
pass as parameter. We just need to remove the inet_dscp_to_dsfield()
conversions.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/08612a4519bc5a3578bb493fbaad82437ebb73dc.1728302212.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:31:40 -07:00
Guillaume Nault
d329764087 ipv4: Convert ip_mc_validate_source() to dscp_t.
Pass a dscp_t variable to ip_mc_validate_source(), instead of a plain
u8, to prevent accidental setting of ECN bits in ->flowi4_tos.

Callers of ip_mc_validate_source() to consider are:

  * ip_route_input_mc() which already has a dscp_t variable to pass as
    parameter. We just need to remove the inet_dscp_to_dsfield()
    conversion.

  * udp_v4_early_demux() which gets the DSCP directly from the IPv4
    header and can simply use the ip4h_dscp() helper.

Also, stop including net/inet_dscp.h in udp.c as we don't use any of
its declarations anymore.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/c91b2cca04718b7ee6cf5b9c1d5b40507d65a8d4.1728302212.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:31:40 -07:00
Guillaume Nault
1a7c292617 ipv4: Convert ip_route_input_mc() to dscp_t.
Pass a dscp_t variable to ip_route_input_mc(), instead of a plain u8,
to prevent accidental setting of ECN bits in ->flowi4_tos.

Only ip_route_input_rcu() actually calls ip_route_input_mc(). Since it
already has a dscp_t variable to pass as parameter, we only need to
remove the inet_dscp_to_dsfield() conversion.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/0cc653ef59bbc0a28881f706d34896c61eba9e01.1728302212.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:31:40 -07:00
Guillaume Nault
0936c67191 ipv4: Convert __mkroute_input() to dscp_t.
Pass a dscp_t variable to __mkroute_input(), instead of a plain u8, to
prevent accidental setting of ECN bits in ->flowi4_tos.

Only ip_mkroute_input() actually calls __mkroute_input(). Since it
already has a dscp_t variable to pass as parameter, we only need to
remove the inet_dscp_to_dsfield() conversion.

While there, reorganise the function parameters to fill up horizontal
space.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/40853c720aee4d608e6b1b204982164c3b76697d.1728302212.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:31:40 -07:00
Guillaume Nault
34f28ffd62 ipv4: Convert ip_mkroute_input() to dscp_t.
Pass a dscp_t variable to ip_mkroute_input(), instead of a plain u8, to
prevent accidental setting of ECN bits in ->flowi4_tos.

Only ip_route_input_slow() actually calls ip_mkroute_input(). Since it
already has a dscp_t variable to pass as parameter, we only need to
remove the inet_dscp_to_dsfield() conversion.

While there, reorganise the function parameters to fill up horizontal
space.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/6aa71e28f9ff681cbd70847080e1ab6b526f94f1.1728302212.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:31:40 -07:00
Guillaume Nault
2b78d30620 ipv4: Convert ip_route_use_hint() to dscp_t.
Pass a dscp_t variable to ip_route_use_hint(), instead of a plain u8,
to prevent accidental setting of ECN bits in ->flowi4_tos.

Only ip_rcv_finish_core() actually calls ip_route_use_hint(). Use the
ip4h_dscp() helper to get the DSCP from the IPv4 header.

While there, modify the declaration of ip_route_use_hint() in
include/net/route.h so that it matches the prototype of its
implementation in net/ipv4/route.c.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/c40994fdf804db7a363d04fdee01bf48dddda676.1728302212.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 17:31:40 -07:00
Shradha Gupta
6607c17c6c net: mana: Enable debugfs files for MANA device
Implement debugfs in MANA driver to be able to view RX,TX,EQ queue
specific attributes and dump their gdma queues.
These dumps can be used by other userspace utilities to improve
debuggability and troubleshooting

Following files are added in debugfs:

/sys/kernel/debug/mana/
|-------------- 1
    |--------------- EQs
    |                 |------- eq0
    |                 |          |---head
    |                 |          |---tail
    |                 |          |---eq_dump
    |                 |------- eq1
    |                 .
    |                 .
    |
    |--------------- adapter-MTU
    |--------------- vport0
                      |------- RX-0
                      |          |---cq_budget
                      |          |---cq_dump
                      |          |---cq_head
                      |          |---cq_tail
                      |          |---rq_head
                      |          |---rq_nbuf
                      |          |---rq_tail
                      |          |---rxq_dump
                      |------- RX-1
                      .
                      .
                      |------- TX-0
                      |          |---cq_budget
                      |          |---cq_dump
                      |          |---cq_head
                      |          |---cq_tail
                      |          |---sq_head
                      |          |---sq_pend_skb_qlen
                      |          |---sq_tail
                      |          |---txq_dump
                      |------- TX-1
                      .
                      .

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 13:42:04 +01:00
Heiner Kallweit
1ffcc8d413 r8169: add support for the temperature sensor being available from RTL8125B
This adds support for the temperature sensor being available from
RTL8125B. Register information was taken from r8125 vendor driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 13:38:19 +01:00
David S. Miller
2a80d89256 Merge branch 'net-improve-multicast-group-join-performance'
Jonas Rebmann says:

====================
improve multicast join group performance

This series seeks to improve performance on updating igmp group
memberships such as with IP_ADD_MEMBERSHIP or MCAST_JOIN_SOURCE_GROUP.

Our use case was to add 2000 multicast memberships on a TQMLS1046A which
took about 3.6 seconds for the membership additions alone. Our userspace
reproducer tool was instrumented to log runtimes of the individual
setsockopt invocations which clearly indicated quadratic complexity of
setting up the membership with regard to the total number of multicast
groups to be joined. We used perf to locate the hotspots and
subsequently optimized the most costly sections of code.

This series includes a patch to Linux igmp handling as well as a patch
to the DPAA/Freescale driver. With both patches applied, our memberships can
be set up in only about 87 miliseconds, which corresponds to a speedup
of around 40.

While we have acheived practically linear run-time complexity on the
kernel side, a small quadratic factor remains in parts of the freescale
driver code which we haven't yet optimized. We have by now payed little
attention to the optimization potential in dropping group memberships,
yet the dpaa patch applies to joining and leaving groups alike.

Overall, this patch series brings great improvements in use cases
involving large numbers of multicast groups, particularly when using the
fsl_dpa driver, without noteworthy drawbacks in other scenarios.
====================

Signed-off-by: Jonas Rebmann <jre@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:50:11 +01:00
Jonas Rebmann
298f70b371 net: dpaa: use __dev_mc_sync in dpaa_set_rx_mode()
The original driver first unregisters then re-registers all multicast
addresses in the struct net_device_ops::ndo_set_rx_mode() callback.

As the networking stack calls ndo_set_rx_mode() if a single multicast
address change occurs, a significant amount of time may be used to first
unregister and then re-register unchanged multicast addresses. This
leads to performance issues when tracking large numbers of multicast
addresses.

Replace the unregister and register loop and the hand crafted
mc_addr_list list handling with __dev_mc_sync(), to only update entries
which have changed.

On profiling with an fsl_dpa NIC, this patch presented a speedup of
around 40 when successively setting up 2000 multicast groups using
setsockopt(), without drawbacks on smaller numbers of multicast groups.

Signed-off-by: Jonas Rebmann <jre@pengutronix.de>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:50:11 +01:00
Jonas Rebmann
69a3272d78 net: ipv4: igmp: optimize ____ip_mc_inc_group() using mc_hash
The runtime cost of joining a single multicast group in the current
implementation of ____ip_mc_inc_group grows linearly with the number of
existing memberships. This is caused by the linear search for an
existing group record in the multicast address list.

This linear complexity results in quadratic complexity when successively
adding memberships, which becomes a performance bottleneck when setting
up large numbers of multicast memberships.

If available, use the existing multicast hash map mc_hash to quickly
search for an existing group membership record. This leads to
near-constant complexity on the addition of a new multicast record,
significantly improving performance for workloads involving many
multicast memberships.

On profiling with a loopback device, this patch presented a speedup of
around 6 when successively setting up 2000 multicast groups using
setsockopt without measurable drawbacks on smaller numbers of
multicast groups.

Signed-off-by: Jonas Rebmann <jre@pengutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:50:11 +01:00
David Woodhouse
2050327242 ptp: Add support for the AMZNC10C 'vmclock' device
The vmclock device addresses the problem of live migration with
precision clocks. The tolerances of a hardware counter (e.g. TSC) are
typically around ±50PPM. A guest will use NTP/PTP/PPS to discipline that
counter against an external source of 'real' time, and track the precise
frequency of the counter as it changes with environmental conditions.

When a guest is live migrated, anything it knows about the frequency of
the underlying counter becomes invalid. It may move from a host where
the counter running at -50PPM of its nominal frequency, to a host where
it runs at +50PPM. There will also be a step change in the value of the
counter, as the correctness of its absolute value at migration is
limited by the accuracy of the source and destination host's time
synchronization.

In its simplest form, the device merely advertises a 'disruption_marker'
which indicates that the guest should throw away any NTP synchronization
it thinks it has, and start again.

Because the shared memory region can be exposed all the way to userspace
through the /dev/vmclock0 node, applications can still use time from a
fast vDSO 'system call', and check the disruption marker to be sure that
their timestamp is indeed truthful.

The structure also allows for the precise time, as known by the host, to
be exposed directly to guests so that they don't have to wait for NTP to
resync from scratch. The PTP driver consumes this information if present.
Like the KVM PTP clock, this PTP driver can convert TSC-based cross
timestamps into KVM clock values. Unlike the KVM PTP clock, it does so
only when such is actually helpful.

The values and fields are based on the nascent virtio-rtc specification,
and the intent is that a version (hopefully precisely this version) of
this structure will be included as an optional part of that spec. In the
meantime, this driver supports the simple ACPI form of the device which
is being shipped in certain commercial hypervisors (and submitted for
inclusion in QEMU).

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:16:18 +01:00
David S. Miller
f31fd0b3b2 Merge branch 'pcs-xpcs-cleanups-batch-2'
Russell King says:

====================
net: pcs: xpcs: cleanups batch 2

This is the second cleanup series for XPCS.

Patch 1 removes the enum indexing the dw_xpcs_compat array. The index is
never used except to place entries in the array and to size the array.

Patch 2 removes the interface arrays - each of which only contain one
interface.

Patch 3 makes xpcs_find_compat() take the xpcs structure rather than the
ID - the previous series removed the reason for xpcs_find_compat needing
to take the ID.

Patch 4 provides a helper to convert xpcs structure to a regular
phylink_pcs structure, which leads to patch 5.

Patch 5 moves the definition of struct dw_xpcs to the private xpcs
header - with patch 4 in place, nothing outside of the xpcs driver
accesses the contents of the dw_xpcs structure.

Patch 6 renames xpcs_get_id() to xpcs_read_id() since it's reading the
ID, rather than doing anything further with it. (Prior versions of this
series renamed it to xpcs_read_phys_id() since that more accurately
described that it was reading the physical ID registers.)

Patch 7 moves the searching of the ID list out of line as this is a
separate functional block.

Patch 8 converts xpcs to use the bitmap macros, which eliminates the
need for _SHIFT definitions.

Patch 9 adds and uses _modify() accessors as there are a large amount
of read-modify-write operations in this driver. This conversion found
a bug in xpcs-wx code that has been reported and already fixed.

Patch 10 converts xpcs to use read_poll_timeout() rather than open
coding that.

Patch 11 converts all printed messages to use the dev_*() functions so
the driver and devie name are always printed.

Patch 12 moves DW_VR_MII_DIG_CTRL1_2G5_EN to the correct place in the
header file, rather than amongst another register's definitions.

Patch 13 moves the Wangxun workaround to a common location rather than
duplicating it in two places. We also reformat this to fit within
80 columns.

====================

Tested-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:12 +01:00
Russell King (Oracle)
bb0b8aeca6 net: pcs: xpcs: move Wangxun VR_XS_PCS_DIG_CTRL1 configuration
According to commits 2a22b7ae2fa3 ("net: pcs: xpcs: adapt Wangxun NICs
for SGMII mode") and 2deea43f386d ("net: pcs: xpcs: add 1000BASE-X AN
interrupt support"), Wangxun devices need special VR_XS_PCS_DIG_CTRL1
settings for SGMII and 1000BASE-X. Both SGMII and 1000BASE-X use the
same settings.

Rather than placing these in the individual xpcs_config_*() functions,
move it to where we already test for the Wangxun devices in
xpcs_do_config().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:12 +01:00
Russell King (Oracle)
5ba5619303 net: pcs: xpcs: correctly place DW_VR_MII_DIG_CTRL1_2G5_EN
Place DW_VR_MII_DIG_CTRL1_2G5_EN with the other DW_VR_MII_DIG_CTRL1
definitions rather than in the middle of a register list.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
acb5fb5a42 net: pcs: xpcs: use dev_*() to print messages
Use the dev_*() family of functions to print all messages from the XPCS
driver so we know which instance issues the messages.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
d69908faf1 net: pcs: xpcs: convert to use read_poll_timeout()
Convert the xpcs driver to use read_poll_timeout() when waiting for
reset to complete, rather than open-coding this.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
ce8d6081fc net: pcs: xpcs: add _modify() accessors
The xpcs driver does a lot of read-modify-write operations on
registers, which leads to long-winded code to read the register, check
whether the read was successful, modify the value in some way, and then
write it back.

We have a mdiodev _modify() accessor that encapsulates this, and does
the register modification under the MDIO bus lock ensuring that the
modification is atomic with respect to other bus operations. Convert
the xpcs driver to use this accessor.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
f681891810 net: pcs: xpcs: use FIELD_PREP() and FIELD_GET()
Convert xpcs to use the bitfield macros rather than definining the
bitfield shifts and open-coding the insertion and extraction of these
bitfields.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
7921d3e602 net: pcs: xpcs: move searching ID list out of line
Move the searching of the physical ID out of xpcs_create() and into
its own xpcs_identify() function, which makes it self contained.
This reduces the complexity in xpcs_craete(), making it easier to
follow, rather than having a lot of once-run code in the big for()
loop.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
135d118bfd net: pcs: xpcs: rename xpcs_get_id()
Rename xpcs_get_id() to xpcs_read_id() which more closely reflects
the purpose of this function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
accd5f5cd2 net: pcs: xpcs: move definition of struct dw_xpcs to private header
There should be no reason for anything outside the XPCS code to know
the contents of struct dw_xpcs - this is a private structure to XPCS.
Move the definition to the private pcs-xpcs.h header, leaving a
declaration in the global pcs/pcs-xpcs.h

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
f042365a26 net: pcs: xpcs: provide a helper to get the phylink pcs given xpcs
Provide a helper to provide the pointer to the phylink_pcs struct
given a valid xpcs pointer. This will be necessary when we make
struct dw_xpcs private to pcs-xpcs.c

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
4490f5669b net: pcs: xpcs: pass xpcs instead of xpcs->id to xpcs_find_compat()
xpcs_find_compat() is now always passed xpcs->id. Rather than always
dereferencing this in the caller, move it into xpcs_find_compat(),
thus making this function consistent with most of the other xpcs
functions in taking an xpcs pointer.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00