74839 Commits

Author SHA1 Message Date
Arseniy Krasnov
e2fcc326b4 vsock/virtio: support MSG_ZEROCOPY for transport
Add 'msgzerocopy_allow()' callback for virtio transport.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 13:19:42 +01:00
Arseniy Krasnov
dcc55d7bb2 vsock: enable SOCK_SUPPORT_ZC bit
This bit is used by io_uring in case of zerocopy tx mode. io_uring code
checks, that socket has this feature. This patch sets it in two places:
1) For socket in 'connect()' call.
2) For new socket which is returned by 'accept()' call.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 13:19:42 +01:00
Arseniy Krasnov
5fbfc7d243 vsock: check for MSG_ZEROCOPY support on send
This feature totally depends on transport, so if transport doesn't
support it, return error.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 13:19:42 +01:00
Arseniy Krasnov
49dbe25ada vsock: read from socket's error queue
This adds handling of MSG_ERRQUEUE input flag in receive call. This flag
is used to read socket's error queue instead of data queue. Possible
scenario of error queue usage is receiving completions for transmission
with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK'
and 'VSOCK_RECVERR'.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 13:19:42 +01:00
Arseniy Krasnov
0064cfb440 vsock: set EPOLLERR on non-empty error queue
If socket's error queue is not empty, EPOLLERR must be set. Otherwise,
reader of error queue won't detect data in it using EPOLLERR bit.
Currently for AF_VSOCK this is actual only with MSG_ZEROCOPY, as this
feature is the only user of an error queue of the socket.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 13:19:42 +01:00
Lukas Bulwahn
85605fb694 appletalk: remove special handling code for ipddp
After commit 1dab47139e61 ("appletalk: remove ipddp driver") removes the
config IPDDP, there is some minor code clean-up possible in the appletalk
network layer.

Remove some code in appletalk layer after the ipddp driver is gone.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231012063443.22368-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 17:59:32 -07:00
Heng Guo
cf8b49fbd0 net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check
Reproduce environment:
network with 3 VM linuxs is connected as below:
VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
VM1: eth0 ip: 192.168.122.207 MTU 1800
VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
VM3: eth0 ip: 192.168.123.240 MTU 1800

Reproduce:
VM1 send 1600 bytes UDP data to VM3 using tools scapy with flags='DF'.
scapy command:
send(IP(dst="192.168.123.240",flags='DF')/UDP()/str('0'*1600),count=1,
inter=1.000000)

Result:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
    ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
    OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
    ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
    OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 7 0 2 2 0 0 2 5 0 0 0 0 0 0 0 1 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
ForwDatagrams is always keeping 2 without increment.

Issue description and patch:
ip_exceeds_mtu() in ip_forward() drops this IP datagram because skb len
(1600 sending by scapy) is over MTU(1500 in VM2) if "DF" is set.
According to RFC 4293 "3.2.3. IP Statistics Tables",
  +-------+------>------+----->-----+----->-----+
  | InForwDatagrams (6) | OutForwDatagrams (6)  |
  |                     V                       +->-+ OutFragReqds
  |                 InNoRoutes                  |   | (packets)
  / (local packet (3)                           |   |
  |  IF is that of the address                  |   +--> OutFragFails
  |  and may not be the receiving IF)           |   |    (packets)
the IPSTATS_MIB_OUTFORWDATAGRAMS should be counted before fragment
check.
The existing implementation, instead, would incease the counter after
fragment check: ip_exceeds_mtu() in ipv4 and ip6_pkt_too_big() in ipv6.
So do patch to move IPSTATS_MIB_OUTFORWDATAGRAMS counter to ip_forward()
for ipv4 and ip6_forward() for ipv6.

Test result with patch:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
    ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
    OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
    ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
    OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 7 0 2 3 0 0 2 5 0 0 0 0 0 0 0 1 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
ForwDatagrams is updated from 2 to 3.

Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Signed-off-by: Heng Guo <heng.guo@windriver.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231011015137.27262-1-heng.guo@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 09:58:45 -07:00
Sabrina Dubroca
9f0c824551 tls: use fixed size for tls_offload_context_{tx,rx}.driver_state
driver_state is a flex array, but is always allocated by the tls core
to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by
making that size explicit so that sizeof(struct
tls_offload_context_{tx,rx}) works.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca
1cf7fbcee6 tls: validate crypto_info in a separate helper
Simplify do_tls_setsockopt_conf a bit.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca
4f48669918 tls: remove tls_context argument from tls_set_device_offload
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.

While at it, fix up the reverse xmas tree ordering.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca
b6a30ec923 tls: remove tls_context argument from tls_set_sw_offload
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca
0137407999 tls: add a helper to allocate/initialize offload_ctx_tx
Simplify tls_set_device_offload a bit.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca
1a074f7618 tls: also use init_prot_info in tls_set_device_offload
Most values are shared. Nonce size turns out to be equal to IV size
for all offloadable ciphers.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca
a9937816ed tls: move tls_prot_info initialization out of tls_set_sw_offload
Simplify tls_set_sw_offload, and allow reuse for the tls_device code.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca
615580cbc9 tls: extract context alloc/initialization out of tls_set_sw_offload
Simplify tls_set_sw_offload a bit.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca
1c1cb3110d tls: store iv directly within cipher_context
TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is 20B, we don't get much benefit
in cipher_context's size and can simplify the init code a bit.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:10 +01:00
Sabrina Dubroca
bee6b7b307 tls: rename MAX_IV_SIZE to TLS_MAX_IV_SIZE
It's defined in include/net/tls.h, avoid using an overly generic name.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:09 +01:00
Sabrina Dubroca
6d5029e547 tls: store rec_seq directly within cipher_context
TLS_MAX_REC_SEQ_SIZE is 8B, we don't get anything by using kmalloc.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:09 +01:00
Sabrina Dubroca
8f1d532b4a tls: drop unnecessary cipher_type checks in tls offload
We should never reach tls_device_reencrypt, tls_enc_record, or
tls_enc_skb with a cipher_type that can't be offloaded. Replace those
checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of
hard-coding offloadable cipher types.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:09 +01:00
Sabrina Dubroca
3bab3ee0f9 tls: get salt using crypto_info_salt in tls_enc_skb
I skipped this conversion in my previous series.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 11:26:09 +01:00
Amit Cohen
38985e8c27 net: Handle bulk delete policy in bridge driver
The merge commit 92716869375b ("Merge branch 'br-flush-filtering'")
added support for FDB flushing in bridge driver. The following patches
will extend VXLAN driver to support FDB flushing as well. The netlink
message for bulk delete is shared between the drivers. With the existing
implementation, there is no way to prevent user from flushing with
attributes that are not supported per driver. For example, when VNI will
be added, user will not get an error for flush FDB entries in bridge
with VNI, although this attribute is not relevant for bridge.

As preparation for support of FDB flush in VXLAN driver, move the policy
to be handled in bridge driver, later a new policy for VXLAN will be
added in VXLAN driver. Do not pass 'vid' as part of ndo_fdb_del_bulk(),
as this field is relevant only for bridge.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 10:00:30 +01:00
Jakub Kicinski
0e6bb5b7f4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

kernel/bpf/verifier.c
  829955981c55 ("bpf: Fix verifier log for async callback return values")
  a923819fb2c5 ("bpf: Treat first argument as return value for bpf_throw")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-12 17:07:34 -07:00
Florian Westphal
2f0968a030 net: gso_test: fix build with gcc-12 and earlier
gcc 12 errors out with:
net/core/gso_test.c:58:48: error: initializer element is not constant
   58 |                 .segs = (const unsigned int[]) { gso_size },

This version isn't old (2022), so switch to preprocessor-bsaed constant
instead of 'static const int'.

Cc: Willem de Bruijn <willemb@google.com>
Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com>
Closes: https://lore.kernel.org/netdev/79fbe35c-4dd1-4f27-acb2-7a60794bc348@linux.vnet.ibm.com/
Fixes: 1b4fa28a8b07 ("net: parametrize skb_segment unit test to expand coverage")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20231012120901.10765-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12 15:34:08 +02:00
Jeremy Cline
354a6e707e nfc: nci: assert requested protocol is valid
The protocol is used in a bit mask to determine if the protocol is
supported. Assert the provided protocol is less than the maximum
defined so it doesn't potentially perform a shift-out-of-bounds and
provide a clearer error for undefined protocols vs unsupported ones.

Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Reported-and-tested-by: syzbot+0839b78e119aae1fec78@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0839b78e119aae1fec78
Signed-off-by: Jeremy Cline <jeremy@jcline.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231009200054.82557-1-jeremy@jcline.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12 09:32:10 +02:00
Kuniyuki Iwashima
e2bca4870f af_packet: Fix fortified memcpy() without flex array.
Sergei Trofimovich reported a regression [0] caused by commit a0ade8404c3b
("af_packet: Fix warning of fortified memcpy() in packet_getname().").

It introduced a flex array sll_addr_flex in struct sockaddr_ll as a
union-ed member with sll_addr to work around the fortified memcpy() check.

However, a userspace program uses a struct that has struct sockaddr_ll in
the middle, where a flex array is illegal to exist.

  include/linux/if_packet.h:24:17: error: flexible array member 'sockaddr_ll::<unnamed union>::<unnamed struct>::sll_addr_flex' not at end of 'struct packet_info_t'
     24 |                 __DECLARE_FLEX_ARRAY(unsigned char, sll_addr_flex);
        |                 ^~~~~~~~~~~~~~~~~~~~

To fix the regression, let's go back to the first attempt [1] telling
memcpy() the actual size of the array.

Reported-by: Sergei Trofimovich <slyich@gmail.com>
Closes: https://github.com/NixOS/nixpkgs/pull/252587#issuecomment-1741733002 [0]
Link: https://lore.kernel.org/netdev/20230720004410.87588-3-kuniyu@amazon.com/ [1]
Fixes: a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20231009153151.75688-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12 09:15:15 +02:00
Jakub Kicinski
21b2e2624d netfilter net-next pull request 2023-10-10
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUlYWENHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AC68D/0WnxpQHiCMl7rUWrDJjmhYenUY1uM9VZb1fpd7
 hH543aVbJmWNQj9uzWmKceuLKtIV5jCgsgzqHcxq5e0oHLc3IjIOX1j6L6Dtv7Wf
 8obzfCZOYvp0rfaB/2W8d+ehet69i8UQlWM6uKVD/vEedAynfu8AaCgaNy6WwgrK
 ElXa3lZzflfoeqvwZkndxIkeuoXQB4mYCcuIXuseB+pYv5pz2th2OsGNxeeae0Q0
 XlOPY7LNrAWMbUFfJUiRMb7bEXINOcmZMGC/9OTpYTJe3j32ybi0pZY91AV8HEQY
 xuQY2k4dVs4bSfxPFkrXlansPh7fbtl/EF6LNoDbCobV6RAZrc0SEM7eSybRoigm
 FrsWdms4bp5RRaBWxi3MRj1ir8nBHjLJ3AHlec1h6EMcXOWsF91u8mk0+0i9eF8E
 htw7Rxc7ZR3xRUKFxA5+53oYwMYnst9PdZ54DKrAMJOfykU6pGx2hJbsmTfodeGl
 zAT1GXPjEDejbXdmD84CzIs+bCmuGNrQQZ6o9gAaYVm9DJJMEXUpp1yK+/ApEnsz
 pcZNi7NeZRlwJhayLCYdvTT3sUlcDYqGuJoDE/krKNH6Aq0iAb/zvHxbOAn7R550
 UFTVpB78W+0G0eufpgCIsxGHIgAZOGYWTe/TcrQIghWPNxfrt6p5XZVh2N4g8n9C
 nPiOdg==
 =pmbB
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-23-10-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter updates for next

First 5 patches, from Phil Sutter, clean up nftables dumpers to
use the context buffer in the netlink_callback structure rather
than a kmalloc'd buffer.

Patch 6, from myself, zaps dead code and replaces the helper function
with a small inlined helper.

Patch 7, also from myself, removes another pr_debug and replaces it
with the existing nf_log-based debug helpers.

Last patch, from George Guo, gets nft_table comments back in
sync with the structure members.

* tag 'nf-next-23-10-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: cleanup struct nft_table
  netfilter: conntrack: prefer tcp_error_log to pr_debug
  netfilter: conntrack: simplify nf_conntrack_alter_reply
  netfilter: nf_tables: Don't allocate nft_rule_dump_ctx
  netfilter: nf_tables: Carry s_idx in nft_rule_dump_ctx
  netfilter: nf_tables: Carry reset flag in nft_rule_dump_ctx
  netfilter: nf_tables: Drop pointless memset when dumping rules
  netfilter: nf_tables: Always allocate nft_rule_dump_ctx
====================

Link: https://lore.kernel.org/r/20231010145343.12551-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11 17:37:58 -07:00
Jakub Kicinski
71c299c711 net: tcp: fix crashes trying to free half-baked MTU probes
tcp_stream_alloc_skb() initializes the skb to use tcp_tsorted_anchor
which is a union with the destructor. We need to clean that
TCP-iness up before freeing.

Fixes: 736013292e3c ("tcp: let tcp_mtu_probe() build headless packets")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231010173651.3990234-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11 17:24:46 -07:00
Willem de Bruijn
4688ecb138 net: expand skb_segment unit test with frag_list coverage
Expand the test with these variants that use skb frag_list:

- GSO_TEST_FRAG_LIST:             frag_skb length is gso_size
- GSO_TEST_FRAG_LIST_PURE:        same, data exclusively in frag skbs
- GSO_TEST_FRAG_LIST_NON_UNIFORM: frag_skb length may vary
- GSO_TEST_GSO_BY_FRAGS:          frag_skb length defines gso_size,
                                  i.e., segs may have varying sizes.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-11 10:39:01 +01:00
Willem de Bruijn
1b4fa28a8b net: parametrize skb_segment unit test to expand coverage
Expand the test with variants

- GSO_TEST_NO_GSO:      payload size less than or equal to gso_size
- GSO_TEST_FRAGS:       payload in both linear and page frags
- GSO_TEST_FRAGS_PURE:  payload exclusively in page frags
- GSO_TEST_GSO_PARTIAL: produce one gso segment of multiple of gso_size,
                        plus optionally one non-gso trailer segment

Define a test struct that encodes the input gso skb and output segs.
Input in terms of linear and fragment lengths. Output as length of
each segment.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-11 10:39:01 +01:00
Willem de Bruijn
b3098d32ed net: add skb_segment kunit test
Add unit testing for skb segment. This function is exercised by many
different code paths, such as GSO_PARTIAL or GSO_BY_FRAGS, linear
(with or without head_frag), frags or frag_list skbs, etc.

It is infeasible to manually run tests that cover all code paths when
making changes. The long and complex function also makes it hard to
establish through analysis alone that a patch has no unintended
side-effects.

Add code coverage through kunit regression testing. Introduce kunit
infrastructure for tests under net/core, and add this first test.

This first skb_segment test exercises a simple case: a linear skb.
Follow-on patches will parametrize the test and add more variants.

Tested: Built and ran the test with

    make ARCH=um mrproper

    ./tools/testing/kunit/kunit.py run \
        --kconfig_add CONFIG_NET=y \
        --kconfig_add CONFIG_DEBUG_KERNEL=y \
        --kconfig_add CONFIG_DEBUG_INFO=y \
        --kconfig_add=CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y \
        net_core_gso

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-11 10:39:01 +01:00
Nils Hoppmann
a950a5921d net/smc: Fix pos miscalculation in statistics
SMC_STAT_PAYLOAD_SUB(_smc_stats, _tech, key, _len, _rc) will calculate
wrong bucket positions for payloads of exactly 4096 bytes and
(1 << (m + 12)) bytes, with m == SMC_BUF_MAX - 1.

Intended bucket distribution:
Assume l == size of payload, m == SMC_BUF_MAX - 1.

Bucket 0                : 0 < l <= 2^13
Bucket n, 1 <= n <= m-1 : 2^(n+12) < l <= 2^(n+13)
Bucket m                : l > 2^(m+12)

Current solution:
_pos = fls64((l) >> 13)
[...]
_pos = (_pos < m) ? ((l == 1 << (_pos + 12)) ? _pos - 1 : _pos) : m

For l == 4096, _pos == -1, but should be _pos == 0.
For l == (1 << (m + 12)), _pos == m, but should be _pos == m - 1.

In order to avoid special treatment of these corner cases, the
calculation is adjusted. The new solution first subtracts the length by
one, and then calculates the correct bucket by shifting accordingly,
i.e. _pos = fls64((l - 1) >> 13), l > 0.
This not only fixes the issues named above, but also makes the whole
bucket assignment easier to follow.

Same is done for SMC_STAT_RMB_SIZE_SUB(_smc_stats, _tech, k, _len),
where the calculation of the bucket position is similar to the one
named above.

Fixes: e0e4b8fa5338 ("net/smc: Add SMC statistics support")
Suggested-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Nils Hoppmann <niho@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-11 10:36:35 +01:00
Yajun Deng
5247dbf16c net/core: Introduce netdev_core_stats_inc()
Although there is a kfree_skb_reason() helper function that can be used to
find the reason why this skb is dropped, but most callers didn't increase
one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.

For the users, people are more concerned about why the dropped in ip
is increasing.

Introduce netdev_core_stats_inc() for trace the caller of
dev_core_stats_*_inc().

Also, add __code to netdev_core_stats_alloc(), as it's called with small
probability. And add noinline make sure netdev_core_stats_inc was never
inlined.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-11 10:07:40 +01:00
Russell King (Oracle)
63b9f7a19f net: dsa: remove dsa_port_phylink_validate()
As all drivers now provide phylink capabilities (including MAC), the
if() condition in dsa_port_phylink_validate() will always be true. We
will always use the generic validator, which phylink will call itself
if the .validate method isn't populated. Thus, there is now no need to
implement the .validate method, so this implementation can be removed.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-11 10:06:05 +01:00
Jakub Kicinski
ad98426a88 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZSXLzwAKCRDbK58LschI
 g1wuAQDTT1mrUmRqrpPob/U3HCcTg64hgdRwyF+6IU39/+neGwEAoP0FKZoy3DDf
 C8FOdVChBjapPsp9zTeYPv0nlZMITAE=
 =1Shl
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-10-11

We've added 14 non-merge commits during the last 5 day(s) which contain
a total of 12 files changed, 398 insertions(+), 104 deletions(-).

The main changes are:

1) Fix s390 JIT backchain issues in the trampoline code generation which
   previously clobbered the caller's backchain, from Ilya Leoshkevich.

2) Fix zero-size allocation warning in xsk sockets when the configured
   ring size was close to SIZE_MAX, from Andrew Kanner.

3) Fixes for bpf_mprog API that were found when implementing support
   in the ebpf-go library along with selftests, from Daniel Borkmann
   and Lorenz Bauer.

4) Fix riscv JIT to properly sign-extend the return register in programs.
   This fixes various test_progs selftests on riscv, from Björn Töpel.

5) Fix verifier log for async callback return values where the allowed
   range was displayed incorrectly, from David Vernet.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  s390/bpf: Fix unwinding past the trampoline
  s390/bpf: Fix clobbering the caller's backchain in the trampoline
  selftests/bpf: Add testcase for async callback return value failure
  bpf: Fix verifier log for async callback return values
  xdp: Fix zero-size allocation warning in xskq_create()
  riscv, bpf: Track both a0 (RISC-V ABI) and a5 (BPF) return values
  riscv, bpf: Sign-extend return values
  selftests/bpf: Make seen_tc* variable tests more robust
  selftests/bpf: Test query on empty mprog and pass revision into attach
  selftests/bpf: Adapt assert_mprog_count to always expect 0 count
  selftests/bpf: Test bpf_mprog query API via libbpf and raw syscall
  bpf: Refuse unused attributes in bpf_prog_{attach,detach}
  bpf: Handle bpf_mprog_query with NULL entry
  bpf: Fix BPF_PROG_QUERY last field check
====================

Link: https://lore.kernel.org/r/20231010223610.3984-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10 19:59:49 -07:00
Kory Maincent
108a36d07c ethtool: Fix mod state of verbose no_mask bitset
A bitset without mask in a _SET request means we want exactly the bits in
the bitset to be set. This works correctly for compact format but when
verbose format is parsed, ethnl_update_bitset32_verbose() only sets the
bits present in the request bitset but does not clear the rest. The commit
6699170376ab fixes this issue by clearing the whole target bitmap before we
start iterating. The solution proposed brought an issue with the behavior
of the mod variable. As the bitset is always cleared the old val will
always differ to the new val.

Fix it by adding a new temporary variable which save the state of the old
bitmap.

Fixes: 6699170376ab ("ethtool: fix application of verbose no_mask bitset")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231009133645.44503-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10 19:48:15 -07:00
Jakub Kicinski
b52acd02c1 linux-can-fixes-for-6.6-20231009
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEDs2BvajyNKlf9TJQvlAcSiqKBOgFAmUjqCwTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRC+UBxKKooE6HJUB/sGBLojDlbGAqMFwhCmZ6ZNLg3xQcrB
 SNgIxA87jsMfSCGX9vkhkaXfNLOgDE2zYe4i2QB4M1iMatVY4MSY2vtJbw8oL6dr
 X6zT9STwFPBVlH/CIqfCq9eQNhKrIQ65khmYg2DtFJCBuZniBrhfZLwVROUj3FXr
 FUIAMNjn9Xtj2R5JwtOtn5hvdzO8z3dCQMtzqFVm9pSm5LJVkTGaDe85t/mkLdS2
 stwlbGPVz+WElHueBDEjfbxiWnPgpEVSbuThTRxS0M5+a96uVHa4F+SFGgkSdYlI
 2MQUGiJ797qZTy2MvkGaqa/1/uqcmNOWNm8NqzLfg4LQMvnFW8/qAaV8
 =9CD6
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-6.6-20231009' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2023-10-09

Lukas Magel's patch for the CAN ISO-TP protocol fixes the TX state
detection and wait behavior.

John Watts contributes a patch to only show the sun4i_can Kconfig
option on ARCH_SUNXI.

A patch by Miquel Raynal fixes the soft-reset workaround for Renesas
SoCs in the sja1000 driver.

Markus Schneider-Pargmann's patch for the tcan4x5x m_can glue driver
fixes the id2 register for the tcan4553.

2 patches by Haibo Chen fix the flexcan stop mode for the imx93 SoC.

* tag 'linux-can-fixes-for-6.6-20231009' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: tcan4x5x: Fix id2_register for tcan4553
  can: flexcan: remove the auto stop mode for IMX93
  can: sja1000: Always restart the Tx queue after an overrun
  arm64: dts: imx93: add the Flex-CAN stop mode by GPR
  can: sun4i_can: Only show Kconfig if ARCH_SUNXI is set
  can: isotp: isotp_sendmsg(): fix TX state detection and wait behavior
====================

Link: https://lore.kernel.org/r/20231009085256.693378-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10 19:46:00 -07:00
Eric Dumazet
31c07dffaf net: nfc: fix races in nfc_llcp_sock_get() and nfc_llcp_sock_get_sn()
Sili Luo reported a race in nfc_llcp_sock_get(), leading to UAF.

Getting a reference on the socket found in a lookup while
holding a lock should happen before releasing the lock.

nfc_llcp_sock_get_sn() has a similar problem.

Finally nfc_llcp_recv_snl() needs to make sure the socket
found by nfc_llcp_sock_from_sn() does not disappear.

Fixes: 8f50020ed9b8 ("NFC: LLCP late binding")
Reported-by: Sili Luo <rootlab@huawei.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20231009123110.3735515-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10 19:44:44 -07:00
Jeremy Kerr
5093bbfc10 mctp: perform route lookups under a RCU read-side lock
Our current route lookups (mctp_route_lookup and mctp_route_lookup_null)
traverse the net's route list without the RCU read lock held. This means
the route lookup is subject to preemption, resulting in an potential
grace period expiry, and so an eventual kfree() while we still have the
route pointer.

Add the proper read-side critical section locks around the route
lookups, preventing premption and a possible parallel kfree.

The remaining net->mctp.routes accesses are already under a
rcu_read_lock, or protected by the RTNL for updates.

Based on an analysis from Sili Luo <rootlab@huawei.com>, where
introducing a delay in the route lookup could cause a UAF on
simultaneous sendmsg() and route deletion.

Reported-by: Sili Luo <rootlab@huawei.com>
Fixes: 889b7da23abf ("mctp: Add initial routing framework")
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/29c4b0e67dc1bf3571df3982de87df90cae9b631.1696837310.git.jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10 19:43:22 -07:00
Arnd Bergmann
1dab47139e appletalk: remove ipddp driver
After the cops driver is removed, ipddp is now the only
CONFIG_DEV_APPLETALK but as far as I can tell, this also has no users
and can be removed, making appletalk support purely based on ethertalk,
using ethernet hardware.

Link: https://lore.kernel.org/netdev/e490dd0c-a65d-4acf-89c6-c06cb48ec880@app.fastmail.com/
Link: https://lore.kernel.org/netdev/9cac4fbd-9557-b0b8-54fa-93f0290a6fb8@schmorgal.com/
Cc: Doug Brown <doug@schmorgal.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20231009141139.1766345-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10 17:49:00 -07:00
Florian Westphal
6ac9c51eeb netfilter: conntrack: prefer tcp_error_log to pr_debug
pr_debug doesn't provide any information other than that a packet
did not match existing state but also was found to not create a new
connection.

Replaces this with tcp_error_log, which will also dump packets'
content so one can see if this is a stray FIN or RST.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:28 +02:00
Florian Westphal
8a23f4ab92 netfilter: conntrack: simplify nf_conntrack_alter_reply
nf_conntrack_alter_reply doesn't do helper reassignment anymore.
Remove the comments that make this claim.

Furthermore, remove dead code from the function and place ot
in nf_conntrack.h.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:28 +02:00
Phil Sutter
99ab9f84b8 netfilter: nf_tables: Don't allocate nft_rule_dump_ctx
Since struct netlink_callback::args is not used by rule dumpers anymore,
use it to hold nft_rule_dump_ctx. Add a build-time check to make sure it
won't ever exceed the available space.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:28 +02:00
Phil Sutter
8194d599bc netfilter: nf_tables: Carry s_idx in nft_rule_dump_ctx
In order to move the context into struct netlink_callback's scratch
area, the latter must be unused first.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:28 +02:00
Phil Sutter
405c8fd62d netfilter: nf_tables: Carry reset flag in nft_rule_dump_ctx
This relieves the dump callback from having to check nlmsg_type upon
each call and instead performs the check once in .start callback.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:27 +02:00
Phil Sutter
30fa41a0f6 netfilter: nf_tables: Drop pointless memset when dumping rules
None of the dump callbacks uses netlink_callback::args beyond the first
element, no need to zero the data.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:20 +02:00
Phil Sutter
afed2b54c5 netfilter: nf_tables: Always allocate nft_rule_dump_ctx
It will move into struct netlink_callback's scratch area later, just put
nf_tables_dump_rules_start in shape to reduce churn later.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:01:42 +02:00
Gerd Bayer
a72178cfe8 net/smc: Fix dependency of SMC on ISM
When the SMC protocol is built into the kernel proper while ISM is
configured to be built as module, linking the kernel fails due to
unresolved dependencies out of net/smc/smc_ism.o to
ism_get_smcd_ops, ism_register_client, and ism_unregister_client
as reported via the linux-next test automation (see link).
This however is a bug introduced a while ago.

Correct the dependency list in ISM's and SMC's Kconfig to reflect the
dependencies that are actually inverted. With this you cannot build a
kernel with CONFIG_SMC=y and CONFIG_ISM=m. Either ISM needs to be 'y',
too - or a 'n'. That way, SMC can still be configured on non-s390
architectures that do not have (nor need) an ISM driver.

Fixes: 89e7d2ba61b7 ("net/ism: Add new API for client registration")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/linux-next/d53b5b50-d894-4df8-8969-fd39e63440ae@infradead.org/
Co-developed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20231006125847.1517840-1-gbayer@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10 11:51:41 +02:00
David Morley
939463016b tcp: change data receiver flowlabel after one dup
This commit changes the data receiver repath behavior to occur after
receiving a single duplicate. This can help recover ACK connectivity
quicker if a TLP was sent along a nonworking path.

For instance, consider the case where we have an initially nonworking
forward path and reverse path and subsequently switch to only working
forward paths. Before this patch we would have the following behavior.

+---------+--------+--------+----------+----------+----------+
| Event   | For FL | Rev FL | FP Works | RP Works | Data Del |
+---------+--------+--------+----------+----------+----------+
| Initial | A      | 1      | N        | N        | 0        |
+---------+--------+--------+----------+----------+----------+
| TLP     | A      | 1      | N        | N        | 0        |
+---------+--------+--------+----------+----------+----------+
| RTO 1   | B      | 1      | Y        | N        | 1        |
+---------+--------+--------+----------+----------+----------+
| RTO 2   | C      | 1      | Y        | N        | 2        |
+---------+--------+--------+----------+----------+----------+
| RTO 3   | D      | 2      | Y        | Y        | 3        |
+---------+--------+--------+----------+----------+----------+

This patch gets rid of at least RTO 3, avoiding additional unnecessary
repaths of a working forward path to a (potentially) nonworking one.

In addition, this commit changes the behavior to avoid repathing upon
rx of duplicate data if the local endpoint is in CA_Loss (in which
case the RTOs will already be changing the outgoing flowlabel).

Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Tested-by: David Morley <morleyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10 10:02:59 +02:00
David Morley
95b9a87c6a tcp: record last received ipv6 flowlabel
In order to better estimate whether a data packet has been
retransmitted or is the result of a TLP, we save the last received
ipv6 flowlabel.

To make space for this field we resize the "ato" field in
inet_connection_sock as the current value of TCP_DELACK_MAX can be
fully contained in 8 bits and add a compile_time_assert ensuring this
field is the required size.

v2: addressed kernel bot feedback about dccp_delack_timer()
v3: addressed build error introduced by commit bbf80d713fe7 ("tcp:
derive delack_max from rto_min")

Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Tested-by: David Morley <morleyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10 10:02:59 +02:00
Eric Dumazet
26c29961b1 net: refine debug info in skb_checksum_help()
syzbot uses panic_on_warn.

This means that the skb_dump() I added in the blamed commit are
not even called.

Rewrite this so that we get the needed skb dump before syzbot crashes.

Fixes: eeee4b77dc52 ("net: add more debug info in skb_checksum_help()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20231006173355.2254983-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-09 19:46:41 -07:00