1320358 Commits

Author SHA1 Message Date
Michael Chan
bda2e63a50 bnxt_en: Add a new ethtool -W dump flag
Add a new ethtool -W dump flag (2) to include driver coredump segments.
This patch adds the host backing store context memory pages used by the
chip and FW to store various states to the coredump.  The pages for
each context memory type are dumped into a separate coredump segment.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-11-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:55 -08:00
Shruti Parab
a854a17097 bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
Pass the component ID and segment ID to this function to create
the coredump segment header.  This will be needed in the next
patches to create more segments for the coredump.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-10-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:55 -08:00
Sreekanth Reddy
23a18b91b6 bnxt_en: Add functions to copy host context memory
Host context memory is used by the newer chips to store context
information for various L2 and RoCE states and FW logs.  This
information will be useful for debugging.  This patch adds the
functions to copy all pages of a context memory type to a contiguous
buffer.  The next patches will include the context memory dump
during ethtool -w coredump.
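
As a rough illustration of copying all pages of one context memory type
into a contiguous buffer, here is a minimal userspace sketch. The names
struct ctx_pages and copy_ctx_pages() are hypothetical and are not taken
from the driver; the real code works on DMA pages rather than plain
pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one context memory type: an array of
 * equally sized pages that are not contiguous in host memory.
 */
struct ctx_pages {
	void **pages;
	size_t nr_pages;
	size_t page_size;
};

/* Copy every page into dst back to back.  Returns the number of bytes
 * copied, or 0 if dst is too small to hold all pages.
 */
size_t copy_ctx_pages(const struct ctx_pages *cp, void *dst, size_t dst_len)
{
	size_t total, i;

	assert(cp && dst);
	total = cp->nr_pages * cp->page_size;
	if (dst_len < total)
		return 0;

	for (i = 0; i < cp->nr_pages; i++)
		memcpy((char *)dst + i * cp->page_size, cp->pages[i],
		       cp->page_size);
	return total;
}
```

A caller would size dst as nr_pages * page_size before copying, so that
one coredump segment can be emitted from a single buffer.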

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Co-developed-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-9-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:55 -08:00
Hongguang Gao
de999362ad bnxt_en: Do not free FW log context memory
If FW supports appending new FW logs to an offset in the context
memory after FW reset, then do not free this type of context memory
during reset.  The driver will provide the initial offset to the FW
when configuring this type of context memory.  This way, we don't lose
the older FW logs after reset.

Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-8-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Shruti Parab
84fcd9449f bnxt_en: Manage the FW trace context memory
The FW trace memory pages will be added to the ethtool -w coredump
in later patches.  In addition to the raw data, the driver has to
add a header to provide the head and tail information on each FW
trace log segment when creating the coredump.  The FW sends an async
message to the driver after DMAing a chunk of logs to the context
memory to indicate the last offset containing the tail of the logs.
The driver needs to keep track of that.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Shruti Parab
24d694aec1 bnxt_en: Allocate backing store memory for FW trace logs
Allocate the new FW trace log backing store context memory types
if they are supported by the FW.  FW debug logs are DMA'ed to the host
backing store memory when the on-chip buffers are full.  If host
memory cannot be allocated for these memory types, the driver
will not abort.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Hongguang Gao
46010d43ab bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
If 'force' is false, it will keep the memory pages and all data
structures for the context memory type if the memory is valid.

This patch always passes true for the 'force' parameter so there is
no change in behavior.  Later patches will adjust the 'force' parameter
for the FW log context memory types so that the logs will not be reset
after FW reset.
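
The 'force' semantics described above can be sketched in userspace as
follows. This is only a simplified model: struct ctx_mem_type and
free_one_ctx_mem() are hypothetical names, and malloc()/free() stand in
for the driver's page allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified view of one context memory type. */
struct ctx_mem_type {
	void *mem;        /* backing memory, NULL when not allocated */
	bool mem_valid;   /* set once memory was successfully allocated */
};

/* Free one context memory type.  When force is false and the memory is
 * valid, keep the memory and its bookkeeping untouched.
 */
void free_one_ctx_mem(struct ctx_mem_type *ctxm, bool force)
{
	assert(ctxm);
	if (!force && ctxm->mem_valid)
		return;

	free(ctxm->mem);
	ctxm->mem = NULL;
	ctxm->mem_valid = false;
}
```

Passing true everywhere reproduces the unconditional free, which is why
this patch alone does not change behavior.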

Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Hongguang Gao
968d2cc07c bnxt_en: Refactor bnxt_free_ctx_mem()
Add a new function bnxt_free_one_ctx_mem() to free one context
memory type.  bnxt_free_ctx_mem() now calls the new function in
the loop to free each context memory type.  There is no change in
behavior.  Later patches will further make use of the new function.

Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Shruti Parab
0b350b4927 bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
Add a new bit to struct bnxt_ctx_mem_type to indicate that host
memory has been successfully allocated for this context memory type.
In the next patches, we'll be adding some additional context memory
types for FW debugging/logging.  If memory cannot be allocated for
any of these new types, we will not abort and the cleared mem_valid
bit will indicate to skip configuring the memory type.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Michael Chan
ff00bcc9ec bnxt_en: Update firmware interface spec to 1.10.3.85
The major change is the new firmware command to flush the FW debug
logs to the host backing store context memory buffers.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Stephen Boyd
4adb920188 Qualcomm clock updates for v6.13
Add global clock controllers for QCS8300, and IPQ5424.
 Add camera, display and video clock controllers for SA8775P.
 Add global, display, gpu, tcsr, and rpmh clock controllers for SAR2130P.
 Add global, camera, display, gpu, and video clock controllers for
 SM8475.
 
 Support for IPQ9574 is added to the Alpha PLL clock driver, and the
 checks for already configured PLL at boot are cleaned up.
 
 QCS404 GPLL3 initial rate is corrected.
 
 A new ops for shared rcg2 floor_ops is introduced, for dealing with
 shared SDCC clocks.

Merge tag 'qcom-clk-for-6.13' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into clk-qcom

Pull Qualcomm clk driver updates from Bjorn Andersson:

 - Global clock controllers for Qualcomm QCS8300 and IPQ5424 SoCs
 - Camera, display and video clock controllers for Qualcomm SA8775P SoCs
 - Global, display, GPU, TCSR, and RPMh clock controllers for Qualcomm SAR2130P
 - Global, camera, display, GPU, and video clock controllers for
   Qualcomm SM8475 SoCs
 - Support for Qualcomm IPQ9574 in the Alpha PLL clock driver
 - Cleanup checks for already configured PLLs at boot in the Qualcomm
   Alpha PLL driver
 - Fix the initial rate for Qualcomm QCS404 GPLL3
 - Add shared rcg2 floor clk_ops for shared SDCC clks on Qualcomm SoCs

* tag 'qcom-clk-for-6.13' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (43 commits)
  clk: qcom: remove unused data from gcc-ipq5424.c
  clk: qcom: Add support for Global Clock Controller on QCS8300
  dt-bindings: clock: qcom: Add GCC clocks for QCS8300
  clk: qcom: add Global Clock controller (GCC) driver for IPQ5424 SoC
  clk: qcom: clk-alpha-pll: Add NSS HUAYRA ALPHA PLL support for ipq9574
  dt-bindings: clock: Add Qualcomm IPQ5424 GCC binding
  clk: qcom: add SAR2130P GPU Clock Controller support
  clk: qcom: dispcc-sm8550: enable support for SAR2130P
  clk: qcom: tcsrcc-sm8550: add SAR2130P support
  clk: qcom: add support for GCC on SAR2130P
  clk: qcom: rpmh: add support for SAR2130P
  clk: qcom: rcg2: add clk_rcg2_shared_floor_ops
  dt-bindings: clk: qcom,sm8450-gpucc: add SAR2130P compatibles
  dt-bindings: clock: qcom,sm8550-dispcc: Add SAR2130P compatible
  dt-bindings: clock: qcom,sm8550-tcsr: Add SAR2130P compatible
  dt-bindings: clock: qcom: document SAR2130P Global Clock Controller
  dt-bindings: clock: qcom,rpmhcc: Add SAR2130P compatible
  clk: qcom: Make GCC_6125 depend on QCOM_GDSC
  dt-bindings: clock: qcom: gcc-ipq9574: remove q6 bring up clock macros
  dt-bindings: clock: qcom: gcc-ipq5332: remove q6 bring up clock macros
  ...
2024-11-18 19:41:46 -08:00
Jakub Kicinski
66418447d2 Merge branch 'bpf-fix-recursive-lock-and-add-test'
Jiayuan Chen says:

====================
bpf: fix recursive lock and add test

1. fix recursive lock when ebpf prog returns SK_PASS.
2. add selftest to reproduce recursive lock.

Note that the test code reproduces the deadlock: if only the selftest
is merged without the first patch, the test case will definitely fail,
because the deadlock is inevitable.

v1: https://lore.kernel.org/55fc6114-7e64-4b65-86d2-92cfd1e9e92f@linux.dev/
====================

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241118030910.36230-1-mrpre@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:40:01 -08:00
Jiayuan Chen
0c4d5cb9a1 selftests/bpf: Add some tests with sockmap SK_PASS
Add new tests in sockmap_basic.c to test SK_PASS for sockmap.

Signed-off-by: Jiayuan Chen <mrpre@163.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241118030910.36230-3-mrpre@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:39:59 -08:00
Jiayuan Chen
8ca2a1eead bpf: fix recursive lock when verdict program return SK_PASS
When the stream_verdict program returns SK_PASS, it places the received skb
into its own receive queue, but a recursive lock eventually occurs, leading
to an operating system deadlock. This issue has been present since v6.9.

'''
sk_psock_strp_data_ready
    write_lock_bh(&sk->sk_callback_lock)
    strp_data_ready
      strp_read_sock
        read_sock -> tcp_read_sock
          strp_recv
            cb.rcv_msg -> sk_psock_strp_read
              # now stream_verdict returns SK_PASS without a peer sock assigned
              __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
              sk_psock_verdict_apply
                sk_psock_skb_ingress_self
                  sk_psock_skb_ingress_enqueue
                    sk_psock_data_ready
                      read_lock_bh(&sk->sk_callback_lock) <= dead lock

'''
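
The shape of the call chain above can be modeled with a toy,
single-threaded lock. This is not the kernel's rwlock_t and none of the
names below exist in the kernel; trylock variants are used so that the
model reports the conflict instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock state; a model only, not a real lock. */
struct toy_rwlock {
	bool writer;
	int readers;
};

bool toy_write_trylock(struct toy_rwlock *l)
{
	assert(l);
	if (l->writer || l->readers > 0)
		return false;
	l->writer = true;
	return true;
}

bool toy_read_trylock(struct toy_rwlock *l)
{
	assert(l);
	if (l->writer)
		return false;	/* a blocking read lock would wait forever here */
	l->readers++;
	return true;
}

/* Models the innermost step, which takes the read side. */
bool inner_data_ready(struct toy_rwlock *l)
{
	if (!toy_read_trylock(l))
		return false;
	l->readers--;
	return true;
}

/* Models the outer path: take the write side, then reach the inner step
 * on the same lock.  Returns false when the inner step cannot proceed.
 */
bool outer_data_ready(struct toy_rwlock *l)
{
	bool ok;

	if (!toy_write_trylock(l))
		return false;
	ok = inner_data_ready(l);
	l->writer = false;
	return ok;
}
```

Called on its own, inner_data_ready() succeeds; reached through
outer_data_ready(), it cannot take the read side because the same path
already holds the write side.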

This topic has been discussed before, but it has not been fixed.
Previous discussion:
https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch

Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241118030910.36230-2-mrpre@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:39:59 -08:00
Jakub Kicinski
84ad482560 Merge branch 'wireguard-updates-and-fixes-for-6-13'
Jason A. Donenfeld says:

====================
wireguard updates and fixes for 6.13

This tiny series (+3/-2) fixes one bug and has three small improvements.

1) Fix running the netns.sh test suite on systems that haven't yet
   inserted the nf_conntrack module.

2) Remove a stray useless function call in a selftest.

3) There's no need to zero out the netdev private data in recent
   kernels.

4) Set the TSO max size to be GSO_MAX_SIZE, so that we aggregate larger
   packets. Daniel reports seeing a 15% improvement in a simple load and
   suggested the speedups would be even better in more complex loads.
====================

Link: https://patch.msgid.link/20241117212030.629159-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:33 -08:00
Daniel Borkmann
06a34f7db7 wireguard: device: support big tcp GSO
Advertise GSO_MAX_SIZE as TSO max size in order to support BIG TCP for wireguard.
This helps to improve wireguard performance a bit when enabled as it allows
wireguard to aggregate larger skbs in wg_packet_consume_data_done() via
napi_gro_receive(), but also allows the stack to build larger skbs on xmit
where the driver then segments them before encryption inside wg_xmit().
We've seen a 15% improvement in TCP stream performance.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-5-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Hangbin Liu
0290abc986 wireguard: selftests: load nf_conntrack if not present
Some distros may not load nf_conntrack by default, which will cause
subsequent nf_conntrack sets to fail. Load this module if it is not
already loaded.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
[ Jason: add [[ -e ... ]] check so this works in the qemu harness. ]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-4-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Dheeraj Reddy Jonnalagadda
c1822fb64f wireguard: allowedips: remove redundant selftest call
This commit fixes a useless call issue detected by Coverity (CID
1508092). The call to horrible_allowedips_lookup_v4 is unnecessary as
its return value is never checked.

Signed-off-by: Dheeraj Reddy Jonnalagadda <dheeraj.linuxdev@gmail.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Tobias Klauser
2c862914fb wireguard: device: omit unnecessary memset of netdev private data
The memory for netdev_priv is allocated using kvzalloc in
alloc_netdev_mqs before rtnl_link_ops->setup is called so there is no
need to zero it again in wg_setup.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Kajol Jain
f26f9933e3 powerpc/perf: Add per-task/process monitoring to vpa_pmu driver
Enhance the vpa_pmu driver with a feature to observe the context switch
latency event with both the per-task (tid) and per-process (pid) options.
A couple of new helper functions are added to abstract the reading of
the context switch latency counters from the kvm_vcpu_arch struct;
these helper functions are defined in "kvm/book3s_hv.c".

"PERF_ATTACH_TASK" flag is used to decide whether to read the counter
values from lppaca or kvm_vcpu_arch struct.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Co-developed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241118114114.208964-4-kjain@linux.ibm.com
2024-11-19 14:11:30 +11:00
Kajol Jain
5f0b48c6a1 powerpc/kvm: Add vpa latency counters to kvm_vcpu_arch
Commit e1f288d2f9c69 ("KVM: PPC: Book3S HV nestedv2: Add support
for reading VPA counters for pseries guests") introduced support for new
Virtual Processor Area (VPA) based software counters. These counters are
useful when observing the context switch latency of L1 <-> L2. It also
added access to the counters in lppaca, which is good enough to understand
latency details at the per-cpu level. But to extend and aggregate them at
the per-process level (qemu) or per-pid/tid level (vcpu), these
counters also need to be added to the kvm_vcpu_arch struct.
Additional code is added to update these new kvm_vcpu_arch variables
in the do_trace_nested_cs_time function.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Co-developed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241118114114.208964-3-kjain@linux.ibm.com
2024-11-19 14:11:21 +11:00
Kajol Jain
4ae0b32ece docs: ABI: sysfs-bus-event_source-devices-vpa-pmu: Document sysfs event format entries for vpa_pmu
Details are added for the vpa_pmu event and format
attributes in the ABI documentation.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Co-developed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241118114114.208964-2-kjain@linux.ibm.com
2024-11-19 14:11:07 +11:00
Kajol Jain
176cda0619 powerpc/perf: Add perf interface to expose vpa counters
To support performance measurement for the KVM on PowerVM (KoP)
feature, the PowerVM hypervisor has added a couple of new software
counters in the Virtual Processor Area (VPA) of the partition.

Commit e1f288d2f9c69 ("KVM: PPC: Book3S HV nestedv2: Add
support for reading VPA counters for pseries guests")
updated the paca fields with the corresponding changes.

The proposed perf interface exposes these new software
counters for monitoring context switch latencies and the
runtime aggregate. The perf interface driver is called
"vpa_pmu" and it depends on KVM and perf, hence a new
config option called "VPA_PMU" is added, which depends on
"CONFIG_KVM_BOOK3S_64_HV" and "CONFIG_HV_PERF_CTRS".
Since kvm and kvm_host are currently compiled as built-in
modules, this perf interface takes the same path and
is registered as a module.

vpa_pmu perf interface needs access to some of the kvm
functions and structures like kvmhv_get_l2_counters_status(),
hence kvm_book3s_64.h and kvm_ppc.h are included.
Below are the events added to monitor KoP:

  vpa_pmu/l1_to_l2_lat/
  vpa_pmu/l2_to_l1_lat/
  vpa_pmu/l2_runtime_agg/

With this patch, the vpa_pmu driver supports only per-cpu monitoring.
Example usage:

[command]# perf stat -e vpa_pmu/l1_to_l2_lat/ -a -I 1000
     1.001017682            727,200      vpa_pmu/l1_to_l2_lat/
     2.003540491          1,118,824      vpa_pmu/l1_to_l2_lat/
     3.005699458          1,919,726      vpa_pmu/l1_to_l2_lat/
     4.007827011          2,364,630      vpa_pmu/l1_to_l2_lat/

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Co-developed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241118114114.208964-1-kjain@linux.ibm.com
2024-11-19 14:11:06 +11:00
Jakub Kicinski
21742be898 Merge branch 'netpoll-use-rcu-primitives-for-npinfo-pointer-access'
Breno Leitao says:

====================
netpoll: Use RCU primitives for npinfo pointer access

The net_device->npinfo pointer is marked with __rcu, indicating it requires
proper RCU access primitives:

  struct net_device {
	...
	struct netpoll_info __rcu *npinfo;
	...
  };

Direct access to this pointer can lead to issues such as:
- Compiler incorrectly caching/reusing stale pointer values
- Missing memory ordering guarantees
- Non-atomic pointer loads

Replace direct NULL checks of npinfo with rcu_access_pointer(),
which provides the necessary memory ordering guarantees without the
overhead of a full RCU dereference, since we only need to verify
if the pointer is NULL.

In both cases, the RCU read lock is not held when the function is being
called. I checked that by using lockdep_assert_in_rcu_read_lock() and
seeing the warning in both cases.
====================
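
The "NULL check only" access pattern described above can be approximated
in userspace with a C11 atomic pointer. This is an analogy under stated
assumptions, not the kernel implementation: the *_model types and
dev_has_npinfo() are hypothetical, and a relaxed atomic load is used as
a rough stand-in for a single, non-torn read of the pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct netpoll_info_model {
	int dummy;
};

/* Model of a device with a pointer that must not be read as a plain load. */
struct net_device_model {
	_Atomic(struct netpoll_info_model *) npinfo;
};

/* Read the pointer once and only compare it against NULL; the result
 * is never dereferenced.
 */
bool dev_has_npinfo(struct net_device_model *dev)
{
	assert(dev);
	return atomic_load_explicit(&dev->npinfo,
				    memory_order_relaxed) != NULL;
}
```

Because the loaded value is never dereferenced, this form of access is
only suitable for presence checks, which matches the two call sites
changed in this series.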

Link: https://patch.msgid.link/20241118-netpoll_rcu-v1-0-a1888dcb4a02@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:01:47 -08:00
Breno Leitao
a57d5a72f8 netpoll: Use rcu_access_pointer() in netpoll_poll_lock
The ndev->npinfo pointer in netpoll_poll_lock() is RCU-protected but is
being accessed directly for a NULL check. While no RCU read lock is held
in this context, we should still use proper RCU primitives for
consistency and correctness.

Replace the direct NULL check with rcu_access_pointer(), which is the
appropriate primitive when only checking for NULL without dereferencing
the pointer. This function provides the necessary ordering guarantees
without requiring RCU read-side protection.

Fixes: bea3348eef27 ("[NET]: Make NAPI polling independent of struct net_device objects.")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20241118-netpoll_rcu-v1-2-a1888dcb4a02@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:01:39 -08:00
Breno Leitao
c69c5e10ad netpoll: Use rcu_access_pointer() in __netpoll_setup
The ndev->npinfo pointer in __netpoll_setup() is RCU-protected but is being
accessed directly for a NULL check. While no RCU read lock is held in this
context, we should still use proper RCU primitives for consistency and
correctness.

Replace the direct NULL check with rcu_access_pointer(), which is the
appropriate primitive when only checking for NULL without dereferencing
the pointer. This function provides the necessary ordering guarantees
without requiring RCU read-side protection.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20241118-netpoll_rcu-v1-1-a1888dcb4a02@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:01:36 -08:00
Menglong Dong
85c7975acd net: ip: fix unexpected return in fib_validate_source()
fib_validate_source() should return drop reasons rather than errno values,
so "-EINVAL" shouldn't be returned. Returning it causes a warning, which
was reported by syzkaller:

netlink: 'syz-executor371': attribute type 4 has an invalid length.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 __sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
Modules linked in:
CPU: 0 UID: 0 PID: 5842 Comm: syz-executor371 Not tainted 6.12.0-rc6-syzkaller-01362-ga58f00ed24b8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:__sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
RIP: 0010:sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
Code: 00 00 00 fc ff df 41 8d 9e 00 00 fc ff bf 01 00 fc ff 89 de e8 ea 9f 08 f8 81 fb 00 00 fc ff 77 3a 4c 89 e5 e8 9a 9b 08 f8 90 <0f> 0b 90 eb 5e bf 01 00 00 00 89 ee e8 c8 9f 08 f8 85 ed 0f 8e 49
RSP: 0018:ffffc90003d57078 EFLAGS: 00010293
RAX: ffffffff898c3ec6 RBX: 00000000fffbffea RCX: ffff8880347a5a00
RDX: 0000000000000000 RSI: 00000000fffbffea RDI: 00000000fffc0001
RBP: dffffc0000000000 R08: ffffffff898c3eb6 R09: 1ffff110023eb7d4
R10: dffffc0000000000 R11: ffffed10023eb7d5 R12: dffffc0000000000
R13: ffff888011f5bdc0 R14: 00000000ffffffea R15: 0000000000000000
FS:  000055557d41e380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056519d31d608 CR3: 000000007854e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 kfree_skb_reason include/linux/skbuff.h:1263 [inline]
 ip_rcv_finish_core+0xfde/0x1b50 net/ipv4/ip_input.c:424
 ip_list_rcv_finish net/ipv4/ip_input.c:610 [inline]
 ip_sublist_rcv+0x3b1/0xab0 net/ipv4/ip_input.c:636
 ip_list_rcv+0x42b/0x480 net/ipv4/ip_input.c:670
 __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
 __netif_receive_skb_list_core+0x94e/0x980 net/core/dev.c:5762
 __netif_receive_skb_list net/core/dev.c:5814 [inline]
 netif_receive_skb_list_internal+0xa51/0xe30 net/core/dev.c:5905
 netif_receive_skb_list+0x55/0x4b0 net/core/dev.c:5957
 xdp_recv_frames net/bpf/test_run.c:280 [inline]
 xdp_test_run_batch net/bpf/test_run.c:361 [inline]
 bpf_test_run_xdp_live+0x1b5e/0x21b0 net/bpf/test_run.c:390
 bpf_prog_test_run_xdp+0x805/0x11e0 net/bpf/test_run.c:1318
 bpf_prog_test_run+0x2e4/0x360 kernel/bpf/syscall.c:4266
 __sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5671
 __do_sys_bpf kernel/bpf/syscall.c:5760 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5758 [inline]
 __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5758
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f18af25a8e9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffee4090af8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f18af25a8e9
RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
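
The mismatch behind this warning can be modeled in a few lines of
userspace C. The enum values and function names below are hypothetical
and much smaller than the kernel's drop reason list; the sketch only
shows why a negated errno does not pass a drop reason range check while
a negated drop reason does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, shortened drop reason list. */
enum drop_reason_model {
	DROP_NOT_DROPPED_YET = 0,
	DROP_NOT_SPECIFIED,
	DROP_IP_LOCAL_SOURCE,
	DROP_REASON_MAX,
};

bool drop_reason_is_valid(int reason)
{
	return reason > DROP_NOT_DROPPED_YET && reason < DROP_REASON_MAX;
}

/* Buggy variant: reports failure as a negative errno. */
int validate_source_buggy(bool bad_source)
{
	return bad_source ? -EINVAL : 0;
}

/* Fixed variant: reports failure as a negative drop reason. */
int validate_source_fixed(bool bad_source)
{
	return bad_source ? -DROP_IP_LOCAL_SOURCE : 0;
}

/* The caller negates a negative return value and treats it as a drop
 * reason; an errno value falls outside the valid range.
 */
bool caller_drop_is_sane(int ret)
{
	assert(ret < 0);
	return drop_reason_is_valid(-ret);
}
```

In this model the buggy variant hands the caller 22 (EINVAL), which is
outside the modeled range, while the fixed variant hands it a value the
range check accepts.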

Fix it by returning "-SKB_DROP_REASON_IP_LOCAL_SOURCE" instead of
"-EINVAL" in fib_validate_source().

Reported-by: syzbot+52fbd90f020788ec7709@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6738e539.050a0220.e1c64.0002.GAE@google.com/
Fixes: 82d9983ebeb8 ("net: ip: make ip_route_input_noref() return drop reasons")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:57:00 -08:00
Dr. David Alan Gilbert
78a36139fc net/fungible: Remove unused fun_create_queue
fun_create_queue was added in 2022 by
commit e1ffcc66818f ("net/fungible: Add service module for Fungible
drivers")
but hasn't been used.

Remove it.

Also remove the static helper functions it was the only user of.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:54:11 -08:00
Jakub Kicinski
a537cfdaa7 Merge branch 'uapi-ethtool-avoid-flex-array-in-struct-ethtool_link_settings'
Kees Cook says:

====================
UAPI: ethtool: Avoid flex-array in struct ethtool_link_settings

This reverts the tagged struct group in struct ethtool_link_settings and
instead just removes the flexible array member from Linux's view as it
is entirely unused.
====================

Link: https://patch.msgid.link/20241115204115.work.686-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:52:15 -08:00
Kees Cook
96c677fca5 UAPI: ethtool: Avoid flex-array in struct ethtool_link_settings
struct ethtool_link_settings tends to be used as a header for other
structures that have trailing bytes[1], but has a trailing flexible array
itself. Using this overlapped with other structures leads to ambiguous
object sizing in the compiler, so we want to avoid such situations (which
have caused real bugs in the past). Detecting this can be done with
-Wflex-array-member-not-at-end, which will need to be enabled globally.

Using a tagged struct_group() to create a new ethtool_link_settings_hdr
structure isn't possible as it seems we cannot use the tagged variant of
struct_group() due to syntax issues from C++'s perspective (even within
"extern C")[2]. Instead, we can just leave the offending member defined
in UAPI and remove it from the kernel's view of the structure, as Linux
doesn't actually use this member at all. There is also no change in
size since it was already a flexible array that didn't contribute to
size returned by any use of sizeof().
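
The size argument can be checked with a small standalone example. The
structs below are hypothetical, heavily simplified layouts and not the
real struct ethtool_link_settings; they only show that dropping a
trailing flexible array member leaves sizeof() unchanged and lets a
wrapper place its own data directly after the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified "UAPI view": header fields plus a trailing flexible array. */
struct settings_uapi {
	uint32_t cmd;
	uint32_t speed;
	uint32_t link_mode_masks[];
};

/* Simplified "kernel view": same fields, flexible array omitted. */
struct settings_kernel {
	uint32_t cmd;
	uint32_t speed;
};

/* The flexible array does not contribute to the size. */
static_assert(sizeof(struct settings_uapi) == sizeof(struct settings_kernel),
	      "dropping the flexible array must not change the size");

/* A wrapper can now embed the header without a flexible array ending up
 * in the middle of the enclosing structure.
 */
struct settings_wrapper {
	struct settings_kernel base;
	uint32_t masks[3];
};

size_t masks_offset(void)
{
	return offsetof(struct settings_wrapper, masks);
}
```

Embedding struct settings_uapi in the wrapper instead would put a
flexible array in the middle of the wrapper, which is the situation
-Wflex-array-member-not-at-end is meant to flag.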

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/lkml/20241109100213.262a2fa0@kernel.org/ [2]
Link: https://lore.kernel.org/lkml/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org/ [1]
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241115204308.3821419-3-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:52:11 -08:00
Kees Cook
ebda123fe7 Revert "UAPI: ethtool: Use __struct_group() in struct ethtool_link_settings"
This reverts commit 43d3487035e9a86fad952de4240a518614240d43. We cannot
use tagged struct groups in UAPI because C++ will throw syntax errors
even under "extern C".

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20241115204308.3821419-2-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:52:11 -08:00
Kees Cook
1cfb5e5788 Revert "net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings"
This reverts commit 3bd9b9abdf1563a22041b7255baea6d449902f1a. We cannot
use the new tagged struct group because it throws C++ errors even under
"extern C".

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20241115204308.3821419-1-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:52:11 -08:00
Jakub Kicinski
920efe3e13 selftests: net: add more info to error in bpf_offload
bpf_offload caught a spurious warning in TC recently, but the error
message did not provide enough information to know what the problem
is:

  FAIL: Found 'netdevsim' in command output, leaky extack?

Add the extack to the output:

  FAIL: Unexpected command output, leaky extack? ('netdevsim', 'Warning: Filter with specified priority/protocol not found.')

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:51:41 -08:00
Jakub Kicinski
4262bacb74 MAINTAINERS: exclude can core, drivers and DT bindings from netdev ML
CAN networking and drivers are maintained by Marc, Oliver and Vincent.
Marc already sends us pull requests with reviewed and validated code.
Exclude the CAN patch postings from the netdev@ mailing list to lower
the patch volume there.

Link: https://lore.kernel.org/20241113193709.395c18b0@kernel.org
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://patch.msgid.link/20241115195609.981049-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:51:04 -08:00
Gerd Bayer
16a04d043b net/smc: Run patches also by RDMA ML
Commits for the SMC protocol usually get carried through the netdev
mailing list. Some portions use InfiniBand verbs that are discussed on
the RDMA mailing list. So run patches by that list too to increase the
likelihood that all interested parties can see them.

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:50:39 -08:00
Jakub Kicinski
5b7cfe0633 Merge branch 'mptcp-pm-lockless-list-traversal-and-cleanup'
Matthieu Baerts says:

====================
mptcp: pm: lockless list traversal and cleanup

Here are two patches improving the MPTCP in-kernel path-manager.

- Patch 1: the get and dump endpoint operations now iterate over the
  endpoints list in a lockless way.

- Patch 2: reduce code duplication when looking up an endpoint.
====================

Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-0-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:50:15 -08:00
Geliang Tang
1d7fa6ceb9 mptcp: pm: avoid code duplication to lookup endp
The helper __lookup_addr() can be used in mptcp_pm_nl_get_local_id()
and mptcp_pm_nl_is_backup() to simplify the code, and avoid code
duplication.

Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-2-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:50:13 -08:00
Matthieu Baerts (NGI0)
3fbb27b7f8 mptcp: pm: lockless list traversal to dump endp
To return an endpoint to the userspace via Netlink, and to dump all of
them, the endpoint list was iterated while holding the pernet->lock, but
only to read the content of the list.

In these cases, the spin locks can be replaced by RCU read locks, and the
_rcu variants can be used to iterate over the entries list in a lockless
way.

Note that the __lookup_addr_by_id() helper has been modified to use the
_rcu variants of list_for_each_entry(), but with an extra condition, so
it can be called either while the RCU read lock is held, or when the
associated pernet->lock is held.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-1-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:50:13 -08:00
Vitalii Mordan
cc84d89ad8 stmmac: dwmac-intel-plat: remove redundant dwmac->data check in probe
The driver’s compatibility with devices is confirmed earlier in
platform_match(). Since reaching probe means the device is valid,
the extra check can be removed to simplify the code.

Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:49:53 -08:00
Jiawen Wu
2160428bcb net: txgbe: fix null pointer to pcs
For 1000BASE-X or SGMII interface mode, the PCS also needs to be selected.
Only return a null pointer when there is a copper NIC with external PHY.

Fixes: 02b2a6f91b90 ("net: txgbe: support copper NIC with external PHY")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20241115073508.1130046-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:45:18 -08:00
Jiawen Wu
e867ed3ac8 net: txgbe: remove GPIO interrupt controller
Since the GPIO interrupt controller never works properly, we constantly
need to add workarounds to cope with hardware deficiencies. So just
remove the GPIO interrupt controller, and let the SFP driver poll the
GPIO status.

Fixes: b4a2496c17ed ("net: txgbe: fix GPIO interrupt blocking")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20241115071527.1129458-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:44:31 -08:00
Jakub Kicinski
4be4a91d53 Merge branch 'eth-fbnic-cleanup-and-add-a-few-stats'
Jakub Kicinski says:

====================
eth: fbnic: cleanup and add a few stats

Clean up trivial problems with fbnic and add the PCIe and RPC (Rx parser)
stats.

All stats are read under rtnl_lock for now, so the code is pretty
trivial. We'll need to add more locking when we start gathering
drops used by .ndo_get_stats64.
====================

Link: https://patch.msgid.link/20241115015344.757567-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:45 -08:00
Sanman Pradhan
79da2aaa08 eth: fbnic: add RPC hardware statistics
Report Rx parser statistics via ethtool -S.

The parser stats are 32b, so we need to add refresh to the service
task to make sure we don't miss overflows.

Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241115015344.757567-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
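The need for a periodic refresh of 32b counters can be illustrated with
a minimal sketch. The struct and function names below are hypothetical
and are not taken from the fbnic driver; the sketch only shows the
general technique of folding a wrapping 32-bit hardware counter into a
64-bit software total.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for one wrapping hardware counter. */
struct hw_stat {
	uint64_t total;	/* 64-bit software accumulator */
	uint32_t last;	/* last raw 32-bit value read from hardware */
};

/*
 * Fold a new raw reading into the 64-bit total. The subtraction is
 * done modulo 2^32, so a single wrap between two refreshes is handled.
 * Two or more wraps between refreshes would be lost, which is why the
 * refresh has to run often enough.
 */
void hw_stat_refresh(struct hw_stat *s, uint32_t raw)
{
	s->total += (uint32_t)(raw - s->last);
	s->last = raw;
}
```

Calling hw_stat_refresh() from a periodic task keeps the interval
between readings short enough that at most one wrap can occur.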
Sanman Pradhan
25ba596d13 eth: fbnic: add PCIe hardware statistics
Add PCIe hardware statistics support to the fbnic driver. These stats
provide insight into PCIe transaction performance and error conditions.

These include read/write and completion TLP counts and DWORD counts, as
well as debug counters for tag, completion credit and NP credit exhaustion.

The stats are exposed via debugfs and can be used to monitor PCIe
performance and debug PCIe issues.

Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241115015344.757567-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Jakub Kicinski
08606cb528 eth: fbnic: add basic debugfs structure
Add the usual debugfs structure:

 fbnic/
   $pci-id/
     device-fileA
     device-fileB

This patch only adds the directories, subsequent changes
will add files.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241115015344.757567-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Jakub Kicinski
2a0d6c1705 eth: fbnic: add missing header guards
While adding the SPDX headers I noticed we're also missing
a header guard.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241115015344.757567-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Jakub Kicinski
e1a897ef4e eth: fbnic: add missing SPDX headers
Paolo noticed that we are missing SPDX headers; add them.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241115015344.757567-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Jakub Kicinski
62e9c00ea8 eth: fbnic: don't disable the PCI device twice
We use pcim_enable_device(), so there is no need to call pci_disable_device().

Fixes: 546dd90be979 ("eth: fbnic: Add scaffolding for Meta's NIC driver")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241115014809.754860-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:11 -08:00
Jakub Kicinski
357c52ff86 selftests: net: netlink-dumps: validation checks
The sanity checks are going to get silently cast to unsigned
and always pass. Cast the sizeof to signed size.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241115003248.733862-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:42:44 -08:00
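The signed/unsigned pitfall described above can be sketched as follows.
The function names are hypothetical and the code is not taken from the
selftest; it assumes a POSIX system where ssize_t and size_t have the
same width, so that a signed length compared against sizeof() gets
converted to unsigned.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/*
 * Broken check: comparing a signed length against a size_t converts
 * the length to unsigned. The explicit cast below spells out what the
 * usual arithmetic conversions do implicitly. A negative length such
 * as -1 becomes a huge value, so it is never reported as too short.
 */
int too_short_unsigned(ssize_t len, size_t want)
{
	return (size_t)len < want;
}

/* Fixed check: cast the size to a signed type before comparing. */
int too_short_signed(ssize_t len, size_t want)
{
	return len < (ssize_t)want;
}
```

With len = -1 and want = 16, the first helper returns 0 and the second
returns 1, so only the signed comparison catches the error value.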
Jakub Kicinski
0de6a472c3 net/neighbor: clear error in case strict check is not set
Commit 51183d233b5a ("net/neighbor: Update neigh_dump_info for strict
data checking") added strict checking. The err variable is not cleared,
so if we find no table to dump we will return the validation error even
if user did not want strict checking.

I think the only way to hit this is to send a buggy request, and ask
for a table which doesn't exist, so there's no point treating this
as a real fix. I only noticed it because a syzbot repro depended on it
to trigger another bug.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241115003221.733593-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:42:21 -08:00
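The stale error described above can be sketched in simplified form. The
names below are hypothetical and the code is not the kernel's
neigh_dump_info(); it only assumes the pattern the commit message
describes: a validation result that is ignored in non-strict mode but
left in the variable that is later returned.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for dumping one table; always succeeds here. */
static int dump_one_table(int idx)
{
	(void)idx;
	return 0;
}

/*
 * Simplified dump handler. validation_err is the result of request
 * validation: 0 or a negative errno.
 */
int dump_sketch(bool strict_check, int validation_err, int ntables)
{
	int err = validation_err;

	if (err < 0 && strict_check)
		return err;

	/*
	 * Without this reset, a non-strict request that matches no
	 * table would skip the loop and return the stale validation
	 * error.
	 */
	err = 0;

	for (int i = 0; i < ntables; i++) {
		err = dump_one_table(i);
		if (err < 0)
			break;
	}
	return err;
}
```

In this sketch a non-strict request with a validation error and no
matching table returns 0, while the same request in strict mode still
returns the validation error.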