Commit Graph

129151 Commits

Author SHA1 Message Date
Linus Torvalds
65ae975e97 Including fixes from bluetooth.
Current release - regressions:
 
   - rtnetlink: fix rtnl_dump_ifinfo() error path
 
   - bluetooth: remove the redundant sco_conn_put
 
 Previous releases - regressions:
 
   - netlink: fix false positive warning in extack during dumps
 
   - sched: sch_fq: don't follow the fast path if Tx is behind now
 
   - ipv6: delete temporary address if mngtmpaddr is removed or unmanaged
 
   - tcp: fix use-after-free of nreq in reqsk_timer_handler().
 
   - bluetooth: fix slab-use-after-free Read in set_powered_sync
 
   - l2tp: fix warning in l2tp_exit_net found
 
   - eth: bnxt_en: fix receive ring space parameters when XDP is active
 
   - eth: lan78xx: fix double free issue with interrupt buffer allocation
 
   - eth: tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets
 
 Previous releases - always broken:
 
   - ipmr: fix tables suspicious RCU usage
 
   - iucv: MSG_PEEK causes memory leak in iucv_sock_destruct()
 
   - eth: octeontx2-af: fix low network performance
 
   - eth: stmmac: dwmac-socfpga: set RX watchdog interrupt as broken
 
   - eth: rtase: correct the speed for RTL907XD-V1
 
 Misc:
 
   - some documentation fixup
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmdIolwSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOk/fEP/01Nuobq5teEiJgfV25xMqKT8EtvtrTk
 QatoPMD4UrpxbTBlA6wc23wBewBCVHG6IKVTVH00mUsWbZv561PNnXexD5yTLlor
 p4XSyaUwXeUzD+9LsxlTJGyp2gKGrir6NY6R/pYaJJ7pjxuRQKOl+qXf7s7IjIye
 Fnh8LAxIhr/LdBCJBV4tajS5VfCB6svT+uFCflbOw0Ng/quGfKchTHGTBxyHr3Ef
 mw0XsFew+6hDt72l9u0BNUewsSNfcfxSR343Z/DCaS03ZRQxhsB9I2v0WfgteO+U
 3xdRG1WvphfYsN/C/zJ19OThAmbKE+u4gz8Z07yebpgFN5jbe5Rcf7IVcXiexd0Y
 2fivK7DFU06TLukqBkUqqwPzAgh1w/KA+ia119WteYKxxTchu9td7+L4pr9qU4Tg
 Nipq0MYaj0cEebf+DdlG+2UFjMzaTiN/Ph1Cdh15bqMaVhn/eOk+L959y/XUlBm0
 vpNL2SaFg8ki1N3SyTCFvmS3w8P+jM/KaA3fQv8hfG9Ceab5NKEoUff1VdjDBh9X
 sS7I15rg8s0CV1DWDJn6Mvex30e2+/yesjJbD/D9HDcb1y2vmbwz9t5L3yFpoNbc
 +qxRawoxj+Vi/4DZNnZKHvTkc0+hOm4f+BtUGiGBfBnIIrqvYh3DnQTc5res6l0e
 ZdG0B4yEZedj
 =7dW1
 -----END PGP SIGNATURE-----

Merge tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth.

  Current release - regressions:

   - rtnetlink: fix rtnl_dump_ifinfo() error path

   - bluetooth: remove the redundant sco_conn_put

  Previous releases - regressions:

   - netlink: fix false positive warning in extack during dumps

   - sched: sch_fq: don't follow the fast path if Tx is behind now

   - ipv6: delete temporary address if mngtmpaddr is removed or
     unmanaged

   - tcp: fix use-after-free of nreq in reqsk_timer_handler().

   - bluetooth: fix slab-use-after-free Read in set_powered_sync

   - l2tp: fix warning in l2tp_exit_net found

   - eth:
       - bnxt_en: fix receive ring space parameters when XDP is active
       - lan78xx: fix double free issue with interrupt buffer allocation
       - tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets

  Previous releases - always broken:

   - ipmr: fix tables suspicious RCU usage

   - iucv: MSG_PEEK causes memory leak in iucv_sock_destruct()

   - eth:
       - octeontx2-af: fix low network performance
       - stmmac: dwmac-socfpga: set RX watchdog interrupt as broken
       - rtase: correct the speed for RTL907XD-V1

  Misc:

   - some documentation fixup"

* tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
  ipmr: fix build with clang and DEBUG_NET disabled.
  Documentation: tls_offload: fix typos and grammar
  Fix spelling mistake
  ipmr: fix tables suspicious RCU usage
  ip6mr: fix tables suspicious RCU usage
  ipmr: add debug check for mr table cleanup
  selftests: rds: move test.py to TEST_FILES
  net_sched: sch_fq: don't follow the fast path if Tx is behind now
  tcp: Fix use-after-free of nreq in reqsk_timer_handler().
  net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI
  net: Comment copy_from_sockptr() explaining its behaviour
  rxrpc: Improve setsockopt() handling of malformed user input
  llc: Improve setsockopt() handling of malformed user input
  Bluetooth: SCO: remove the redundant sco_conn_put
  Bluetooth: MGMT: Fix possible deadlocks
  Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
  bnxt_en: Unregister PTP during PCI shutdown and suspend
  bnxt_en: Refactor bnxt_ptp_init()
  bnxt_en: Fix receive ring space parameters when XDP is active
  bnxt_en: Fix queue start to update vnic RSS table
  ...
2024-11-28 10:15:20 -08:00
Russell King (Oracle)
e2668c34b7 net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI
When phy_ethtool_set_eee_noneg() detects a change in the LPI
parameters, it attempts to update phylib state and trigger the link
to cycle so the MAC sees the updated parameters.

However, in doing so, it sets phydev->enable_tx_lpi depending on
whether the EEE configuration allows the MAC to generate LPI without
taking into account the result of negotiation.

This can be demonstrated with a 1000base-T FD interface by:

 # ethtool --set-eee eno0 advertise 8   # cause EEE to be not negotiated
 # ethtool --set-eee eno0 tx-lpi off
 # ethtool --set-eee eno0 tx-lpi on

This results in being true, despite EEE not having been negotiated and:
 # ethtool --show-eee eno0
	EEE status: enabled - inactive
	Tx LPI: 250 (us)
	Supported EEE link modes:  100baseT/Full
	                           1000baseT/Full
	Advertised EEE link modes:  100baseT/Full
	                                         1000baseT/Full

Fix this by keeping track of whether EEE was negotiated via a new
eee_active member in struct phy_device, and include this state in
the decision whether phydev->enable_tx_lpi should be set.

Fixes: 3e43b903da ("net: phy: Immediately call adjust_link if only tx_lpi_enabled changes")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tErSe-005RhB-2R@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-28 09:26:13 +01:00
Linus Torvalds
1746db26f8 pci-v6.13-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmdE14wUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxMPRAAslaEhHZ06cU/I+BA0UrMJBbzOw+/
 XM2XUojxWaNMYSBPVXbtSBrfFMnox4G3hFBPK0T0HiWoc7wGx/TUVJk65ioqM8ug
 gS/U3NjSlqlnH8NHxKrb/2t0tlMvSll9WwumOD9pMFeMGFOS3fAgUk+fBqXFYsI/
 RsVRMavW9BucZ0yMHpgr0KGLPSt3HK/E1h0NLO+TN6dpFcoIq3XimKFyk1QQQgiR
 V3W21JMwjw+lDnUAsijU+RBYi5Fj6Rpqig/biRnzagVE6PJOci3ZJEBE7dGqm4LM
 UlgG6Ql/eK+bb3fPhcXxVmscj5XlEfbesX5PUzTmuj79Wq5l9hpy+0c654G79y8b
 rGiEVGM0NxmRdbuhWQUM2EsffqFlkFu7MN3gH0tP0Z0t3VTXfBcGrQJfqCcSCZG3
 5IwGdEE2kmGb5c3RApZrm+HCXdxhb3Nwc3P8c27eXDT4eqHWDJag4hzLETNBdIrn
 Rsbgry6zzAVA6lLT0uasUlWerq/I6OrueJvnEKRGKDtbw/JL6PLveR1Rvsc//cQD
 Tu4FcG81bldQTUOdHEgFyJgmSu77Gvfs5RZBV0cEtcCBc33uGJne08kOdGD4BwWJ
 dqN3wJFh5yX4jlMGmBDw0KmFIwKstfUCIoDE4Kjtal02CURhz5ZCDVGNPnSUKN0C
 hflVX0//cRkHc5g=
 =2Otz
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Make pci_stop_dev() and pci_destroy_dev() safe so concurrent
     callers can't stop a device multiple times, even as we migrate from
     the global pci_rescan_remove_lock to finer-grained locking (Keith
     Busch)

   - Improve pci_walk_bus() implementation by making it recursive and
     moving locking up to avoid need for a 'locked' parameter (Keith
     Busch)

   - Unexport pci_walk_bus_locked(), which is only used internally by
     the PCI core (Keith Busch)

   - Detect some Thunderbolt chips that are built-in and hence
     'trustworthy' by a heuristic since the 'ExternalFacingPort' and
     'usb4-host-interface' ACPI properties are not quite enough (Esther
     Shimanovich)

  Resource management:

   - Use PCI bus addresses (not CPU addresses) in 'ranges' properties
     when building dynamic DT nodes so systems where PCI and CPU
     addresses differ work correctly (Andrea della Porta)

   - Tidy resource sizing and assignment with helpers to reduce
     redundancy (Ilpo Järvinen)

   - Improve pdev_sort_resources() 'bogus alignment' warning to be more
     specific (Ilpo Järvinen)

  Driver binding:

   - Convert driver .remove_new() callbacks to .remove() again to finish
     the conversion from returning 'int' to being 'void' (Sergio
     Paracuellos)

   - Export pcim_request_all_regions(), a managed interface to request
     all BARs (Philipp Stanner)

   - Replace pcim_iomap_regions_request_all() with
     pcim_request_all_regions(), and pcim_iomap_table()[n] with
     pcim_iomap(n), in the following drivers: ahci, crypto qat, crypto
     octeontx2, intel_th, iwlwifi, ntb idt, serial rp2, ALSA korg1212
     (Philipp Stanner)

   - Remove the now unused pcim_iomap_regions_request_all() (Philipp
     Stanner)

   - Export pcim_iounmap_region(), a managed interface to unmap and
     release a PCI BAR (Philipp Stanner)

   - Replace pcim_iomap_regions(mask) with pcim_iomap_region(n), and
     pcim_iounmap_regions(mask) with pcim_iounmap_region(n), in the
     following drivers: fpga dfl-pci, block mtip32xx, gpio-merrifield,
     cavium (Philipp Stanner)

  Error handling:

   - Add sysfs 'reset_subordinate' to reset the entire hierarchy below a
     bridge; previously Secondary Bus Reset could only be used when
     there was a single device below a bridge (Keith Busch)

   - Warn if we reset a running device where the driver didn't register
     pci_error_handlers notification callbacks (Keith Busch)

  ASPM:

   - Disable ASPM L1 before touching L1 PM Substates to follow the spec
     closer and avoid a CPU load timeout on some platforms (Ajay
     Agarwal)

   - Set devices below Intel VMD to D0 before enabling ASPM L1 Substates
     as required per spec for all L1 Substates changes (Jian-Hong Pan)

  Power management:

   - Enable starfive controller runtime PM before probing host bridge
     (Mayank Rana)

   - Enable runtime power management for host bridges (Krishna chaitanya
     chundru)

  Power control:

   - Use of_platform_device_create() instead of of_platform_populate()
     to create pwrctl platform devices so we can control it based on the
     child nodes (Manivannan Sadhasivam)

   - Create pwrctrl platform devices only if there's a relevant power
     supply property (Manivannan Sadhasivam)

   - Add device link from the pwrctl supplier to the PCI dev to ensure
     pwrctl drivers are probed before the PCI dev driver; this avoids a
     race where pwrctl could change device power state while the PCI
     driver was active (Manivannan Sadhasivam)

   - Find pwrctl device for removal with of_find_device_by_node()
     instead of searching all children of the parent (Manivannan
     Sadhasivam)

   - Rename 'pwrctl' to 'pwrctrl' to match new bandwidth controller
     ('bwctrl') and hotplug files (Bjorn Helgaas)

  Bandwidth control:

   - Add read/modify/write locking for Link Control 2, which is used to
     manage Link speed (Ilpo Järvinen)

   - Extract Link Bandwidth Management Status check into
     pcie_lbms_seen(), where it can be shared between the bandwidth
     controller and quirks that use it to help retrain failed links
     (Ilpo Järvinen)

   - Re-add Link Bandwidth notification support with updates to address
     the reasons it was previously reverted (Alexandru Gagniuc, Ilpo
     Järvinen)

   - Add pcie_set_target_speed() and related functionality so drivers
     can manage PCIe Link speed based on thermal or other constraints
     (Ilpo Järvinen)

   - Add a thermal cooling driver to throttle PCIe Links via the
     existing thermal management framework (Ilpo Järvinen)

   - Add a userspace selftest for the PCIe bandwidth controller (Ilpo
     Järvinen)

  PCI device hotplug:

   - Add hotplug controller driver for Marvell OCTEON multi-function
     device where function 0 has a management console interface to
     enable/disable and provision various personalities for the other
     functions (Shijith Thotton)

   - Retain a reference to the pci_bus for the lifetime of a pci_slot to
     avoid a use-after-free when the thunderbolt driver resets USB4 host
     routers on boot, causing hotplug remove/add of downstream docks or
     other devices (Lukas Wunner)

   - Remove unused cpcihp struct cpci_hp_controller_ops.hardware_test
     (Guilherme Giacomo Simoes)

   - Remove unused cpqphp struct ctrl_dbg.ctrl (Christophe JAILLET)

   - Use pci_bus_read_dev_vendor_id() instead of hand-coded presence
     detection in cpqphp (Ilpo Järvinen)

   - Simplify cpqphp enumeration, which is already simple-minded and
     doesn't handle devices below hot-added bridges (Ilpo Järvinen)

  Virtualization:

   - Add ACS quirk for Wangxun FF5xxx NICs, which don't advertise an ACS
     capability but do isolate functions as though PCI_ACS_RR and
     PCI_ACS_CR were set, so the functions can be in independent IOMMU
     groups (Mengyuan Lou)

  TLP Processing Hints (TPH):

   - Add and document TLP Processing Hints (TPH) support so drivers can
     enable and disable TPH and the kernel can save/restore TPH
     configuration (Wei Huang)

   - Add TPH Steering Tag support so drivers can retrieve Steering Tag
     values associated with specific CPUs via an ACPI _DSM to improve
     performance by directing DMA writes closer to their consumers (Wei
     Huang)

  Data Object Exchange (DOE):

   - Wait up to 1 second for DOE Busy bit to clear before writing a
     request to the mailbox to avoid failures if the mailbox is still
     busy from a previous transfer (Gregory Price)

  Endpoint framework:

   - Skip attempts to allocate from endpoint controller memory window if
     the requested size is larger than the window (Damien Le Moal)

   - Add and document pci_epc_mem_map() and pci_epc_mem_unmap() to
     handle controller-specific size and alignment constraints, and add
     test cases to the endpoint test driver (Damien Le Moal)

   - Implement dwc pci_epc_ops.align_addr() so pci_epc_mem_map() can
     observe DWC-specific alignment requirements (Damien Le Moal)

   - Synchronously cancel command handler work in endpoint test before
     cleaning up DMA and BARs (Damien Le Moal)

   - Respect endpoint page size in dw_pcie_ep_align_addr() (Niklas
     Cassel)

   - Use dw_pcie_ep_align_addr() in dw_pcie_ep_raise_msi_irq() and
     dw_pcie_ep_raise_msix_irq() instead of open coding the equivalent
     (Niklas Cassel)

   - Avoid NULL dereference if Modem Host Interface Endpoint lacks
     'mmio' DT property (Zhongqiu Han)

   - Release PCI domain ID of Endpoint controller parent (not controller
     itself) and before unregistering the controller, to avoid
     use-after-free (Zijun Hu)

   - Clear secondary (not primary) EPC in pci_epc_remove_epf() when
     removing the secondary controller associated with an NTB (Zijun Hu)

  Cadence PCIe controller driver:

   - Lower severity of 'phy-names' message (Bartosz Wawrzyniak)

  Freescale i.MX6 PCIe controller driver:

   - Fix suspend/resume support on i.MX6QDL, which has a hardware
     erratum that prevents use of L2 (Stefan Eichenberger)

  Intel VMD host bridge driver:

   - Add 0xb60b and 0xb06f Device IDs for client SKUs (Nirmal Patel)

  MediaTek PCIe Gen3 controller driver:

   - Update mediatek-gen3 DT binding to require the exact number of
     clocks for each SoC (Fei Shao)

   - Add support for DT 'max-link-speed' and 'num-lanes' properties to
     restrict the link speed and width (AngeloGioacchino Del Regno)

  Microchip PolarFlare PCIe controller driver:

   - Add DT and driver support for using either of the two PolarFire
     Root Ports (Conor Dooley)

  NVIDIA Tegra194 PCIe controller driver:

   - Move endpoint controller cleanups that depend on refclk from the
     host to the notifier that tells us the host has deasserted PERST#,
     when refclk should be valid (Manivannan Sadhasivam)

  Qualcomm PCIe controller driver:

   - Add qcom SAR2130P DT binding with an additional clock (Dmitry
     Baryshkov)

   - Enable MSI interrupts if 'global' IRQ is supported, since a
     previous commit unintentionally masked them (Manivannan Sadhasivam)

   - Move endpoint controller cleanups that depend on refclk from the
     host to the notifier that tells us the host has deasserted PERST#,
     when refclk should be valid (Manivannan Sadhasivam)

   - Add DT binding and driver support for IPQ9574, with Synopsys IP
     v5.80a and Qcom IP 1.27.0 (devi priya)

   - Move the OPP "operating-points-v2" table from the
     qcom,pcie-sm8450.yaml DT binding to qcom,pcie-common.yaml, where it
     can be used by other Qcom platforms (Qiang Yu)

   - Add 'global' SPI interrupt for events like link-up, link-down to
     qcom,pcie-x1e80100 DT binding so we can start enumeration when the
     link comes up (Qiang Yu)

   - Disable ASPM L0s for qcom,pcie-x1e80100 since the PHY is not tuned
     to support this (Qiang Yu)

   - Add ops_1_21_0 for SC8280X family SoC, which doesn't use the
     'iommu-map' DT property and doesn't need BDF-to-SID translation
     (Qiang Yu)

  Rockchip PCIe controller driver:

   - Define ROCKCHIP_PCIE_AT_SIZE_ALIGN to replace magic 256 endpoint
     .align value (Damien Le Moal)

   - When unmapping an endpoint window, compute the region index instead
     of searching for it, and verify that the address was mapped (Damien
     Le Moal)

   - When mapping an endpoint window, verify that the address hasn't
     been mapped already (Damien Le Moal)

   - Implement pci_epc_ops.align_addr() for rockchip-ep (Damien Le Moal)

   - Fix MSI IRQ data mapping to observe the alignment constraint, which
     fixes intermittent page faults in memcpy_toio() and memcpy_fromio()
     (Damien Le Moal)

   - Rename rockchip_pcie_parse_ep_dt() to
     rockchip_pcie_ep_get_resources() for consistency with similar DT
     interfaces (Damien Le Moal)

   - Skip the unnecessary link train in rockchip_pcie_ep_probe() and do
     it only in the endpoint start operation (Damien Le Moal)

   - Implement pci_epc_ops.stop_link() to disable link training and
     controller configuration (Damien Le Moal)

   - Attempt link training at 5 GT/s when both partners support it
     (Damien Le Moal)

   - Add a handler for PERST# signal so we can detect host-initiated
     resets and start link training after PERST# is deasserted (Damien
     Le Moal)

  Synopsys DesignWare PCIe controller driver:

   - Clear outbound address on unmap so dw_pcie_find_index() won't match
     an ATU index that was already unmapped (Damien Le Moal)

   - Use of_property_present() instead of of_property_read_bool() when
     testing for presence of non-boolean DT properties (Rob Herring)

   - Advertise 1MB size if endpoint supports Resizable BARs, which was
     inadvertently lost in v6.11 (Niklas Cassel)

  TI J721E PCIe driver:

   - Add PCIe support for J722S SoC (Siddharth Vadapalli)

   - Delay PCIE_T_PVPERL_MS (100 ms), not just PCIE_T_PERST_CLK_US (100
     us), before deasserting PERST# to ensure power and refclk are
     stable (Siddharth Vadapalli)

  TI Keystone PCIe controller driver:

   - Set the 'ti,keystone-pcie' mode so v3.65a devices work in Root
     Complex mode (Kishon Vijay Abraham I)

   - Try to avoid unrecoverable SError for attempts to issue config
     transactions when the link is down; this is racy but the best we
     can do (Kishon Vijay Abraham I)

  Miscellaneous:

   - Reorganize kerneldoc parameter names to match order in function
     signature (Julia Lawall)

   - Fix sysfs reset_method_store() memory leak (Todd Kjos)

   - Simplify pci_create_slot() (Ilpo Järvinen)

   - Fix incorrect printf format specifiers in pcitest (Luo Yifan)"

* tag 'pci-v6.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (127 commits)
  PCI: rockchip-ep: Handle PERST# signal in EP mode
  PCI: rockchip-ep: Improve link training
  PCI: rockship-ep: Implement the pci_epc_ops::stop_link() operation
  PCI: rockchip-ep: Refactor endpoint link training enable
  PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() MSI-X hiding
  PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations
  PCI: rockchip-ep: Rename rockchip_pcie_parse_ep_dt()
  PCI: rockchip-ep: Fix MSI IRQ data mapping
  PCI: rockchip-ep: Implement the pci_epc_ops::align_addr() operation
  PCI: rockchip-ep: Improve rockchip_pcie_ep_map_addr()
  PCI: rockchip-ep: Improve rockchip_pcie_ep_unmap_addr()
  PCI: rockchip-ep: Use a macro to define EP controller .align feature
  PCI: rockchip-ep: Fix address translation unit programming
  PCI/pwrctrl: Rename pwrctrl functions and structures
  PCI/pwrctrl: Rename pwrctl files to pwrctrl
  PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent
  PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers
  PCI/pwrctl: Create pwrctl device only if at least one power supply is present
  PCI/pwrctl: Use of_platform_device_create() to create pwrctl devices
  tools: PCI: Fix incorrect printf format specifiers
  ...
2024-11-26 18:05:44 -08:00
Michael Chan
3661c05c54 bnxt_en: Unregister PTP during PCI shutdown and suspend
If we go through the PCI shutdown or suspend path, we shutdown the
NIC but PTP remains registered.  If the kernel continues to run for
a little bit, the periodic PTP .do_aux_work() function may be called
and it will read the PHC from the BAR register.  Since the device
has already been disabled, it will cause a PCIe completion timeout.
Fix it by calling bnxt_ptp_clear() in the PCI shutdown/suspend
handlers.  bnxt_ptp_clear() will unregister from PTP and
.do_aux_work() will be canceled.

In bnxt_resume(), we need to re-initialize PTP.

Fixes: a521c8a01d ("bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one()")
Cc: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Michael Chan
1e9614cd95 bnxt_en: Refactor bnxt_ptp_init()
Instead of passing the 2nd parameter phc_cfg to bnxt_ptp_init().
Store it in bp->ptp_cfg so that the caller doesn't need to know what
the value should be.

In the next patch, we'll need to call bnxt_ptp_init() in bnxt_resume()
and this will make it easier.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Shravya KN
3051a77a09 bnxt_en: Fix receive ring space parameters when XDP is active
The MTU setting at the time an XDP multi-buffer is attached
determines whether the aggregation ring will be used and the
rx_skb_func handler.  This is done in bnxt_set_rx_skb_mode().

If the MTU is later changed, the aggregation ring setting may need
to be changed and it may become out-of-sync with the settings
initially done in bnxt_set_rx_skb_mode().  This may result in
random memory corruption and crashes as the HW may DMA data larger
than the allocated buffer size, such as:

BUG: kernel NULL pointer dereference, address: 00000000000003c0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G S         OE      6.1.0-226bf9805506 #1
Hardware name: Wiwynn Delta Lake PVT BZA.02601.0150/Delta Lake-Class1, BIOS F0E_3A12 08/26/2021
RIP: 0010:bnxt_rx_pkt+0xe97/0x1ae0 [bnxt_en]
Code: 8b 95 70 ff ff ff 4c 8b 9d 48 ff ff ff 66 41 89 87 b4 00 00 00 e9 0b f7 ff ff 0f b7 43 0a 49 8b 95 a8 04 00 00 25 ff 0f 00 00 <0f> b7 14 42 48 c1 e2 06 49 03 95 a0 04 00 00 0f b6 42 33f
RSP: 0018:ffffa19f40cc0d18 EFLAGS: 00010202
RAX: 00000000000001e0 RBX: ffff8e2c805c6100 RCX: 00000000000007ff
RDX: 0000000000000000 RSI: ffff8e2c271ab990 RDI: ffff8e2c84f12380
RBP: ffffa19f40cc0e48 R08: 000000000001000d R09: 974ea2fcddfa4cbf
R10: 0000000000000000 R11: ffffa19f40cc0ff8 R12: ffff8e2c94b58980
R13: ffff8e2c952d6600 R14: 0000000000000016 R15: ffff8e2c271ab990
FS:  0000000000000000(0000) GS:ffff8e3b3f840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000003c0 CR3: 0000000e8580a004 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 __bnxt_poll_work+0x1c2/0x3e0 [bnxt_en]

To address the issue, we now call bnxt_set_rx_skb_mode() within
bnxt_change_mtu() to properly set the AGG rings configuration and
update rx_skb_func based on the new MTU value.
Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of
bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on
the current MTU.

Fixes: 08450ea98a ("bnxt_en: Fix max_mtu setting for multi-buf XDP")
Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Somnath Kotur
5ac066b7b0 bnxt_en: Fix queue start to update vnic RSS table
HWRM_RING_FREE followed by a HWRM_RING_ALLOC is not guaranteed to
have the same FW ring ID as before.  So we must reinitialize the
RSS table with the correct ring IDs.  Otherwise, traffic may not
resume properly if the restarted ring ID is stale.  Since this
feature is only supported on P5_PLUS chips, we call
bnxt_vnic_set_rss_p5() to update the HW RSS table.

Fixes: 2d694c27d3 ("bnxt_en: implement netdev_queue_mgmt_ops")
Cc: David Wei <dw@davidwei.uk>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Shravya KN
5007991670 bnxt_en: Set backplane link modes correctly for ethtool
Use the return value from bnxt_get_media() to determine the port and
link modes.  bnxt_get_media() returns the proper BNXT_MEDIA_KR when
the PHY is backplane.  This will correct the ethtool settings for
backplane devices.

Fixes: 5d4e1bf606 ("bnxt_en: extend media types to supported and autoneg modes")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Saravanan Vajravel
5311598f7f bnxt_en: Reserve rings after PCIe AER recovery if NIC interface is down
After successful PCIe AER recovery, FW will reset all resource
reservations.  If it is IF_UP, the driver will call bnxt_open() and
all resources will be reserved again.  It it is IF_DOWN, we should
call bnxt_reserve_rings() so that we can reserve resources including
RoCE resources to allow RoCE to resume after AER.  Without this
patch, RoCE fails to resume in this IF_DOWN scenario.

Later, if it becomes IF_UP, bnxt_open() will see that resources have
been reserved and will not reserve again.

Fixes: fb1e6e562b ("bnxt_en: Fix AER recovery.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Hariprasad Kelam
762ca6eed0 octeontx2-af: Quiesce traffic before NIX block reset
During initialization, the AF driver resets all blocks. The RPM (MAC)
block and NIX block operate on a credit-based model. When the NIX block
resets during active traffic flow, it doesn't release credits to the RPM
block. This causes the RPM FIFO to overflow, leading to receive traffic
struck.

To address this issue, the patch introduces the following changes:
1. Stop receiving traffic at the MAC level during AF driver
   initialization.
2. Perform an X2P reset (prevents RXFIFO of all LMACS from pushing data)
3. Reset the NIX block.
4. Clear the X2P reset and re-enable receiving traffic.

Fixes: 54d557815e ("octeontx2-af: Reset all RVU blocks")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 12:09:41 +01:00
Hariprasad Kelam
6fc2164108 octeontx2-af: RPM: fix stale FCFEC counters
The corrected words register(FCFECX_VL0_CCW_LO)/Uncorrected words
register (FCFECX_VL0_NCCW_LO) of FCFEC counter has different LMAC
offset which needs to be accessed differently.

Fixes: 84ad364211 ("octeontx2-af: Add FEC stats for RPM/RPM_USX block")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 12:09:40 +01:00
Hariprasad Kelam
07cd1eb166 octeontx2-af: RPM: fix stale RSFEC counters
The earlier patch sets the 'Stats control register' for RPM
receive/transmit statistics instead of RSFEC statistics,
causing the driver to return stale FEC counters.

Fixes: 84ad364211 ("octeontx2-af: Add FEC stats for RPM/RPM_USX block")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 12:09:40 +01:00
Hariprasad Kelam
d1e8884e05 octeontx2-af: RPM: Fix low network performance
Low network performance is observed even on RPMs with larger
FIFO lengths.

The cn10kb silicon has three RPM blocks with the following
FIFO sizes:

         --------------------
         | RPM0  |   256KB  |
         | RPM1  |   256KB  |
         | RPM2  |   128KB  |
         --------------------

The current design stores the FIFO length in a common structure for all
RPMs (mac_ops). As a result, the FIFO length of the last RPM is applied
to all RPMs, leading to reduced network performance.

This patch resolved the problem by storing the fifo length in per MAC
structure (cgx).

Fixes: b9d0fedc62 ("octeontx2-af: cn10kb: Add RPM_USX MAC support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 12:09:40 +01:00
Hariprasad Kelam
7ebbbb23ea octeontx2-af: RPM: Fix mismatch in lmac type
Due to a bug in the previous patch, there is a mismatch
between the lmac type reported by the driver and the actual
hardware configuration.

Fixes: 3ad3f8f93c ("octeontx2-af: cn10k: MAC internal loopback support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 12:09:40 +01:00
Maxime Chevallier
407618d66d net: stmmac: dwmac-socfpga: Set RX watchdog interrupt as broken
On DWMAC3 and later, there's a RX Watchdog interrupt that's used for
interrupt coalescing. It's known to be buggy on some platforms, and
dwmac-socfpga appears to be one of them. Changing the interrupt
coalescing from ethtool doesn't appear to have any effect here.

Without disabling RIWT (Received Interrupt Watchdog Timer, I
believe...), we observe latencies while receiving traffic that amount to
around ~0.4ms. This was discovered with NTP but can be easily reproduced
with a simple ping. Without this patch :

64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.657 ms

With this patch :

64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.254 ms

Fixes: 801d233b73 ("net: stmmac: Add SOCFPGA glue driver")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241122141256.764578-1-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 11:58:12 +01:00
Vitalii Mordan
b032ae57d4 marvell: pxa168_eth: fix call balance of pep->clk handling routines
If the clock pep->clk was not enabled in pxa168_eth_probe,
it should not be disabled in any path.

Conversely, if it was enabled in pxa168_eth_probe, it must be disabled
in all error paths to ensure proper cleanup.

Use the devm_clk_get_enabled helper function to ensure proper call balance
for pep->clk.

Found by Linux Verification Center (linuxtesting.org) with Klever.

Fixes: a49f37eed2 ("net: add Fast Ethernet driver for PXA168.")
Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
Link: https://patch.msgid.link/20241121200658.2203871-1-mordan@ispras.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 11:49:51 +01:00
Rosen Penev
9cc8d0ecdd net: mdio-ipq4019: add missing error check
If an optional resource is found but fails to remap, return on failure.
Avoids any potential problems when using the iomapped resource as the
assumption is that it's available.

Fixes: 23a890d493 ("net: mdio: Add the reset function for IPQ MDIO driver")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241121193152.8966-1-rosenp@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 11:01:39 +01:00
Choong Yong Liang
59c5e1411a net: stmmac: set initial EEE policy configuration
Set the initial eee_cfg values to have 'ethtool --show-eee ' display
the initial EEE configuration.

Fixes: 49168d1980 ("net: phy: Add phy_support_eee() indicating MAC support EEE")
Cc: <stable@vger.kernel.org>
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241120083818.1079456-1-yong.liang.choong@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 10:17:00 +01:00
Justin Lai
a01cfcfda5 rtase: Corrects error handling of the rtase_check_mac_version_valid()
Previously, when the hardware version ID was determined to be invalid,
only an error message was printed without any further handling. Therefore,
this patch makes the necessary corrections to address this.

Fixes: a36e9f5cfe ("rtase: Add support for a pci table in this module")
Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 10:12:58 +01:00
Justin Lai
c1fc14c4df rtase: Correct the speed for RTL907XD-V1
Previously, the reported speed was uniformly set to SPEED_5000, but the
RTL907XD-V1 actually operates at a speed of SPEED_10000. Therefore, this
patch makes the necessary correction.

Fixes: dd7f17c40f ("rtase: Implement ethtool function")
Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 10:12:58 +01:00
Justin Lai
a1f8609ff1 rtase: Refactor the rtase_check_mac_version_valid() function
Different hardware requires different configurations, but this distinction
was not made previously. Additionally, the error message was not clear
enough. Therefore, this patch will address the issues mentioned above.

Fixes: a36e9f5cfe ("rtase: Add support for a pci table in this module")
Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 10:12:58 +01:00
Guenter Roeck
f164b29663 net: microchip: vcap: Add typegroup table terminators in kunit tests
VCAP API unit tests fail randomly with errors such as

   # vcap_api_iterator_init_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:387
   Expected 134 + 7 == iter.offset, but
       134 + 7 == 141 (0x8d)
       iter.offset == 17214 (0x433e)
   # vcap_api_iterator_init_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:388
   Expected 5 == iter.reg_idx, but
       iter.reg_idx == 702 (0x2be)
   # vcap_api_iterator_init_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:389
   Expected 11 == iter.reg_bitpos, but
       iter.reg_bitpos == 15 (0xf)
   # vcap_api_iterator_init_test: pass:0 fail:1 skip:0 total:1

Comments in the code state that "A typegroup table ends with an all-zero
terminator". Add the missing terminators.

Some of the typegroups did have a terminator of ".offset = 0, .width = 0,
.value = 0,". Replace those terminators with "{ }" (no trailing ',') for
consistency and to excplicitly state "this is a terminator".

Fixes: 67d637516f ("net: microchip: sparx5: Adding KUNIT test for the VCAP API")
Cc: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241119213202.2884639-1-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-24 16:52:13 -08:00
Oleksij Rempel
e863ff806f net: usb: lan78xx: Fix refcounting and autosuspend on invalid WoL configuration
Validate Wake-on-LAN (WoL) options in `lan78xx_set_wol` before calling
`usb_autopm_get_interface`. This prevents USB autopm refcounting issues
and ensures the adapter can properly enter autosuspend when invalid WoL
options are provided.

Fixes: eb9ad088f9 ("lan78xx: Check for supported Wake-on-LAN modes")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://patch.msgid.link/20241118140351.2398166-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-24 16:50:55 -08:00
Pavan Chebbi
614f4d166e tg3: Set coherent DMA mask bits to 31 for BCM57766 chipsets
The hardware on Broadcom 1G chipsets have a known limitation
where they cannot handle DMA addresses that cross over 4GB.
When such an address is encountered, the hardware sets the
address overflow error bit in the DMA status register and
triggers a reset.

However, BCM57766 hardware is setting the overflow bit and
triggering a reset in some cases when there is no actual
underlying address overflow. The hardware team analyzed the
issue and concluded that it is happening when the status
block update has an address with higher (b16 to b31) bits
as 0xffff following a previous update that had lowest bits
as 0xffff.

To work around this bug in the BCM57766 hardware, set the
coherent dma mask from the current 64b to 31b. This will
ensure that upper bits of the status block DMA address are
always at most 0x7fff, thus avoiding the improper overflow
check described above. This work around is intended for only
status block and ring memories and has no effect on TX and
RX buffers as they do not require coherent memory.

Fixes: 72f2afb8a6 ("[TG3]: Add DMA address workaround")
Reported-by: Salam Noureddine <noureddine@arista.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20241119055741.147144-1-pavan.chebbi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-24 16:45:48 -08:00
Oleksij Rempel
ae7370e61c net: usb: lan78xx: Fix memory leak on device unplug by freeing PHY device
Add calls to `phy_device_free` after `fixed_phy_unregister` to fix a
memory leak that occurs when the device is unplugged. This ensures
proper cleanup of pseudo fixed-link PHYs.

Fixes: 89b36fb5e5 ("lan78xx: Lan7801 Support for Fixed PHY")
Cc: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241116130558.1352230-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-24 16:28:57 -08:00
Oleksij Rempel
03819abbeb net: usb: lan78xx: Fix double free issue with interrupt buffer allocation
In lan78xx_probe(), the buffer `buf` was being freed twice: once
implicitly through `usb_free_urb(dev->urb_intr)` with the
`URB_FREE_BUFFER` flag and again explicitly by `kfree(buf)`. This caused
a double free issue.

To resolve this, reordered `kmalloc()` and `usb_alloc_urb()` calls to
simplify the initialization sequence and removed the redundant
`kfree(buf)`.  Now, `buf` is allocated after `usb_alloc_urb()`, ensuring
it is correctly managed by  `usb_fill_int_urb()` and freed by
`usb_free_urb()` as intended.

Fixes: a6df95cae4 ("lan78xx: Fix memory allocation bug")
Cc: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241116130558.1352230-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-24 16:28:57 -08:00
Heiner Kallweit
f26a29a038 net: phy: ensure that genphy_c45_an_config_eee_aneg() sees new value of phydev->eee_cfg.eee_enabled
This is a follow-up to 41ffcd9501 ("net: phy: fix phylib's dual
eee_enabled") and resolves an issue with genphy_c45_an_config_eee_aneg()
(called from genphy_c45_ethtool_set_eee) not seeing the new value of
phydev->eee_cfg.eee_enabled.

Fixes: 49168d1980 ("net: phy: Add phy_support_eee() indicating MAC support EEE")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-24 14:49:41 +00:00
Linus Torvalds
2a163a4cea RDMA v6.13 merge window pull request
Seveal fixes scattered across the drivers and a few new features:
 
 - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns
 
 - Force disassociate the userspace FD when hns does an async reset
 
 - bnxt new features for optimized modify QP to skip certain stayes, CQ
   coalescing, better debug dumping
 
 - mlx5 new data placement ordering feature
 
 - Faster destruction of mlx5 devx HW objects
 
 - Improvements to RDMA CM mad handling
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZz4ENwAKCRCFwuHvBreF
 YQYQAP9R54r5J1Iylg+zqhCc+e/9oveuuZbfLvy/EJiEpmdprQEAgPs1RrB0z7U6
 1xrVStUKNPhGd5XeVVZGkIV0zYv6Tw4=
 =V5xI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Seveal fixes scattered across the drivers and a few new features:

   - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns

   - Force disassociate the userspace FD when hns does an async reset

   - bnxt new features for optimized modify QP to skip certain stayes,
     CQ coalescing, better debug dumping

   - mlx5 new data placement ordering feature

   - Faster destruction of mlx5 devx HW objects

   - Improvements to RDMA CM mad handling"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (51 commits)
  RDMA/bnxt_re: Correct the sequence of device suspend
  RDMA/bnxt_re: Use the default mode of congestion control
  RDMA/bnxt_re: Support different traffic class
  IB/cm: Rework sending DREQ when destroying a cm_id
  IB/cm: Do not hold reference on cm_id unless needed
  IB/cm: Explicitly mark if a response MAD is a retransmission
  RDMA/mlx5: Move events notifier registration to be after device registration
  RDMA/bnxt_re: Cache MSIx info to a local structure
  RDMA/bnxt_re: Refurbish CQ to NQ hash calculation
  RDMA/bnxt_re: Refactor NQ allocation
  RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved
  RDMA/hns: Fix different dgids mapping to the same dip_idx
  RDMA/bnxt_re: Add set_func_resources support for P5/P7 adapters
  RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design
  bnxt_en: Add support for RoCE sriov configuration
  RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg()
  RDMA/hns: Fix out-of-order issue of requester when setting FENCE
  RDMA/nldev: Add IB device and net device rename events
  RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation
  RDMA/core: Move ib_uverbs_file struct to uverbs_types.h
  ...
2024-11-22 20:03:57 -08:00
Linus Torvalds
fcc79e1714 Networking changes for 6.13.
The most significant set of changes is the per netns RTNL. The new
 behavior is disabled by default, regression risk should be contained.
 
 Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
 default value from PTP_1588_CLOCK_KVM, as the first is intended to be
 a more reliable replacement for the latter.
 
 Core
 ----
 
  - Started a very large, in-progress, effort to make the RTNL lock
    scope per network-namespace, thus reducing the lock contention
    significantly in the containerized use-case, comprising:
    - RCU-ified some relevant slices of the FIB control path
    - introduce basic per netns locking helpers
    - namespacified the IPv4 address hash table
    - remove rtnl_register{,_module}() in favour of rtnl_register_many()
    - refactor rtnl_{new,del,set}link() moving as much validation as
      possible out of RTNL lock
    - convert all phonet doit() and dumpit() handlers to RCU
    - convert IPv4 addresses manipulation to per-netns RTNL
    - convert virtual interface creation to per-netns RTNL
    the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL
    knob, disabled by default ad interim.
 
  - Introduce NAPI suspension, to efficiently switching between busy
    polling (NAPI processing suspended) and normal processing.
 
  - Migrate the IPv4 routing input, output and control path from direct
    ToS usage to DSCP macros. This is a work in progress to make ECN
    handling consistent and reliable.
 
  - Add drop reasons support to the IPv4 rotue input path, allowing
    better introspection in case of packets drop.
 
  - Make FIB seqnum lockless, dropping RTNL protection for read
    access.
 
  - Make inet{,v6} addresses hashing less predicable.
 
  - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
    and timestamps
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Add small file operations for debugfs, to reduce the struct ops size.
 
  - Refactoring and optimization for the implementation of page_frag API,
    This is a preparatory work to consolidate the page_frag
    implementation.
 
 Netfilter
 ---------
 
  - Optimize set element transactions to reduce memory consumption
 
  - Extended netlink error reporting for attribute parser failure.
 
  - Make legacy xtables configs user selectable, giving users
    the option to configure iptables without enabling any other config.
 
  - Address a lot of false-positive RCU issues, pointed by recent
    CI improvements.
 
 BPF
 ---
 
  - Put xsk sockets on a struct diet and add various cleanups. Overall,
    this helps to bump performance by 12% for some workloads.
 
  - Extend BPF selftests to increase coverage of XDP features in
    combination with BPF cpumap.
 
  - Optimize and homogenize bpf_csum_diff helper for all archs and also
    add a batch of new BPF selftests for it.
 
  - Extend netkit with an option to delegate skb->{mark,priority}
    scrubbing to its BPF program.
 
  - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
    programs.
 
 Protocols
 ---------
 
  - Introduces 4-tuple hash for connected udp sockets, speeding-up
    significantly connected sockets lookup.
 
  - Add a fastpath for some TCP timers that usually expires after close,
    the socket lock contention.
 
  - Add inbound and outbound xfrm state caches to speed up state lookups.
 
  - Avoid sending MPTCP advertisements on stale subflows, reducing
    risks on loosing them.
 
  - Make neighbours table flushing more scalable, maintaining per device
    neigh lists.
 
 Driver API
 ----------
 
  - Introduce a unified interface to configure transmission H/W shaping,
    and expose it to user-space via generic-netlink.
 
  - Add support for per-NAPI config via netlink. This makes napi
    configuration persistent across queues removal and re-creation.
    Requires driver updates, currently supported drivers are:
    nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
 
  - Add ethtool support for writing SFP / PHY firmware blocks.
 
  - Track RSS context allocation from ethtool core.
 
  - Implement support for mirroring to DSA CPU port, via TC mirror
    offload.
 
  - Consolidate FDB updates notification, to avoid duplicates on
    device-specific entries.
 
  - Expose DPLL clock quality level to the user-space.
 
  - Support master-slave PHY config via device tree.
 
 Tests and tooling
 -----------------
 
  - forwarding: introduce deferred commands, to simplify
    the cleanup phase
 
 Drivers
 -------
 
  - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
    Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
    IRQs and queues to NAPI IDs, allowing busy polling and better
    introspection.
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox:
      - mlx5:
        - a large refactor to implement support for cross E-Switch
          scheduling
        - refactor H/W conter management to let it scale better
        - H/W GRO cleanups
    - Intel (100G, ice)::
      - adds support for ethtool reset
      - implement support for per TX queue H/W shaping
    - AMD/Solarflare:
      - implement per device queue stats support
    - Broadcom (bnxt):
      - improve wildcard l4proto on IPv4/IPv6 ntuple rules
    - Marvell Octeon:
      - Adds representor support for each Resource Virtualization Unit
        (RVU) device.
    - Hisilicon:
      - adds support for the BMC Gigabit Ethernet
    - IBM (EMAC):
      - driver cleanup and modernization
    - Cisco (VIC):
      - raise the queues number limit to 256
 
  - Ethernet virtual:
    - Google vNIC:
      - implements page pool support
    - macsec:
      - inherit lower device's features and TSO limits when offloading
    - virtio_net:
      - enable premapped mode by default
      - support for XDP socket(AF_XDP) zerocopy TX
    - wireguard:
      - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
        packets.
 
  - Ethernet NICs embedded and virtual:
    - Broadcom ASP:
      - enable software timestamping
    - Freescale:
      - add enetc4 PF driver
    - MediaTek: Airoha SoC:
      - implement BQL support
    - RealTek r8169:
      - enable TSO by default on r8168/r8125
      - implement extended ethtool stats
    - Renesas AVB:
      - enable TX checksum offload
    - Synopsys (stmmac):
      - support header splitting for vlan tagged packets
      - move common code for DWMAC4 and DWXGMAC into a separate FPE
        module.
      - Add the dwmac driver support for T-HEAD TH1520 SoC
    - Synopsys (xpcs):
      - driver refactor and cleanup
    - TI:
      - icssg_prueth: add VLAN offload support
    - Xilinx emaclite:
      - adds clock support
 
  - Ethernet switches:
    - Microchip:
      - implement support for the lan969x Ethernet switch family
      - add LAN9646 switch support to KSZ DSA driver
 
  - Ethernet PHYs:
    - Marvel: 88q2x: enable auto negotiation
    - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
 
  - PTP:
    - Add support for the Amazon virtual clock device
    - Add PtP driver for s390 clocks
 
  - WiFi:
    - mac80211
      - EHT 1024 aggregation size for transmissions
      - new operation to indicate that a new interface is to be added
      - support radio separation of multi-band devices
      - move wireless extension spy implementation to libiw
    - Broadcom:
      - brcmfmac: optional LPO clock support
    - Microchip:
      - add support for Atmel WILC3000
    - Qualcomm (ath12k):
      - firmware coredump collection support
      - add debugfs support for a multitude of statistics
    - Qualcomm (ath5k):
      -  Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
    - Realtek:
      - rtw88: 8821au and 8812au USB adapters support
      - rtw89: add thermal protection
      - rtw89: fine tune BT-coexsitence to improve user experience
      - rtw89: firmware secure boot for WiFi 6 chip
 
  - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4
 9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL
 wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/
 vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX
 jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF
 b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v
 kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie
 BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl
 lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u
 Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx
 eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf
 THcvVM+bupEZ
 =GzPr
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "The most significant set of changes is the per netns RTNL. The new
  behavior is disabled by default, regression risk should be contained.

  Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
  default value from PTP_1588_CLOCK_KVM, as the first is intended to be
  a more reliable replacement for the latter.

  Core:

   - Started a very large, in-progress, effort to make the RTNL lock
     scope per network-namespace, thus reducing the lock contention
     significantly in the containerized use-case, comprising:
       - RCU-ified some relevant slices of the FIB control path
       - introduce basic per netns locking helpers
       - namespacified the IPv4 address hash table
       - remove rtnl_register{,_module}() in favour of
         rtnl_register_many()
       - refactor rtnl_{new,del,set}link() moving as much validation as
         possible out of RTNL lock
       - convert all phonet doit() and dumpit() handlers to RCU
       - convert IPv4 addresses manipulation to per-netns RTNL
       - convert virtual interface creation to per-netns RTNL
     the per-netns lock infrastructure is guarded by the
     CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.

   - Introduce NAPI suspension, to efficiently switching between busy
     polling (NAPI processing suspended) and normal processing.

   - Migrate the IPv4 routing input, output and control path from direct
     ToS usage to DSCP macros. This is a work in progress to make ECN
     handling consistent and reliable.

   - Add drop reasons support to the IPv4 rotue input path, allowing
     better introspection in case of packets drop.

   - Make FIB seqnum lockless, dropping RTNL protection for read access.

   - Make inet{,v6} addresses hashing less predicable.

   - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
     and timestamps

  Things we sprinkled into general kernel code:

   - Add small file operations for debugfs, to reduce the struct ops
     size.

   - Refactoring and optimization for the implementation of page_frag
     API, This is a preparatory work to consolidate the page_frag
     implementation.

  Netfilter:

   - Optimize set element transactions to reduce memory consumption

   - Extended netlink error reporting for attribute parser failure.

   - Make legacy xtables configs user selectable, giving users the
     option to configure iptables without enabling any other config.

   - Address a lot of false-positive RCU issues, pointed by recent CI
     improvements.

  BPF:

   - Put xsk sockets on a struct diet and add various cleanups. Overall,
     this helps to bump performance by 12% for some workloads.

   - Extend BPF selftests to increase coverage of XDP features in
     combination with BPF cpumap.

   - Optimize and homogenize bpf_csum_diff helper for all archs and also
     add a batch of new BPF selftests for it.

   - Extend netkit with an option to delegate skb->{mark,priority}
     scrubbing to its BPF program.

   - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
     programs.

  Protocols:

   - Introduces 4-tuple hash for connected udp sockets, speeding-up
     significantly connected sockets lookup.

   - Add a fastpath for some TCP timers that usually expires after
     close, the socket lock contention.

   - Add inbound and outbound xfrm state caches to speed up state
     lookups.

   - Avoid sending MPTCP advertisements on stale subflows, reducing
     risks on loosing them.

   - Make neighbours table flushing more scalable, maintaining per
     device neigh lists.

  Driver API:

   - Introduce a unified interface to configure transmission H/W
     shaping, and expose it to user-space via generic-netlink.

   - Add support for per-NAPI config via netlink. This makes napi
     configuration persistent across queues removal and re-creation.
     Requires driver updates, currently supported drivers are:
     nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.

   - Add ethtool support for writing SFP / PHY firmware blocks.

   - Track RSS context allocation from ethtool core.

   - Implement support for mirroring to DSA CPU port, via TC mirror
     offload.

   - Consolidate FDB updates notification, to avoid duplicates on
     device-specific entries.

   - Expose DPLL clock quality level to the user-space.

   - Support master-slave PHY config via device tree.

  Tests and tooling:

   - forwarding: introduce deferred commands, to simplify the cleanup
     phase

  Drivers:

   - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
     Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
     IRQs and queues to NAPI IDs, allowing busy polling and better
     introspection.

   - Ethernet high-speed NICs:
      - nVidia/Mellanox:
         - mlx5:
           - a large refactor to implement support for cross E-Switch
             scheduling
           - refactor H/W conter management to let it scale better
           - H/W GRO cleanups
      - Intel (100G, ice)::
         - add support for ethtool reset
         - implement support for per TX queue H/W shaping
      - AMD/Solarflare:
         - implement per device queue stats support
      - Broadcom (bnxt):
         - improve wildcard l4proto on IPv4/IPv6 ntuple rules
      - Marvell Octeon:
         - Add representor support for each Resource Virtualization Unit
           (RVU) device.
      - Hisilicon:
         - add support for the BMC Gigabit Ethernet
      - IBM (EMAC):
         - driver cleanup and modernization
      - Cisco (VIC):
         - raise the queues number limit to 256

   - Ethernet virtual:
      - Google vNIC:
         - implement page pool support
      - macsec:
         - inherit lower device's features and TSO limits when
           offloading
      - virtio_net:
         - enable premapped mode by default
         - support for XDP socket(AF_XDP) zerocopy TX
      - wireguard:
         - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
           packets.

   - Ethernet NICs embedded and virtual:
      - Broadcom ASP:
         - enable software timestamping
      - Freescale:
         - add enetc4 PF driver
      - MediaTek: Airoha SoC:
         - implement BQL support
      - RealTek r8169:
         - enable TSO by default on r8168/r8125
         - implement extended ethtool stats
      - Renesas AVB:
         - enable TX checksum offload
      - Synopsys (stmmac):
         - support header splitting for vlan tagged packets
         - move common code for DWMAC4 and DWXGMAC into a separate FPE
           module.
         - add dwmac driver support for T-HEAD TH1520 SoC
      - Synopsys (xpcs):
         - driver refactor and cleanup
      - TI:
         - icssg_prueth: add VLAN offload support
      - Xilinx emaclite:
         - add clock support

   - Ethernet switches:
      - Microchip:
         - implement support for the lan969x Ethernet switch family
         - add LAN9646 switch support to KSZ DSA driver

   - Ethernet PHYs:
      - Marvel: 88q2x: enable auto negotiation
      - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2

   - PTP:
      - Add support for the Amazon virtual clock device
      - Add PtP driver for s390 clocks

   - WiFi:
      - mac80211
         - EHT 1024 aggregation size for transmissions
         - new operation to indicate that a new interface is to be added
         - support radio separation of multi-band devices
         - move wireless extension spy implementation to libiw
      - Broadcom:
         - brcmfmac: optional LPO clock support
      - Microchip:
         - add support for Atmel WILC3000
      - Qualcomm (ath12k):
         - firmware coredump collection support
         - add debugfs support for a multitude of statistics
      - Qualcomm (ath5k):
         -  Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
      - Realtek:
         - rtw88: 8821au and 8812au USB adapters support
         - rtw89: add thermal protection
         - rtw89: fine tune BT-coexsitence to improve user experience
         - rtw89: firmware secure boot for WiFi 6 chip

   - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature"

* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
  mm: page_frag: fix a compile error when kernel is not compiled
  Documentation: tipc: fix formatting issue in tipc.rst
  selftests: nic_performance: Add selftest for performance of NIC driver
  selftests: nic_link_layer: Add selftest case for speed and duplex states
  selftests: nic_link_layer: Add link layer selftest for NIC driver
  bnxt_en: Add FW trace coredump segments to the coredump
  bnxt_en: Add a new ethtool -W dump flag
  bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
  bnxt_en: Add functions to copy host context memory
  bnxt_en: Do not free FW log context memory
  bnxt_en: Manage the FW trace context memory
  bnxt_en: Allocate backing store memory for FW trace logs
  bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
  bnxt_en: Refactor bnxt_free_ctx_mem()
  bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
  bnxt_en: Update firmware interface spec to 1.10.3.85
  selftests/bpf: Add some tests with sockmap SK_PASS
  bpf: fix recursive lock when verdict program return SK_PASS
  wireguard: device: support big tcp GSO
  wireguard: selftests: load nf_conntrack if not present
  ...
2024-11-21 08:28:08 -08:00
Linus Torvalds
bf9aa14fc5 A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
 
     posix-timers are currently auto-rearmed by the kernel when the signal
     of the timer is ignored so that the timer signal can be delivered once
     the corresponding signal is unignored.
 
     This requires to throttle the timer to prevent a DoS by small intervals
     and keeps the system pointlessly out of low power states for no value.
     This is a long standing non-trivial problem due to the lock order of
     posix-timer lock and the sighand lock along with life time issues as
     the timer and the sigqueue have different life time rules.
 
     Cure this by:
 
      * Embedding the sigqueue into the timer struct to have the same life
        time rules. Aside of that this also avoids the lookup of the timer
        in the signal delivery and rearm path as it's just a always valid
        container_of() now.
 
      * Queuing ignored timer signals onto a seperate ignored list.
 
      * Moving queued timer signals onto the ignored list when the signal is
        switched to SIG_IGN before it could be delivered.
 
      * Walking the ignored list when SIG_IGN is lifted and requeue the
        signals to the actual signal lists. This allows the signal delivery
        code to rearm the timer.
 
     This also required to consolidate the signal delivery rules so they are
     consistent across all situations. With that all self test scenarios
     finally succeed.
 
   - Core infrastructure for VFS multigrain timestamping
 
     This is required to allow the kernel to use coarse grained time stamps
     by default and switch to fine grained time stamps when inode attributes
     are actively observed via getattr().
 
     These changes have been provided to the VFS tree as well, so that the
     VFS specific infrastructure could be built on top.
 
   - Cleanup and consolidation of the sleep() infrastructure
 
     * Move all sleep and timeout functions into one file
 
     * Rework udelay() and ndelay() into proper documented inline functions
       and replace the hardcoded magic numbers by proper defines.
 
     * Rework the fsleep() implementation to take the reality of the timer
       wheel granularity on different HZ values into account. Right now the
       boundaries are hard coded time ranges which fail to provide the
       requested accuracy on different HZ settings.
 
     * Update documentation for all sleep/timeout related functions and fix
       up stale documentation links all over the place
 
     * Fixup a few usage sites
 
   - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP clocks
 
     A system can have multiple PTP clocks which are participating in
     seperate and independent PTP clock domains. So far the kernel only
     considers the PTP clock which is based on CLOCK TAI relevant as that's
     the clock which drives the timekeeping adjustments via the various user
     space daemons through adjtimex(2).
 
     The non TAI based clock domains are accessible via the file descriptor
     based posix clocks, but their usability is very limited. They can't be
     accessed fast as they always go all the way out to the hardware and
     they cannot be utilized in the kernel itself.
 
     As Time Sensitive Networking (TSN) gains traction it is required to
     provide fast user and kernel space access to these clocks.
 
     The approach taken is to utilize the timekeeping and adjtimex(2)
     infrastructure to provide this access in a similar way how the kernel
     provides access to clock MONOTONIC, REALTIME etc.
 
     Instead of creating a duplicated infrastructure this rework converts
     timekeeping and adjtimex(2) into generic functionality which operates
     on pointers to data structures instead of using static variables.
 
     This allows to provide time accessors and adjtimex(2) functionality for
     the independent PTP clocks in a subsequent step.
 
   - Consolidate hrtimer initialization
 
     hrtimers are set up by initializing the data structure and then
     seperately setting the callback function for historical reasons.
 
     That's an extra unnecessary step and makes Rust support less straight
     forward than it should be.
 
     Provide a new set of hrtimer_setup*() functions and convert the core
     code and a few usage sites of the less frequently used interfaces over.
 
     The bulk of the htimer_init() to hrtimer_setup() conversion is already
     prepared and scheduled for the next merge window.
 
   - Drivers:
 
     * Ensure that the global timekeeping clocksource is utilizing the
       cluster 0 timer on MIPS multi-cluster systems.
 
       Otherwise CPUs on different clusters use their cluster specific
       clocksource which is not guaranteed to be synchronized with other
       clusters.
 
     * Mostly boring cleanups, fixes, improvements and code movement
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7kPITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZKkD/9OUL6fOJrDUmOYBa4QVeMyfTef4EaL
 tvwIMM/29XQFeiq3xxCIn+EMnHjXn2lvIhYGQ7GKsbKYwvJ7ZBDpQb+UMhZ2nKI9
 6D6BP6WomZohKeH2fZbJQAdqOi3KRYdvQdIsVZUexkqiaVPphRvOH9wOr45gHtZM
 EyMRSotPlQTDqcrbUejDMEO94GyjDCYXRsyATLxjmTzL/N4xD4NRIiotjM2vL/a9
 8MuCgIhrKUEyYlFoOxxeokBsF3kk3/ez2jlG9b/N8VLH3SYIc2zgL58FBgWxlmgG
 bY71nVG3nUgEjxBd2dcXAVVqvb+5widk8p6O7xxOAQKTLMcJ4H0tQDkMnzBtUzvB
 DGAJDHAmAr0g+ja9O35Pkhunkh4HYFIbq0Il4d1HMKObhJV0JumcKuQVxrXycdm3
 UZfq3seqHsZJQbPgCAhlFU0/2WWScocbee9bNebGT33KVwSp5FoVv89C/6Vjb+vV
 Gusc3thqrQuMAZW5zV8g4UcBAA/xH4PB0I+vHib+9XPZ4UQ7/6xKl2jE0kd5hX7n
 AAUeZvFNFqIsY+B6vz+Jx/yzyM7u5cuXq87pof5EHVFzv56lyTp4ToGcOGYRgKH5
 JXeYV1OxGziSDrd5vbf9CzdWMzqMvTefXrHbWrjkjhNOe8E1A8O88RZ5uRKZhmSw
 hZZ4hdM9+3T7cg==
 =2VC6
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "A rather large update for timekeeping and timers:

   - The final step to get rid of auto-rearming posix-timers

     posix-timers are currently auto-rearmed by the kernel when the
     signal of the timer is ignored so that the timer signal can be
     delivered once the corresponding signal is unignored.

     This requires to throttle the timer to prevent a DoS by small
     intervals and keeps the system pointlessly out of low power states
     for no value. This is a long standing non-trivial problem due to
     the lock order of posix-timer lock and the sighand lock along with
     life time issues as the timer and the sigqueue have different life
     time rules.

     Cure this by:

       - Embedding the sigqueue into the timer struct to have the same
         life time rules. Aside of that this also avoids the lookup of
         the timer in the signal delivery and rearm path as it's just a
         always valid container_of() now.

       - Queuing ignored timer signals onto a seperate ignored list.

       - Moving queued timer signals onto the ignored list when the
         signal is switched to SIG_IGN before it could be delivered.

       - Walking the ignored list when SIG_IGN is lifted and requeue the
         signals to the actual signal lists. This allows the signal
         delivery code to rearm the timer.

     This also required to consolidate the signal delivery rules so they
     are consistent across all situations. With that all self test
     scenarios finally succeed.

   - Core infrastructure for VFS multigrain timestamping

     This is required to allow the kernel to use coarse grained time
     stamps by default and switch to fine grained time stamps when inode
     attributes are actively observed via getattr().

     These changes have been provided to the VFS tree as well, so that
     the VFS specific infrastructure could be built on top.

   - Cleanup and consolidation of the sleep() infrastructure

       - Move all sleep and timeout functions into one file

       - Rework udelay() and ndelay() into proper documented inline
         functions and replace the hardcoded magic numbers by proper
         defines.

       - Rework the fsleep() implementation to take the reality of the
         timer wheel granularity on different HZ values into account.
         Right now the boundaries are hard coded time ranges which fail
         to provide the requested accuracy on different HZ settings.

       - Update documentation for all sleep/timeout related functions
         and fix up stale documentation links all over the place

       - Fixup a few usage sites

   - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
     clocks

     A system can have multiple PTP clocks which are participating in
     seperate and independent PTP clock domains. So far the kernel only
     considers the PTP clock which is based on CLOCK TAI relevant as
     that's the clock which drives the timekeeping adjustments via the
     various user space daemons through adjtimex(2).

     The non TAI based clock domains are accessible via the file
     descriptor based posix clocks, but their usability is very limited.
     They can't be accessed fast as they always go all the way out to
     the hardware and they cannot be utilized in the kernel itself.

     As Time Sensitive Networking (TSN) gains traction it is required to
     provide fast user and kernel space access to these clocks.

     The approach taken is to utilize the timekeeping and adjtimex(2)
     infrastructure to provide this access in a similar way how the
     kernel provides access to clock MONOTONIC, REALTIME etc.

     Instead of creating a duplicated infrastructure this rework
     converts timekeeping and adjtimex(2) into generic functionality
     which operates on pointers to data structures instead of using
     static variables.

     This allows to provide time accessors and adjtimex(2) functionality
     for the independent PTP clocks in a subsequent step.

   - Consolidate hrtimer initialization

     hrtimers are set up by initializing the data structure and then
     seperately setting the callback function for historical reasons.

     That's an extra unnecessary step and makes Rust support less
     straight forward than it should be.

     Provide a new set of hrtimer_setup*() functions and convert the
     core code and a few usage sites of the less frequently used
     interfaces over.

     The bulk of the htimer_init() to hrtimer_setup() conversion is
     already prepared and scheduled for the next merge window.

   - Drivers:

       - Ensure that the global timekeeping clocksource is utilizing the
         cluster 0 timer on MIPS multi-cluster systems.

         Otherwise CPUs on different clusters use their cluster specific
         clocksource which is not guaranteed to be synchronized with
         other clusters.

       - Mostly boring cleanups, fixes, improvements and code movement"

* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
  posix-timers: Fix spurious warning on double enqueue versus do_exit()
  clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
  clocksource/drivers/gpx: Remove redundant casts
  clocksource/drivers/timer-ti-dm: Fix child node refcount handling
  dt-bindings: timer: actions,owl-timer: convert to YAML
  clocksource/drivers/ralink: Add Ralink System Tick Counter driver
  clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
  clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
  clocksource/drivers:sp804: Make user selectable
  clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
  hrtimers: Delete hrtimer_init_on_stack()
  alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
  io_uring: Switch to use hrtimer_setup_on_stack()
  sched/idle: Switch to use hrtimer_setup_on_stack()
  hrtimers: Delete hrtimer_init_sleeper_on_stack()
  wait: Switch to use hrtimer_setup_sleeper_on_stack()
  timers: Switch to use hrtimer_setup_sleeper_on_stack()
  net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
  futex: Switch to use hrtimer_setup_sleeper_on_stack()
  fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
  ...
2024-11-19 16:35:06 -08:00
Linus Torvalds
5c2b050848 A set of updates for the interrupt subsystem:
- Tree wide:
 
     * Make nr_irqs static to the core code and provide accessor functions
       to remove existing and prevent future aliasing problems with local
       variables or function arguments of the same name.
 
   - Core code:
 
     * Prevent freeing an interrupt in the devres code which is not managed
       by devres in the first place.
 
     * Use seq_put_decimal_ull_width() for decimal values output in
       /proc/interrupts which increases performance significantly as it
       avoids parsing the format strings over and over.
 
     * Optimize raising the timer and hrtimer soft interrupts by using the
       'set bit only' variants instead of the combined version which checks
       whether ksoftirqd should be woken up. The latter is a pointless
       exercise as both soft interrupts are raised in the context of the
       timer interrupt and therefore never wake up ksoftirqd.
 
     * Delegate timer/hrtimer soft interrupt processing to a dedicated thread
       on RT.
 
       Timer and hrtimer soft interrupts are always processed in ksoftirqd
       on RT enabled kernels. This can lead to high latencies when other
       soft interrupts are delegated to ksoftirqd as well.
 
       The separate thread allows to run them seperately under a RT
       scheduling policy to reduce the latency overhead.
 
   - Drivers:
 
     * New drivers or extensions of existing drivers to support Renesas
       RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
       chips
 
     * Support for multi-cluster GICs on MIPS.
 
       MIPS CPUs can come with multiple CPU clusters, where each CPU cluster
       has its own GIC (Generic Interrupt Controller). This requires to
       access the GIC of a remote cluster through a redirect register block.
 
       This is encapsulated into a set of helper functions to keep the
       complexity out of the actual code paths which handle the GIC details.
 
     * Support for encrypted guests in the ARM GICV3 ITS driver
 
       The ITS page needs to be shared with the hypervisor and therefore
       must be decrypted.
 
     * Small cleanups and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7ggcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaf7D/9G6FgJXx/60zqnpnOr9Yx0hxjaI47x
 PFyCd3P05qyVMBYXfI99vrSKuVdMZXJ/fH5L83y+sOaTASyLTzg37igZycIDJzLI
 FnHh/m/+UA8k2aIC5VUiNAjne2RLaTZiRN15uEHFVjByC5Y+YTlCNUE4BBhg5RfQ
 hKmskeffWdtui3ou13CSNvbFn+pmqi4g6n1ysUuLhiwM2E5b1rZMprcCOnun/cGP
 IdUQsODNWTTv9eqPJez985M6A1x2SCGNv7Z73h58B9N0pBRPEC1xnhUnCJ1sA0cJ
 pnfde2C1lztEjYbwDngy0wgq0P6LINjQ5Ma2YY2F2hTMsXGJxGPDZm24/u5uR46x
 N/gsOQMXqw6f5yvbiS7Asx9WzR6ry8rJl70QRgTyozz7xxJTaiNm2HqVFe2wc+et
 Q/BzaKdhmUJj1GMZmqD2rrgwYeDcb4wWYNtwjM4PVHHxYlJVq0mEF1kLLS8YDyjf
 HuGPVqtSkt3E0+Br3FKcv5ltUQP8clXbudc6L1u98YBfNK12hW8L+c3YSvIiFoYM
 ZOAeANPM7VtQbP2Jg2q81Dd3CShImt5jqL2um+l8g7+mUE7l9gyuO/w/a5dQ57+b
 kx7mHHIW2zCeHrkZZbRUYzI2BJfMCCOVN4Ax5OZxTLnLsL9VEehy8NM8QYT4TS8R
 XmTOYW3U9XR3gw==
 =JqxC
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt subsystem updates from Thomas Gleixner:
 "Tree wide:

   - Make nr_irqs static to the core code and provide accessor functions
     to remove existing and prevent future aliasing problems with local
     variables or function arguments of the same name.

  Core code:

   - Prevent freeing an interrupt in the devres code which is not
     managed by devres in the first place.

   - Use seq_put_decimal_ull_width() for decimal values output in
     /proc/interrupts which increases performance significantly as it
     avoids parsing the format strings over and over.

   - Optimize raising the timer and hrtimer soft interrupts by using the
     'set bit only' variants instead of the combined version which
     checks whether ksoftirqd should be woken up. The latter is a
     pointless exercise as both soft interrupts are raised in the
     context of the timer interrupt and therefore never wake up
     ksoftirqd.

   - Delegate timer/hrtimer soft interrupt processing to a dedicated
     thread on RT.

     Timer and hrtimer soft interrupts are always processed in ksoftirqd
     on RT enabled kernels. This can lead to high latencies when other
     soft interrupts are delegated to ksoftirqd as well.

     The separate thread allows to run them seperately under a RT
     scheduling policy to reduce the latency overhead.

  Drivers:

   - New drivers or extensions of existing drivers to support Renesas
     RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
     chips

   - Support for multi-cluster GICs on MIPS.

     MIPS CPUs can come with multiple CPU clusters, where each CPU
     cluster has its own GIC (Generic Interrupt Controller). This
     requires to access the GIC of a remote cluster through a redirect
     register block.

     This is encapsulated into a set of helper functions to keep the
     complexity out of the actual code paths which handle the GIC
     details.

   - Support for encrypted guests in the ARM GICV3 ITS driver

     The ITS page needs to be shared with the hypervisor and therefore
     must be decrypted.

   - Small cleanups and fixes all over the place"

* tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  irqchip/riscv-aplic: Prevent crash when MSI domain is missing
  genirq/proc: Use seq_put_decimal_ull_width() for decimal values
  softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
  timers: Use __raise_softirq_irqoff() to raise the softirq.
  hrtimer: Use __raise_softirq_irqoff() to raise the softirq
  riscv: defconfig: Enable T-HEAD C900 ACLINT SSWI drivers
  irqchip: Add T-HEAD C900 ACLINT SSWI driver
  dt-bindings: interrupt-controller: Add T-HEAD C900 ACLINT SSWI device
  irqchip/stm32mp-exti: Use of_property_present() for non-boolean properties
  irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASK
  irqchip/mips-gic: Prevent indirect access to clusters without CPU cores
  irqchip/mips-gic: Multi-cluster support
  irqchip/mips-gic: Setup defaults in each cluster
  irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic()
  irqchip/mips-gic: Replace open coded online CPU iterations
  genirq/irqdesc: Use str_enabled_disabled() helper in wakeup_show()
  genirq/devres: Don't free interrupt which is not managed by devres
  irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool()
  irqchip/aspeed-intc: Add AST27XX INTC support
  dt-bindings: interrupt-controller: Add support for ASPEED AST27XX INTC
  ...
2024-11-19 15:54:19 -08:00
Paolo Abeni
dd7207838d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.13 net-next PR.

Conflicts:

include/linux/phy.h
  41ffcd9501 net: phy: fix phylib's dual eee_enabled
  721aa69e70 net: phy: convert eee_broken_modes to a linkmode bitmap
https://lore.kernel.org/all/20241118135512.1039208b@canb.auug.org.au/

drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
  2160428bcb net: txgbe: fix null pointer to pcs
  2160428bcb net: txgbe: remove GPIO interrupt controller

Adjacent commits:

include/linux/phy.h
  41ffcd9501 net: phy: fix phylib's dual eee_enabled
  516a5f11eb net: phy: respect cached advertising when re-enabling EEE

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-19 13:56:02 +01:00
Shruti Parab
3c2179e663 bnxt_en: Add FW trace coredump segments to the coredump
The FW trace coredump segments are very similar to the context
memory segments in the previous patch.  The main difference is
to call HWRM_DBG_LOG_BUFFER_FLUSH to flush the FW data to host
memory and to include an additional record in the coredump that
contains the head and tail information of the trace data.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-12-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:55 -08:00
Michael Chan
bda2e63a50 bnxt_en: Add a new ethtool -W dump flag
Add a new ethtool -W dump flag (2) to include driver coredump segments.
This patch adds the host backing store context memory pages used by the
chip and FW to store various states to the coredump.  The pages for
each context memory type is dumped into a separate coredump segment.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-11-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:55 -08:00
Shruti Parab
a854a17097 bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
Pass the component ID and segment ID to this function to create
the coredump segment header.  This will be needed in the next
patches to create more segments for the coredump.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-10-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:55 -08:00
Sreekanth Reddy
23a18b91b6 bnxt_en: Add functions to copy host context memory
Host context memory is used by the newer chips to store context
information for various L2 and RoCE states and FW logs.  This
information will be useful for debugging.  This patch adds the
functions to copy all pages of a context memory type to a contiguous
buffer.  The next patches will include the context memory dump
during ethtool -w coredump.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Co-developed-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-9-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:55 -08:00
Hongguang Gao
de999362ad bnxt_en: Do not free FW log context memory
If FW supports appending new FW logs to an offset in the context
memory after FW reset, then do not free this type of context memory
during reset.  The driver will provide the initial offset to the FW
when configuring this type of context memory.  This way, we don't lose
the older FW logs after reset.

Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-8-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Shruti Parab
84fcd9449f bnxt_en: Manage the FW trace context memory
The FW trace memory pages will be added to the ethtool -w coredump
in later patches.  In addition to the raw data, the driver has to
add a header to provide the head and tail information on each FW
trace log segment when creating the coredump.  The FW sends an async
message to the driver after DMAing a chunk of logs to the context
memory to indicate the last offset containing the tail of the logs.
The driver needs to keep track of that.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Shruti Parab
24d694aec1 bnxt_en: Allocate backing store memory for FW trace logs
Allocate the new FW trace log backing store context memory types
if they are supported by the FW.  FW debug logs are DMA'ed to the host
backing store memory when the on-chip buffers are full.  If host
memory cannot be allocated for these memory types, the driver
will not abort.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Hongguang Gao
46010d43ab bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
If 'force' is false, it will keep the memory pages and all data
structures for the context memory type if the memory is valid.

This patch always passes true for the 'force' parameter so there is
no change in behavior.  Later patches will adjust the 'force' parameter
for the FW log context memory types so that the logs will not be reset
after FW reset.

Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Hongguang Gao
968d2cc07c bnxt_en: Refactor bnxt_free_ctx_mem()
Add a new function bnxt_free_one_ctx_mem() to free one context
memory type.  bnxt_free_ctx_mem() now calls the new function in
the loop to free each context memory type.  There is no change in
behavior.  Later patches will further make use of the new function.

Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Shruti Parab
0b350b4927 bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
Add a new bit to struct bnxt_ctx_mem_type to indicate that host
memory has been successfully allocated for this context memory type.
In the next patches, we'll be adding some additional context memory
types for FW debugging/logging.  If memory cannot be allocated for
any of these new types, we will not abort and the cleared mem_valid
bit will indicate to skip configuring the memory type.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-of-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Michael Chan
ff00bcc9ec bnxt_en: Update firmware interface spec to 1.10.3.85
The major change is the new firmware command to flush the FW debug
logs to the host backing store context memory buffers.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Daniel Borkmann
06a34f7db7 wireguard: device: support big tcp GSO
Advertise GSO_MAX_SIZE as TSO max size in order support BIG TCP for wireguard.
This helps to improve wireguard performance a bit when enabled as it allows
wireguard to aggregate larger skbs in wg_packet_consume_data_done() via
napi_gro_receive(), but also allows the stack to build larger skbs on xmit
where the driver then segments them before encryption inside wg_xmit().
We've seen a 15% improvement in TCP stream performance.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-5-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Dheeraj Reddy Jonnalagadda
c1822fb64f wireguard: allowedips: remove redundant selftest call
This commit fixes a useless call issue detected by Coverity (CID
1508092). The call to horrible_allowedips_lookup_v4 is unnecessary as
its return value is never checked.

Signed-off-by: Dheeraj Reddy Jonnalagadda <dheeraj.linuxdev@gmail.com>
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Tobias Klauser
2c862914fb wireguard: device: omit unnecessary memset of netdev private data
The memory for netdev_priv is allocated using kvzalloc in
alloc_netdev_mqs before rtnl_link_ops->setup is called so there is no
need to zero it again in wg_setup.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Dr. David Alan Gilbert
78a36139fc net/fungible: Remove unused fun_create_queue
fun_create_queue was added in 2022 by
commit e1ffcc6681 ("net/fungible: Add service module for Fungible
drivers")
but hasn't been used.

Remove it.

Also remove the static helper functions it was the only user of.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:54:11 -08:00
Kees Cook
1cfb5e5788 Revert "net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings"
This reverts commit 3bd9b9abdf. We cannot
use the new tagged struct group because it throws C++ errors even under
"extern C".

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20241115204308.3821419-1-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:52:11 -08:00
Vitalii Mordan
cc84d89ad8 stmmac: dwmac-intel-plat: remove redundant dwmac->data check in probe
The driver’s compatibility with devices is confirmed earlier in
platform_match(). Since reaching probe means the device is valid,
the extra check can be removed to simplify the code.

Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:49:53 -08:00
Jiawen Wu
2160428bcb net: txgbe: fix null pointer to pcs
For 1000BASE-X or SGMII interface mode, the PCS also need to be selected.
Only return null pointer when there is a copper NIC with external PHY.

Fixes: 02b2a6f91b ("net: txgbe: support copper NIC with external PHY")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20241115073508.1130046-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:45:18 -08:00