69885 Commits

Author SHA1 Message Date
Eric Dumazet
f6e0a4984c net: move dev->state into net_device_read_txrx group
dev->state can be read in rx and tx fast paths.

netif_running() which needs dev->state is called from
- enqueue_to_backlog() [RX path]
- __dev_direct_xmit()  [TX path]

Fixes: 43a71cd66b9c ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240314200845.3050179-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 10:47:47 +01:00
Jakub Kicinski
1c63686799 docs: networking: fix indentation errors in multi-pf-netdev
Stephen reports new warnings in the docs:

Documentation/networking/multi-pf-netdev.rst:94: ERROR: Unexpected indentation.
Documentation/networking/multi-pf-netdev.rst:106: ERROR: Unexpected indentation.

Fixes: 77d9ec3f6c8c ("Documentation: networking: Add description for multi-pf netdev")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/all/20240312153304.0ef1b78e@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240313032329.3919036-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-14 13:19:27 +01:00
Linus Torvalds
9187210eee Networking changes for 6.9.
Core & protocols
 ----------------
 
  - Large effort by Eric to lower rtnl_lock pressure and remove locks:
 
    - Make commonly used parts of rtnetlink (address, route dumps etc.)
      lockless, protected by RCU instead of rtnl_lock.
 
    - Add a netns exit callback which already holds rtnl_lock,
      allowing netns exit to take rtnl_lock once in the core
      instead of once for each driver / callback.
 
    - Remove locks / serialization in the socket diag interface.
 
    - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
 
    - Remove the dev_base_lock, depend on RCU where necessary.
 
  - Support busy polling on a per-epoll context basis. Poll length
    and budget parameters can be set independently of system defaults.
 
  - Introduce struct net_hotdata, to make sure read-mostly global config
    variables fit in as few cache lines as possible.
 
  - Add optional per-nexthop statistics to ease monitoring / debug
    of ECMP imbalance problems.
 
  - Support TCP_NOTSENT_LOWAT in MPTCP.
 
  - Ensure that IPv6 temporary addresses' preferred lifetimes are long
    enough, compared to other configured lifetimes, and at least 2 sec.
 
  - Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
 
  - Add support for the independent control state machine for bonding
    per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
    control state machine.
 
  - Add "network ID" to MCTP socket APIs to support hosts with multiple
    disjoint MCTP networks.
 
  - Re-use the mono_delivery_time skbuff bit for packets which user
    space wants to be sent at a specified time. Maintain the timing
    information while traversing veth links, bridge etc.
 
  - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
 
  - Simplify many places iterating over netdevs by using an xarray
    instead of a hash table walk (hash table remains in place, for
    use on fastpaths).
 
  - Speed up scanning for expired routes by keeping a dedicated list.
 
  - Speed up "generic" XDP by trying harder to avoid large allocations.
 
  - Support attaching arbitrary metadata to netconsole messages.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce
    VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena).
 
  - Rework selftest harness to enable the use of the full range of
    ksft exit code (pass, fail, skip, xfail, xpass).
 
 Netfilter
 ---------
 
  - Allow userspace to define a table that is exclusively owned by a daemon
    (via netlink socket aliveness) without auto-removing this table when
    the userspace program exits. Such table gets marked as orphaned and
    a restarting management daemon can re-attach/regain ownership.
 
  - Speed up element insertions to nftables' concatenated-ranges set type.
    Compact a few related data structures.
 
 BPF
 ---
 
  - Add BPF token support for delegating a subset of BPF subsystem
    functionality from privileged system-wide daemons such as systemd
    through special mount options for userns-bound BPF fs to a trusted
    & unprivileged application.
 
  - Introduce bpf_arena which is sparse shared memory region between BPF
    program and user space where structures inside the arena can have
    pointers to other areas of the arena, and pointers work seamlessly
    for both user-space programs and BPF programs.
 
  - Introduce may_goto instruction that is a contract between the verifier
    and the program. The verifier allows the program to loop assuming it's
    behaving well, but reserves the right to terminate it.
 
  - Extend the BPF verifier to enable static subprog calls in spin lock
    critical sections.
 
  - Support registration of struct_ops types from modules which helps
    projects like fuse-bpf that seeks to implement a new struct_ops type.
 
  - Add support for retrieval of cookies for perf/kprobe multi links.
 
  - Support arbitrary TCP SYN cookie generation / validation in the TC
    layer with BPF to allow creating SYN flood handling in BPF firewalls.
 
  - Add code generation to inline the bpf_kptr_xchg() helper which
    improves performance when stashing/popping the allocated BPF objects.
 
 Wireless
 --------
 
  - Add SPP (signaling and payload protected) AMSDU support.
 
  - Support wider bandwidth OFDMA, as required for EHT operation.
 
 Driver API
 ----------
 
  - Major overhaul of the Energy Efficient Ethernet internals to support
    new link modes (2.5GE, 5GE), share more code between drivers
    (especially those using phylib), and encourage more uniform behavior.
    Convert and clean up drivers.
 
  - Define an API for querying per netdev queue statistics from drivers.
 
  - IPSec: account in global stats for fully offloaded sessions.
 
  - Create a concept of Ethernet PHY Packages at the Device Tree level,
    to allow parameterizing the existing PHY package code.
 
  - Enable Rx hashing (RSS) on GTP protocol fields.
 
 Misc
 ----
 
  - Improvements and refactoring all over networking selftests.
 
  - Create uniform module aliases for TC classifiers, actions,
    and packet schedulers to simplify creating modprobe policies.
 
  - Address all missing MODULE_DESCRIPTION() warnings in networking.
 
  - Extend the Netlink descriptions in YAML to cover message encapsulation
    or "Netlink polymorphism", where interpretation of nested attributes
    depends on link type, classifier type or some other "class type".
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
    - Intel (100G, ice, idpf):
      - support E825-C devices
    - nVidia/Mellanox:
      - support devices with one port and multiple PCIe links
    - Broadcom (bnxt):
      - support n-tuple filters
      - support configuring the RSS key
    - Wangxun (ngbe/txgbe):
      - implement irq_domain for TXGBE's sub-interrupts
    - Pensando/AMD:
      - support XDP
      - optimize queue submission and wakeup handling (+17% bps)
      - optimize struct layout, saving 28% of memory on queues
 
  - Ethernet NICs embedded and virtual:
    - Google cloud vNIC:
      - refactor driver to perform memory allocations for new queue
        config before stopping and freeing the old queue memory
    - Synopsys (stmmac):
      - obey queueMaxSDU and implement counters required by 802.1Qbv
    - Renesas (ravb):
      - support packet checksum offload
      - suspend to RAM and runtime PM support
 
  - Ethernet switches:
    - nVidia/Mellanox:
      - support for nexthop group statistics
    - Microchip:
      - ksz8: implement PHY loopback
      - add support for KSZ8567, a 7-port 10/100Mbps switch
 
  - PTP:
    - New driver for RENESAS FemtoClock3 Wireless clock generator.
    - Support OCP PTP cards designed and built by Adva.
 
  - CAN:
    - Support recvmsg() flags for own, local and remote traffic
      on CAN BCM sockets.
    - Support for esd GmbH PCIe/402 CAN device family.
    - m_can:
      - Rx/Tx submission coalescing
      - wake on frame Rx
 
  - WiFi:
    - Intel (iwlwifi):
      - enable signaling and payload protected A-MSDUs
      - support wider-bandwidth OFDMA
      - support for new devices
      - bump FW API to 89 for AX devices; 90 for BZ/SC devices
    - MediaTek (mt76):
      - mt7915: newer ADIE version support
      - mt7925: radio temperature sensor support
    - Qualcomm (ath11k):
      - support 6 GHz station power modes: Low Power Indoor (LPI),
        Standard Power) SP and Very Low Power (VLP)
      - QCA6390 & WCN6855: support 2 concurrent station interfaces
      - QCA2066 support
    - Qualcomm (ath12k):
      - refactoring in preparation for Multi-Link Operation (MLO) support
      - 1024 Block Ack window size support
      - firmware-2.bin support
      - support having multiple identical PCI devices (firmware needs to
        have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
      - QCN9274: support split-PHY devices
      - WCN7850: enable Power Save Mode in station mode
      - WCN7850: P2P support
    - RealTek:
      - rtw88: support for more rtw8811cu and rtw8821cu devices
      - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
      - rtlwifi: speed up USB firmware initialization
      - rtwl8xxxu:
        - RTL8188F: concurrent interface support
        - Channel Switch Announcement (CSA) support in AP mode
    - Broadcom (brcmfmac):
      - per-vendor feature support
      - per-vendor SAE password setup
      - DMI nvram filename quirk for ACEPC W5 Pro
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmXv0mgACgkQMUZtbf5S
 IrtgMxAAuRd+WJW++SENr4KxIWhYO1q6Xcxnai43wrNkan9swD24icG8TYALt4f3
 yoT6idQvWReAb5JNlh9rUQz8R7E0nJXlvEFn5MtJwcthx2C6wFo/XkJlddlRrT+j
 c2xGILwLjRhW65LaC0MZ2ECbEERkFz8xcGfK2SWzUgh6KYvPjcRfKFxugpM7xOQK
 P/Wnqhs4fVRS/Mj/bCcXcO+yhwC121Q3qVeQVjGS0AzEC65hAW87a/kc2BfgcegD
 EyI9R7mf6criQwX+0awubjfoIdr4oW/8oDVNvUDczkJkbaEVaLMQk9P5x/0XnnVS
 UHUchWXyI80Q8Rj12uN1/I0h3WtwNQnCRBuLSmtm6GLfCAwbLvp2nGWDnaXiqryW
 DVKUIHGvqPKjkOOMOVfSvfB3LvkS3xsFVVYiQBQCn0YSs/gtu4CoF2Nty9CiLPbK
 tTuxUnLdPDZDxU//l0VArZmP8p2JM7XQGJ+JH8GFH4SBTyBR23e0iyPSoyaxjnYn
 RReDnHMVsrS1i7GPhbqDJWn+uqMSs7N149i0XmmyeqwQHUVSJN3J2BApP2nCaDfy
 H2lTuYly5FfEezt61NvCE4qr/VsWeEjm1fYlFQ9dFn4pGn+HghyCpw+xD1ZN56DN
 lujemau5B3kk1UTtAT4ypPqvuqjkRFqpNV2LzsJSk/Js+hApw8Y=
 =oY52
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Large effort by Eric to lower rtnl_lock pressure and remove locks:

      - Make commonly used parts of rtnetlink (address, route dumps
        etc) lockless, protected by RCU instead of rtnl_lock.

      - Add a netns exit callback which already holds rtnl_lock,
        allowing netns exit to take rtnl_lock once in the core instead
        of once for each driver / callback.

      - Remove locks / serialization in the socket diag interface.

      - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.

      - Remove the dev_base_lock, depend on RCU where necessary.

   - Support busy polling on a per-epoll context basis. Poll length and
     budget parameters can be set independently of system defaults.

   - Introduce struct net_hotdata, to make sure read-mostly global
     config variables fit in as few cache lines as possible.

   - Add optional per-nexthop statistics to ease monitoring / debug of
     ECMP imbalance problems.

   - Support TCP_NOTSENT_LOWAT in MPTCP.

   - Ensure that IPv6 temporary addresses' preferred lifetimes are long
     enough, compared to other configured lifetimes, and at least 2 sec.

   - Support forwarding of ICMP Error messages in IPSec, per RFC 4301.

   - Add support for the independent control state machine for bonding
     per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
     control state machine.

   - Add "network ID" to MCTP socket APIs to support hosts with multiple
     disjoint MCTP networks.

   - Re-use the mono_delivery_time skbuff bit for packets which user
     space wants to be sent at a specified time. Maintain the timing
     information while traversing veth links, bridge etc.

   - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.

   - Simplify many places iterating over netdevs by using an xarray
     instead of a hash table walk (hash table remains in place, for use
     on fastpaths).

   - Speed up scanning for expired routes by keeping a dedicated list.

   - Speed up "generic" XDP by trying harder to avoid large allocations.

   - Support attaching arbitrary metadata to netconsole messages.

  Things we sprinkled into general kernel code:

   - Enforce VM_IOREMAP flag and range in ioremap_page_range and
     introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
     bpf_arena).

   - Rework selftest harness to enable the use of the full range of ksft
     exit code (pass, fail, skip, xfail, xpass).

  Netfilter:

   - Allow userspace to define a table that is exclusively owned by a
     daemon (via netlink socket aliveness) without auto-removing this
     table when the userspace program exits. Such table gets marked as
     orphaned and a restarting management daemon can re-attach/regain
     ownership.

   - Speed up element insertions to nftables' concatenated-ranges set
     type. Compact a few related data structures.

  BPF:

   - Add BPF token support for delegating a subset of BPF subsystem
     functionality from privileged system-wide daemons such as systemd
     through special mount options for userns-bound BPF fs to a trusted
     & unprivileged application.

   - Introduce bpf_arena which is sparse shared memory region between
     BPF program and user space where structures inside the arena can
     have pointers to other areas of the arena, and pointers work
     seamlessly for both user-space programs and BPF programs.

   - Introduce may_goto instruction that is a contract between the
     verifier and the program. The verifier allows the program to loop
     assuming it's behaving well, but reserves the right to terminate
     it.

   - Extend the BPF verifier to enable static subprog calls in spin lock
     critical sections.

   - Support registration of struct_ops types from modules which helps
     projects like fuse-bpf that seeks to implement a new struct_ops
     type.

   - Add support for retrieval of cookies for perf/kprobe multi links.

   - Support arbitrary TCP SYN cookie generation / validation in the TC
     layer with BPF to allow creating SYN flood handling in BPF
     firewalls.

   - Add code generation to inline the bpf_kptr_xchg() helper which
     improves performance when stashing/popping the allocated BPF
     objects.

  Wireless:

   - Add SPP (signaling and payload protected) AMSDU support.

   - Support wider bandwidth OFDMA, as required for EHT operation.

  Driver API:

   - Major overhaul of the Energy Efficient Ethernet internals to
     support new link modes (2.5GE, 5GE), share more code between
     drivers (especially those using phylib), and encourage more
     uniform behavior. Convert and clean up drivers.

   - Define an API for querying per netdev queue statistics from
     drivers.

   - IPSec: account in global stats for fully offloaded sessions.

   - Create a concept of Ethernet PHY Packages at the Device Tree level,
     to allow parameterizing the existing PHY package code.

   - Enable Rx hashing (RSS) on GTP protocol fields.

  Misc:

   - Improvements and refactoring all over networking selftests.

   - Create uniform module aliases for TC classifiers, actions, and
     packet schedulers to simplify creating modprobe policies.

   - Address all missing MODULE_DESCRIPTION() warnings in networking.

   - Extend the Netlink descriptions in YAML to cover message
     encapsulation or "Netlink polymorphism", where interpretation of
     nested attributes depends on link type, classifier type or some
     other "class type".

  Drivers:

   - Ethernet high-speed NICs:
      - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
      - Intel (100G, ice, idpf):
         - support E825-C devices
      - nVidia/Mellanox:
         - support devices with one port and multiple PCIe links
      - Broadcom (bnxt):
         - support n-tuple filters
         - support configuring the RSS key
      - Wangxun (ngbe/txgbe):
         - implement irq_domain for TXGBE's sub-interrupts
      - Pensando/AMD:
         - support XDP
         - optimize queue submission and wakeup handling (+17% bps)
         - optimize struct layout, saving 28% of memory on queues

   - Ethernet NICs embedded and virtual:
      - Google cloud vNIC:
         - refactor driver to perform memory allocations for new queue
           config before stopping and freeing the old queue memory
      - Synopsys (stmmac):
         - obey queueMaxSDU and implement counters required by 802.1Qbv
      - Renesas (ravb):
         - support packet checksum offload
         - suspend to RAM and runtime PM support

   - Ethernet switches:
      - nVidia/Mellanox:
         - support for nexthop group statistics
      - Microchip:
         - ksz8: implement PHY loopback
         - add support for KSZ8567, a 7-port 10/100Mbps switch

   - PTP:
      - New driver for RENESAS FemtoClock3 Wireless clock generator.
      - Support OCP PTP cards designed and built by Adva.

   - CAN:
      - Support recvmsg() flags for own, local and remote traffic on CAN
        BCM sockets.
      - Support for esd GmbH PCIe/402 CAN device family.
      - m_can:
         - Rx/Tx submission coalescing
         - wake on frame Rx

   - WiFi:
      - Intel (iwlwifi):
         - enable signaling and payload protected A-MSDUs
         - support wider-bandwidth OFDMA
         - support for new devices
         - bump FW API to 89 for AX devices; 90 for BZ/SC devices
      - MediaTek (mt76):
         - mt7915: newer ADIE version support
         - mt7925: radio temperature sensor support
      - Qualcomm (ath11k):
         - support 6 GHz station power modes: Low Power Indoor (LPI),
           Standard Power) SP and Very Low Power (VLP)
         - QCA6390 & WCN6855: support 2 concurrent station interfaces
         - QCA2066 support
      - Qualcomm (ath12k):
         - refactoring in preparation for Multi-Link Operation (MLO)
           support
         - 1024 Block Ack window size support
         - firmware-2.bin support
         - support having multiple identical PCI devices (firmware needs
           to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
         - QCN9274: support split-PHY devices
         - WCN7850: enable Power Save Mode in station mode
         - WCN7850: P2P support
      - RealTek:
         - rtw88: support for more rtw8811cu and rtw8821cu devices
         - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
         - rtlwifi: speed up USB firmware initialization
         - rtwl8xxxu:
             - RTL8188F: concurrent interface support
             - Channel Switch Announcement (CSA) support in AP mode
      - Broadcom (brcmfmac):
         - per-vendor feature support
         - per-vendor SAE password setup
         - DMI nvram filename quirk for ACEPC W5 Pro"

* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
  nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
  nexthop: Fix out-of-bounds access during attribute validation
  nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
  nexthop: Only parse NHA_OP_FLAGS for get messages that require it
  bpf: move sleepable flag from bpf_prog_aux to bpf_prog
  bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
  selftests/bpf: Add kprobe multi triggering benchmarks
  ptp: Move from simple ida to xarray
  vxlan: Remove generic .ndo_get_stats64
  vxlan: Do not alloc tstats manually
  devlink: Add comments to use netlink gen tool
  nfp: flower: handle acti_netdevs allocation failure
  net/packet: Add getsockopt support for PACKET_COPY_THRESH
  net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
  selftests/bpf: Add bpf_arena_htab test.
  selftests/bpf: Add bpf_arena_list test.
  selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
  bpf: Add helper macro bpf_addr_space_cast()
  libbpf: Recognize __arena global variables.
  bpftool: Recognize arena map type
  ...
2024-03-12 17:44:08 -07:00
Linus Torvalds
1f44039766 A moderatly busy cycle for development this time around.
- Some cleanup of the main index page for easier navigation
 
 - Rework some of the other top-level pages for better readability and, with
   luck, fewer merge conflicts in the future.
 
 - Submit-checklist improvements, hopefully the first of many.
 
 - New Italian translations
 
 - A fair number of kernel-doc fixes and improvements.  We have also dropped
   the recommendation to use an old version of Sphinx.
 
 - A new document from Thorsten on bisection
 
 ...and lots of fixes and updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmXvKVIACgkQF0NaE2wM
 flik1gf/ZFS1mHwDdmHA/vpx8UxdUlFEo0Pms8V24iPSW5aEIqkZ406c9DSyMTtp
 CXTzW+RSCfB1Q3ciYtakHBgv0RzZ5+RyaEZ1l7zVmMyw4nYvK6giYKmg8Y0EVPKI
 fAVuPWo5iE7io0sNVbKBKJJkj9Z8QEScM48hv/CV1FblMvHYn0lie6muJrF9G6Ez
 HND+hlYZtWkbRd5M86CDBiFeGMLVPx17T+psQyQIcbUYm9b+RUqZRHIVRLYbad7r
 18r9+83DsOhXTVJCBBSfCSZwzF8yAm+eD1w47sxnSItF8OiIjqCzQgXs3BZe9TXH
 h2YyeWbMN3xByA4mEgpmOPP44RW7Pg==
 =SC60
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A moderatly busy cycle for development this time around.

   - Some cleanup of the main index page for easier navigation

   - Rework some of the other top-level pages for better readability
     and, with luck, fewer merge conflicts in the future.

   - Submit-checklist improvements, hopefully the first of many.

   - New Italian translations

   - A fair number of kernel-doc fixes and improvements. We have also
     dropped the recommendation to use an old version of Sphinx.

   - A new document from Thorsten on bisection

  ... and lots of fixes and updates"

* tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits)
  docs: verify/bisect: fixes, finetuning, and support for Arch
  docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs
  docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst
  docs: submit-checklist: use subheadings
  docs: submit-checklist: structure by category
  docs: new text on bisecting which also covers bug validation
  docs: drop the version constraints for sphinx and dependencies
  docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4
  docs: Restore "smart quotes" for quotes
  docs/zh_CN: accurate translation of "function"
  docs: Include simplified link titles in main index
  docs: Correct formatting of title in admin-guide/index.rst
  docs: kernel_feat.py: fix build error for missing files
  MAINTAINERS: Set the field name for subsystem profile section
  kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO
  Fixed case issue with 'fault-injection' in documentation
  kernel-doc: handle #if in enums as well
  Documentation: update mailing list addresses
  doc: kerneldoc.py: fix indentation
  scripts/kernel-doc: simplify signature printing
  ...
2024-03-12 15:18:34 -07:00
Linus Torvalds
216532e147 hardening updates for v6.9-rc1
- string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko)
 
 - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit
   Mogalapalli)
 
 - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael
   Ellerman)
 
 - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)
 
 - Handle tail call optimization better in LKDTM (Douglas Anderson)
 
 - Use long form types in overflow.h (Andy Shevchenko)
 
 - Add flags param to string_get_size() (Andy Shevchenko)
 
 - Add Coccinelle script for potential struct_size() use (Jacob Keller)
 
 - Fix objtool corner case under KCFI (Josh Poimboeuf)
 
 - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)
 
 - Add str_plural() helper (Michal Wajdeczko, Kees Cook)
 
 - Ignore relocations in .notes section
 
 - Add comments to explain how __is_constexpr() works
 
 - Fix m68k stack alignment expectations in stackinit Kunit test
 
 - Convert string selftests to KUnit
 
 - Add KUnit tests for fortified string functions
 
 - Improve reporting during fortified string warnings
 
 - Allow non-type arg to type_max() and type_min()
 
 - Allow strscpy() to be called with only 2 arguments
 
 - Add binary mode to leaking_addresses scanner
 
 - Various small cleanups to leaking_addresses scanner
 
 - Adding wrapping_*() arithmetic helper
 
 - Annotate initial signed integer wrap-around in refcount_t
 
 - Add explicit UBSAN section to MAINTAINERS
 
 - Fix UBSAN self-test warnings
 
 - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL
 
 - Reintroduce UBSAN's signed overflow sanitizer
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmXvm5kWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJiQqD/4mM6SWZpYHKlR1nEiqIyz7Hqr9
 g4oguuw6HIVNJXLyeBI5Hd43CTeHPA0e++EETqhUAt7HhErxfYJY+JB221nRYmu+
 zhhQ7N/xbTMV/Je7AR03kQjhiMm8LyEcM2X4BNrsAcoCieQzmO3g0zSp8ISzLUE0
 PEEmf1lOzMe3gK2KOFCPt5Hiz9sGWyN6at+BQubY18tQGtjEXYAQNXkpD5qhGn4a
 EF693r/17wmc8hvSsjf4AGaWy1k8crG0WfpMCZsaqftjj0BbvOC60IDyx4eFjpcy
 tGyAJKETq161AkCdNweIh2Q107fG3tm0fcvw2dv8Wt1eQCko6M8dUGCBinQs/thh
 TexjJFS/XbSz+IvxLqgU+C5qkOP23E0M9m1dbIbOFxJAya/5n16WOBlGr3ae2Wdq
 /+t8wVSJw3vZiku5emWdFYP1VsdIHUjVa5QizFaaRhzLGRwhxVV49SP4IQC/5oM5
 3MAgNOFTP6yRQn9Y9wP+SZs+SsfaIE7yfKa9zOi4S+Ve+LI2v4YFhh8NCRiLkeWZ
 R1dhp8Pgtuq76f/v0qUaWcuuVeGfJ37M31KOGIhi1sI/3sr7UMrngL8D1+F8UZMi
 zcLu+x4GtfUZCHl6znx1rNUBqE5S/5ndVhLpOqfCXKaQ+RAm7lkOJ3jXE2VhNkhp
 yVEmeSOLnlCaQjZvXQ==
 =OP+o
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "As is pretty normal for this tree, there are changes all over the
  place, especially for small fixes, selftest improvements, and improved
  macro usability.

  Some header changes ended up landing via this tree as they depended on
  the string header cleanups. Also, a notable set of changes is the work
  for the reintroduction of the UBSAN signed integer overflow sanitizer
  so that we can continue to make improvements on the compiler side to
  make this sanitizer a more viable future security hardening option.

  Summary:

   - string.h and related header cleanups (Tanzir Hasan, Andy
     Shevchenko)

   - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev,
     Harshit Mogalapalli)

   - selftests/powerpc: Fix load_unaligned_zeropad build failure
     (Michael Ellerman)

   - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)

   - Handle tail call optimization better in LKDTM (Douglas Anderson)

   - Use long form types in overflow.h (Andy Shevchenko)

   - Add flags param to string_get_size() (Andy Shevchenko)

   - Add Coccinelle script for potential struct_size() use (Jacob
     Keller)

   - Fix objtool corner case under KCFI (Josh Poimboeuf)

   - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)

   - Add str_plural() helper (Michal Wajdeczko, Kees Cook)

   - Ignore relocations in .notes section

   - Add comments to explain how __is_constexpr() works

   - Fix m68k stack alignment expectations in stackinit Kunit test

   - Convert string selftests to KUnit

   - Add KUnit tests for fortified string functions

   - Improve reporting during fortified string warnings

   - Allow non-type arg to type_max() and type_min()

   - Allow strscpy() to be called with only 2 arguments

   - Add binary mode to leaking_addresses scanner

   - Various small cleanups to leaking_addresses scanner

   - Adding wrapping_*() arithmetic helper

   - Annotate initial signed integer wrap-around in refcount_t

   - Add explicit UBSAN section to MAINTAINERS

   - Fix UBSAN self-test warnings

   - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL

   - Reintroduce UBSAN's signed overflow sanitizer"

* tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits)
  selftests/powerpc: Fix load_unaligned_zeropad build failure
  string: Convert helpers selftest to KUnit
  string: Convert selftest to KUnit
  sh: Fix build with CONFIG_UBSAN=y
  compiler.h: Explain how __is_constexpr() works
  overflow: Allow non-type arg to type_max() and type_min()
  VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
  lib/string_helpers: Add flags param to string_get_size()
  x86, relocs: Ignore relocations in .notes section
  objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks
  overflow: Use POD in check_shl_overflow()
  lib: stackinit: Adjust target string to 8 bytes for m68k
  sparc: vdso: Disable UBSAN instrumentation
  kernel.h: Move lib/cmdline.c prototypes to string.h
  leaking_addresses: Provide mechanism to scan binary files
  leaking_addresses: Ignore input device status lines
  leaking_addresses: Use File::Temp for /tmp files
  MAINTAINERS: Update LEAKING_ADDRESSES details
  fortify: Improve buffer overflow reporting
  fortify: Add KUnit tests for runtime overflows
  ...
2024-03-12 14:49:30 -07:00
Linus Torvalds
3bf95d567d fscrypt updates for 6.9
Fix flakiness in a test by releasing the quota synchronously when a key
 is removed, and other minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZe/STxQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKyVAAQCJQr5l3fU+rm1FVpuVg8q/pbPdi5wJ
 N31pYFvY3AehtQEArdPNtBbXW3V7i9OL6CDmesuNtGr3Il5KRV1h89yyYgY=
 =RGab
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt updates from Eric Biggers:
 "Fix flakiness in a test by releasing the quota synchronously when a
  key is removed, and other minor cleanups"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: shrink the size of struct fscrypt_inode_info slightly
  fscrypt: write CBC-CTS instead of CTS-CBC
  fscrypt: clear keyring before calling key_put()
  fscrypt: explicitly require that inode->i_blkbits be set
2024-03-12 13:17:36 -07:00
Linus Torvalds
2184dbcde4 ARM: SoC drivers for 6.9
This is the usual mix of updates for drivers that are used on (mostly
 ARM) SoCs with no other top-level subsystem tree, including:
 
  - The SCMI firmware subsystem gains support for version 3.2 of the
    specification and updates to the notification code.
 
  - Feature updates for Tegra and Qualcomm platforms for added
    hardware support.
 
  - A number of platforms get soc_device additions for identifying newly
    added chips from Renesas, Qualcomm, Mediatek and Google.
 
  - Trivial improvements for firmware and memory drivers amongst
    others, in particular 'const' annotations throughout multiple
    subsystems.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmXvgbsACgkQYKtH/8kJ
 UieH8Q/+LRzESrScIwFq0/V7lE1AadmhwMwcEf1Fsq8aMrelvPm/SWvHgIWIHTvV
 IZ/g3XS/CnBxr1JG3nbyMMe/2otEY7JxsUOOqixIuZ2gdzJvzZOBHMi54xDwbFRx
 4NbP0CRTy8K35XNnOkJO3TnwBFP+q2Fu6qHY90as8M2GIxQpWb8OONJHh8N2qPq+
 Hi3H0jjKXMInnOKpNIEQI60N4F2djGMHWkDySwFtHu40RaJjCIfmVd3PWQGz7RHl
 WQHjZ6CB+/BDgqfG0ccQ7Cikc4BLorZsjKCn8bsaLtdp4HvRCTp2ZpuFFTRq6vay
 IxqJCXrgpKjM1k9plehObEhMv4lNMbD1djG8Y6hqC+PPKbDfOLvlcat3xUK2AGgb
 ROJtKDQMXfAeSnLpw9n4Ox+BZRmwMIOcTU/20N72hlcZKY1jq/KuSqQn+LPVKIrW
 pJIhWd1B8R+2O1TewuIe3fjvfQwgATMBHBUVNRkSrzqkpcZNGQ3M5koMpClVvY6T
 Z/+hdAg58EQw0K6ukJLyrevxs1pHHhYXLCECIoU/xPs4NX4hDk7rKTFv6fdLS4Y2
 24qzjhIGYdhRXmhRQdVq+06cr3cvtm1z7Fqna3tW1+J6wtBnHO/xZ63M9n5saPcm
 NgKMAN7YLLMYuUNrd39W7U2wLGQCgknjhrbH8ZmxPypk467v08k=
 =bV/K
 -----END PGP SIGNATURE-----

Merge tag 'soc-drivers-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "This is the usual mix of updates for drivers that are used on (mostly
  ARM) SoCs with no other top-level subsystem tree, including:

   - The SCMI firmware subsystem gains support for version 3.2 of the
     specification and updates to the notification code

   - Feature updates for Tegra and Qualcomm platforms for added hardware
     support

   - A number of platforms get soc_device additions for identifying
     newly added chips from Renesas, Qualcomm, Mediatek and Google

   - Trivial improvements for firmware and memory drivers amongst
     others, in particular 'const' annotations throughout multiple
     subsystems"

* tag 'soc-drivers-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (96 commits)
  tee: make tee_bus_type const
  soc: qcom: aoss: add missing kerneldoc for qmp members
  soc: qcom: geni-se: drop unused kerneldoc struct geni_wrapper param
  soc: qcom: spm: fix building with CONFIG_REGULATOR=n
  bus: ti-sysc: constify the struct device_type usage
  memory: stm32-fmc2-ebi: keep power domain on
  memory: stm32-fmc2-ebi: add MP25 RIF support
  memory: stm32-fmc2-ebi: add MP25 support
  memory: stm32-fmc2-ebi: check regmap_read return value
  dt-bindings: memory-controller: st,stm32: add MP25 support
  dt-bindings: bus: imx-weim: convert to YAML
  watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs
  soc: samsung: exynos-pmu: Add regmap support for SoCs that protect PMU regs
  MAINTAINERS: Update SCMI entry with HWMON driver
  MAINTAINERS: samsung: gs101: match patches touching Google Tensor SoC
  memory: tegra: Fix indentation
  memory: tegra: Add BPMP and ICC info for DLA clients
  memory: tegra: Correct DLA client names
  dt-bindings: memory: renesas,rpc-if: Document R-Car V4M support
  firmware: arm_scmi: Update the supported clock protocol version
  ...
2024-03-12 10:35:24 -07:00
Linus Torvalds
306bee64b7 SoC: device tree updates for 6.9
There is very little going on with new SoC support this time, all the
 new chips are variations of others that we already support, and they
 are all based on ARMv8 cores:
 
  - Mediatek MT7981B (Filogic 820) and MT7988A (Filogic 880) are
    networking SoCs designed to be used in wireless routers, similar
    to the already supported MT7986A (Filogic 830).
 
  - NXP i.MX8DXP is a variant of i.MX8QXP, with two CPU cores less.
    These are used in many embedded and industrial applications.
 
  - Renesas R8A779G2 (R-Car V4H ES2.0) and R8A779H0 (R-Car V4M)
    are automotive SoCs.
 
  - TI J722S is another automotive variant of its K3 family,
    related to the AM62 series.
 
 There are a total of 7 new arm32 machines and 45 arm64 ones, including
 
  - Two Android phones based on the old Tegra30 chip
 
  - Two machines using Cortex-A53 SoCs from Allwinner, a mini PC and
    a SoM development board
 
  - A set-top box using Amlogic Meson G12A S905X2
 
  - Eight embedded board using NXP i.MX6/8/9
 
  - Three machines using Mediatek network router chips
 
  - Ten Chromebooks, all based on Mediatek MT8186
 
  - One development board based on Mediatek MT8395 (Genio 1200)
 
  - Seven tablets and phones based on Qualcomm SoCs, most of them
    from Samsung.
 
  - A third development board for Qualcomm SM8550 (Snapdragon 8 Gen 2)
 
  - Three variants of the "White Hawk" board for Renesas
    automotive SoCs
 
  - Ten Rockchips RK35xx based machines, including NAS, Tablet,
    Game console and industrial form factors.
 
  - Three evaluation boards for TI K3 based SoCs
 
 The other changes are mainly the usual feature additions for existing hardware,
 cleanups, and dtc compile time fixes. One notable change is the inclusion
 of PowerVR SGX GPU nodes on TI SoCs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmXvLwQACgkQYKtH/8kJ
 Uidkhw/+LjDOIqF8f4+6TBCCS3pFAVSAZxKxlm7L4VhsVOOeGZdspOY57eKZJWqW
 bVqj+B22UjJSw/9LOrFBNApkV8vk+rR7UfJjzijXM34WB80DC8+s7DbenCHagqR8
 fsKCB4tHKTYbBk6EefzyWy7fSA1SFu7hpTg5qWK8XONbGdHnkhbj1aQDbUe7p961
 huKGM+2spO+bFs3ljHGymBWywFKtuMTmVzoq16mBZl/bnuIKobm7W2kF+n3NAo+h
 CMta6J9mBlinBT+VtIg2Xax+KvkjmoitevOmyURxp/33+14A64dafI+RLiSyeqb6
 DfeAp9ptrBbVGzYZq2r07WYX9AIBdD2hvdkrtrjOy6JPqtJpWdfA4slYzWCzZfOz
 O08sV3l7ERggpNkMcTWiwBiuB/y5Hci7SYVeQm8N8bp5PydgNpoo6kNVpnc1e6ri
 Ug8t/jQYvpkCVHT3ld8PmgpWoZRinKIe6PNmqdg5jUu8aH+m4TNNmHyA2IjBcovj
 006FBBGVKp4HlCrGz4t9/XsmKzt+cRxLaX06duoZ93FQknXSzs7j7UDkPhpR07kF
 yEHjETnfhziyONL2fHZ+ejBoK/9psTFtzbpgMreBJ0mFZM0yvL0c+gcMvDgDD8ho
 PCp2ohDYpKPoklrTqMLKM7Yjev5bTOdrAJeWoLDWCbgkzVDkyjw=
 =krkR
 -----END PGP SIGNATURE-----

Merge tag 'soc-dt-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC device tree updates from Arnd Bergmann:
 "There is very little going on with new SoC support this time, all the
  new chips are variations of others that we already support, and they
  are all based on ARMv8 cores:

   - Mediatek MT7981B (Filogic 820) and MT7988A (Filogic 880) are
     networking SoCs designed to be used in wireless routers, similar to
     the already supported MT7986A (Filogic 830).

   - NXP i.MX8DXP is a variant of i.MX8QXP, with two CPU cores less.
     These are used in many embedded and industrial applications.

   - Renesas R8A779G2 (R-Car V4H ES2.0) and R8A779H0 (R-Car V4M) are
     automotive SoCs.

   - TI J722S is another automotive variant of its K3 family, related to
     the AM62 series.

  There are a total of 7 new arm32 machines and 45 arm64 ones, including

   - Two Android phones based on the old Tegra30 chip

   - Two machines using Cortex-A53 SoCs from Allwinner, a mini PC and a
     SoM development board

   - A set-top box using Amlogic Meson G12A S905X2

   - Eight embedded board using NXP i.MX6/8/9

   - Three machines using Mediatek network router chips

   - Ten Chromebooks, all based on Mediatek MT8186

   - One development board based on Mediatek MT8395 (Genio 1200)

   - Seven tablets and phones based on Qualcomm SoCs, most of them from
     Samsung.

   - A third development board for Qualcomm SM8550 (Snapdragon 8 Gen 2)

   - Three variants of the "White Hawk" board for Renesas automotive
     SoCs

   - Ten Rockchips RK35xx based machines, including NAS, Tablet, Game
     console and industrial form factors.

   - Three evaluation boards for TI K3 based SoCs

  The other changes are mainly the usual feature additions for existing
  hardware, cleanups, and dtc compile time fixes. One notable change is
  the inclusion of PowerVR SGX GPU nodes on TI SoCs"

* tag 'soc-dt-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (824 commits)
  riscv: dts: Move BUILTIN_DTB_SOURCE to common Kconfig
  riscv: dts: starfive: jh7100: fix root clock names
  ARM: dts: samsung: exynos4412: decrease memory to account for unusable region
  arm64: dts: qcom: sm8250-xiaomi-elish: set rotation
  arm64: dts: qcom: sm8650: Fix SPMI channels size
  arm64: dts: qcom: sm8550: Fix SPMI channels size
  arm64: dts: rockchip: Fix name for UART pin header on qnap-ts433
  arm: dts: marvell: clearfog-gtr-l8: align port numbers with enclosure
  arm: dts: marvell: clearfog-gtr-l8: add support for second sfp connector
  dt-bindings: soc: renesas: renesas-soc: Add pattern for gray-hawk
  dtc: Enable dtc interrupt_provider check
  arm64: dts: st: add video encoder support to stm32mp255
  arm64: dts: st: add video decoder support to stm32mp255
  ARM: dts: stm32: enable crypto accelerator on stm32mp135f-dk
  ARM: dts: stm32: enable CRC on stm32mp135f-dk
  ARM: dts: stm32: add CRC on stm32mp131
  ARM: dts: add stm32f769-disco-mb1166-reva09
  ARM: dts: stm32: add display support on stm32f769-disco
  ARM: dts: stm32: rename mmc_vcard to vcc-3v3 on stm32f769-disco
  ARM: dts: stm32: add DSI support on stm32f769
  ...
2024-03-12 10:29:57 -07:00
Linus Torvalds
b29f377119 x86/boot changes for v6.9:
- Continuing work by Ard Biesheuvel to improve the x86 early startup code,
    with the long-term goal to make it position independent:
 
       - Get rid of early accesses to global objects, either by moving them
         to the stack, deferring the access until later, or dropping the
         globals entirely.
 
       - Move all code that runs early via the 1:1 mapping into .head.text,
         and move code that does not out of it, so that build time checks can
         be added later to ensure that no inadvertent absolute references were
         emitted into code that does not tolerate them.
 
       - Remove fixup_pointer() and occurrences of __pa_symbol(), which rely
         on the compiler emitting absolute references, which is not guaranteed.
 
  - Improve the early console code.
 
  - Add early console message about ignored NMIs, so that users are at least
    warned about their existence - even if we cannot do anything about them.
 
  - Improve the kexec code's kernel load address handling.
 
  - Enable more X86S (simplified x86) bits.
 
  - Simplify early boot GDT handling
 
  - Micro-optimize the boot code a bit
 
  - Misc cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmXwIg8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jVHg//bzqXyzhoppEP4QMPVEHQdhy3UN33djwF
 HsjNgw/V1P5O5CPvQehCOgrJOcQ8LLPSA68ugG7FY9mzBjvnGnINXzWzukaaQGTh
 EXIwz/uw2++m3JMDt2PAzfeNZ8LlHb8V2xgexfkBFE7O3BX6ThIg9BKaFH1n7XOY
 AQXRRxlB5YThS3Rcqqeo/jN9bQZn7crqeWVS5Dk0bL1f53Y8SJjKIA4mHUb4xjbo
 LX0Z61G9Qz5e26U1U89tloW82zmiD/pvvuIQUnVVtPVMhSoFKhrxYI9MTPLjj0vt
 p+5UwMutFdJyjbTIsito7YSE6OG6RA2d1uoQjTQCx0sr6NtABbDE5QrciQTfHRGa
 1TyScbineiCf3GtQMuDRAKTbaUzWlUzmk9SrpUxK8UR+R6xVvA4GElUUvGe0/dKh
 QnYD+i6wr71S80t3gHqbBGcs4xjUS5rmpTXJ86VPp9hHB+l/2tvBnNro1JNxM/Ei
 wchQLHbaeWwztnceaGOWlsfAln0prtIYvVOUeTbn6rUFTjgSE2kS2h6GD3h3ZVnM
 az5G+bhjWm6eDL6QoBN6XsZ1UF0O7hcjOa2UpS8N1ek0b4E/LVwtMnmpexM09ehE
 FoBAsxYy5SuGCYab636rMmAmHwRjDozwNNJG+6RrrAYwqoQDqKiSnIismJwcOEKD
 6UzK/KBwxuI=
 =zvw3
 -----END PGP SIGNATURE-----

Merge tag 'x86-boot-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Ingo Molnar:

 - Continuing work by Ard Biesheuvel to improve the x86 early startup
   code, with the long-term goal to make it position independent:

      - Get rid of early accesses to global objects, either by moving
        them to the stack, deferring the access until later, or dropping
        the globals entirely

      - Move all code that runs early via the 1:1 mapping into
        .head.text, and move code that does not out of it, so that build
        time checks can be added later to ensure that no inadvertent
        absolute references were emitted into code that does not
        tolerate them

      - Remove fixup_pointer() and occurrences of __pa_symbol(), which
        rely on the compiler emitting absolute references, which is not
        guaranteed

 - Improve the early console code

 - Add early console message about ignored NMIs, so that users are at
   least warned about their existence - even if we cannot do anything
   about them

 - Improve the kexec code's kernel load address handling

 - Enable more X86S (simplified x86) bits

 - Simplify early boot GDT handling

 - Micro-optimize the boot code a bit

 - Misc cleanups

* tag 'x86-boot-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86/sev: Move early startup code into .head.text section
  x86/sme: Move early SME kernel encryption handling into .head.text
  x86/boot: Move mem_encrypt= parsing to the decompressor
  efi/libstub: Add generic support for parsing mem_encrypt=
  x86/startup_64: Simplify virtual switch on primary boot
  x86/startup_64: Simplify calculation of initial page table address
  x86/startup_64: Defer assignment of 5-level paging global variables
  x86/startup_64: Simplify CR4 handling in startup code
  x86/boot: Use 32-bit XOR to clear registers
  efi/x86: Set the PE/COFF header's NX compat flag unconditionally
  x86/boot/64: Load the final kernel GDT during early boot directly, remove startup_gdt[]
  x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]
  x86/boot/64: Use RIP_REL_REF() to access early page tables
  x86/boot/64: Use RIP_REL_REF() to access '__supported_pte_mask'
  x86/boot/64: Use RIP_REL_REF() to access early_dynamic_pgts[]
  x86/boot/64: Use RIP_REL_REF() to assign 'phys_base'
  x86/boot/64: Simplify global variable accesses in GDT/IDT programming
  x86/trampoline: Bypass compat mode in trampoline_start64() if not needed
  kexec: Allocate kernel above bzImage's pref_address
  x86/boot: Add a message about ignored early NMIs
  ...
2024-03-12 09:58:57 -07:00
Linus Torvalds
0e33cf955f * Mitigate RFDS vulnerability
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmXvZgoACgkQaDWVMHDJ
 krC2Eg//aZKBp97/DSzRqXKDwJzVUr0sGJ9cii0gVT1sI+1U6ZZCh/roVH4xOT5/
 HqtOOnQ+X0mwUx2VG3Yv2VPI7VW68sJ3/y9D8R4tnMEsyQ4CmDw96Pre3NyKr/Av
 jmW7SK94fOkpNFJOMk3zpk7GtRUlCsVkS1P61dOmMYduguhel/V20rWlx83BgnAY
 Rf/c3rBjqe8Ri3rzBP5icY/d6OgwoafuhME31DD/j6oKOh+EoQBvA4urj46yMTMX
 /mrK7hCm/wqwuOOvgGbo7sfZNBLCYy3SZ3EyF4beDERhPF1DaSvCwOULpGVJroqu
 SelFsKXAtEbYrDgsan+MYlx3bQv43q7PbHska1gjkH91plO4nAsssPr5VsusUKmT
 sq8jyBaauZb40oLOSgooL4RqAHrfs8q5695Ouwh/DB/XovMezUI1N/BkpGFmqpJI
 o2xH9P5q520pkB8pFhN9TbRuFSGe/dbWC24QTq1DUajo3M3RwcwX6ua9hoAKLtDF
 pCV5DNcVcXHD3Cxp0M5dQ5JEAiCnW+ZpUWgxPQamGDNW5PEvjDmFwql2uWw/qOuW
 lkheOIffq8ejUBQFbN8VXfIzzeeKQNFiIcViaqGITjIwhqdHAzVi28OuIGwtdh3g
 ywLzSC8yvyzgKrNBgtFMr3ucKN0FoPxpBro253xt2H7w8srXW64=
 =5V9t
 -----END PGP SIGNATURE-----

Merge tag 'rfds-for-linus-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RFDS mitigation from Dave Hansen:
 "RFDS is a CPU vulnerability that may allow a malicious userspace to
  infer stale register values from kernel space. Kernel registers can
  have all kinds of secrets in them so the mitigation is basically to
  wait until the kernel is about to return to userspace and has user
  values in the registers. At that point there is little chance of
  kernel secrets ending up in the registers and the microarchitectural
  state can be cleared.

  This leverages some recent robustness fixes for the existing MDS
  vulnerability. Both MDS and RFDS use the VERW instruction for
  mitigation"

* tag 'rfds-for-linus-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
  x86/rfds: Mitigate Register File Data Sampling (RFDS)
  Documentation/hw-vuln: Add documentation for RFDS
  x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
2024-03-12 09:31:39 -07:00
Ingo Molnar
2e2bc42c83 Merge branch 'linus' into x86/boot, to resolve conflict
There's a new conflict with Linus's upstream tree, because
in the following merge conflict resolution in <asm/coco.h>:

  38b334fc767e Merge tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Linus has resolved the conflicting placement of 'cc_mask' better
than the original commit:

  1c811d403afd x86/sev: Fix position dependent variable references in startup code

... which was also done by an internal merge resolution:

  2e5fc4786b7a Merge branch 'x86/sev' into x86/boot, to resolve conflicts and to pick up dependent tree

But Linus is right in 38b334fc767e, the 'cc_mask' declaration is sufficient
within the #ifdef CONFIG_ARCH_HAS_CC_PLATFORM block.

So instead of forcing Linus to do the same resolution again, merge in Linus's
tree and follow his conflict resolution.

 Conflicts:
	arch/x86/include/asm/coco.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-03-12 09:55:57 +01:00
Jakub Kicinski
ed1f164038 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.9 net-next PR.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 20:38:36 -07:00
Linus Torvalds
685d982112 Core x86 changes for v6.9:
- The biggest change is the rework of the percpu code,
   to support the 'Named Address Spaces' GCC feature,
   by Uros Bizjak:
 
    - This allows C code to access GS and FS segment relative
      memory via variables declared with such attributes,
      which allows the compiler to better optimize those accesses
      than the previous inline assembly code.
 
    - The series also includes a number of micro-optimizations
      for various percpu access methods, plus a number of
      cleanups of %gs accesses in assembly code.
 
    - These changes have been exposed to linux-next testing for
      the last ~5 months, with no known regressions in this area.
 
 - Fix/clean up __switch_to()'s broken but accidentally
   working handling of FPU switching - which also generates
   better code.
 
 - Propagate more RIP-relative addressing in assembly code,
   to generate slightly better code.
 
 - Rework the CPU mitigations Kconfig space to be less idiosyncratic,
   to make it easier for distros to follow & maintain these options.
 
 - Rework the x86 idle code to cure RCU violations and
   to clean up the logic.
 
 - Clean up the vDSO Makefile logic.
 
 - Misc cleanups and fixes.
 
 [ Please note that there's a higher number of merge commits in
   this branch (three) than is usual in x86 topic trees. This happened
   due to the long testing lifecycle of the percpu changes that
   involved 3 merge windows, which generated a longer history
   and various interactions with other core x86 changes that we
   felt better about to carry in a single branch. ]
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmXvB0gRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jUqRAAqnEQPiabF5acQlHrwviX+cjSobDlqtH5
 9q2AQy9qaEHapzD0XMOxvFye6XIvehGOGxSPvk6CoviSxBND8rb56lvnsEZuLeBV
 Bo5QSIL2x42Zrvo11iPHwgXZfTIusU90sBuKDRFkYBAxY3HK2naMDZe8MAsYCUE9
 nwgHF8DDc/NYiSOXV8kosWoWpNIkoK/STyH5bvTQZMqZcwyZ49AIeP1jGZb/prbC
 e/rbnlrq5Eu6brpM7xo9kELO0Vhd34urV14KrrIpdkmUKytW2KIsyvW8D6fqgDBj
 NSaQLLcz0pCXbhF+8Nqvdh/1coR4L7Ymt08P1rfEjCsQgb/2WnSAGUQuC5JoGzaj
 ngkbFcZllIbD9gNzMQ1n4Aw5TiO+l9zxCqPC/r58Uuvstr+K9QKlwnp2+B3Q73Ft
 rojIJ04NJL6lCHdDgwAjTTks+TD2PT/eBWsDfJ/1pnUWttmv9IjMpnXD5sbHxoiU
 2RGGKnYbxXczYdq/ALYDWM6JXpfnJZcXL3jJi0IDcCSsb92xRvTANYFHnTfyzGfw
 EHkhbF4e4Vy9f6QOkSP3CvW5H26BmZS9DKG0J9Il5R3u2lKdfbb5vmtUmVTqHmAD
 Ulo5cWZjEznlWCAYSI/aIidmBsp9OAEvYd+X7Z5SBIgTfSqV7VWHGt0BfA1heiVv
 F/mednG0gGc=
 =3v4F
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:

 - The biggest change is the rework of the percpu code, to support the
   'Named Address Spaces' GCC feature, by Uros Bizjak:

      - This allows C code to access GS and FS segment relative memory
        via variables declared with such attributes, which allows the
        compiler to better optimize those accesses than the previous
        inline assembly code.

      - The series also includes a number of micro-optimizations for
        various percpu access methods, plus a number of cleanups of %gs
        accesses in assembly code.

      - These changes have been exposed to linux-next testing for the
        last ~5 months, with no known regressions in this area.

 - Fix/clean up __switch_to()'s broken but accidentally working handling
   of FPU switching - which also generates better code

 - Propagate more RIP-relative addressing in assembly code, to generate
   slightly better code

 - Rework the CPU mitigations Kconfig space to be less idiosyncratic, to
   make it easier for distros to follow & maintain these options

 - Rework the x86 idle code to cure RCU violations and to clean up the
   logic

 - Clean up the vDSO Makefile logic

 - Misc cleanups and fixes

* tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/idle: Select idle routine only once
  x86/idle: Let prefer_mwait_c1_over_halt() return bool
  x86/idle: Cleanup idle_setup()
  x86/idle: Clean up idle selection
  x86/idle: Sanitize X86_BUG_AMD_E400 handling
  sched/idle: Conditionally handle tick broadcast in default_idle_call()
  x86: Increase brk randomness entropy for 64-bit systems
  x86/vdso: Move vDSO to mmap region
  x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together
  x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o
  x86/retpoline: Ensure default return thunk isn't used at runtime
  x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32
  x86/vdso: Use $(addprefix ) instead of $(foreach )
  x86/vdso: Simplify obj-y addition
  x86/vdso: Consolidate targets and clean-files
  x86/bugs: Rename CONFIG_RETHUNK              => CONFIG_MITIGATION_RETHUNK
  x86/bugs: Rename CONFIG_CPU_SRSO             => CONFIG_MITIGATION_SRSO
  x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY       => CONFIG_MITIGATION_IBRS_ENTRY
  x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY      => CONFIG_MITIGATION_UNRET_ENTRY
  x86/bugs: Rename CONFIG_SLS                  => CONFIG_MITIGATION_SLS
  ...
2024-03-11 19:53:15 -07:00
Linus Torvalds
b0402403e5 - Add a FRU (Field Replaceable Unit) memory poison manager which
collects and manages previously encountered hw errors in order to
    save them to persistent storage across reboots. Previously recorded
    errors are "replayed" upon reboot in order to poison memory which has
    caused said errors in the past.
 
    The main use case is stacked, on-chip memory which cannot simply be
    replaced so poisoning faulty areas of it and thus making them
    inaccessible is the only strategy to prolong its lifetime.
 
  - Add an AMD address translation library glue which converts the
    reported addresses of hw errors into system physical addresses in
    order to be used by other subsystems like memory failure, for
    example. Add support for MI300 accelerators to that library.
 
  - igen6: Add support for Alder Lake-N SoC
 
  - i10nm: Add Grand Ridge support
 
  - The usual fixlets and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXvKHcACgkQEsHwGGHe
 VUo4Lg/+OwXDI1EaCDyaHJ+f6JRmNok1EGjKMVjpp71/XmE3eUjiXfCv/b0bwl3V
 oIXGlXpJ5RSME+9aFDWADaE3h5zAGzTwQXuKtOUQPiJ6UuCebXodm8SaIG8V8trG
 yaW/hhP98AoJD+fN6qzv4XWYvTG8VRQs4tdISg9FXiljTjv4mKA+sxuCu8KpfrDh
 Tg+9F4Rre6gyR5GaB6N7Cc0k97DM7n5yKBZZGKucv+oYzDyf6n631ZSJ2zA9NC51
 CJlux917hCXI/IWrCQ2nkyfPPXxn8AaznUAA30wKgwlt8TFSdKTW+DvRA2zyuAU3
 0UDHO4FezOKuzVnWkzdnKsIMAnDyTGOz3Fi2LU4mC+JHaHHmI2quSWDxp5phWBuy
 S+T3XHxpbSsLGEI7zxT5F9u1oAlCvYu1C7HJw+yxNSn2iCy5LoNo0H/kl/nhR8Xr
 FgVp8SYgQRU2Pp8vgGOibMYY/TAHX55EticKdxvBI0yY+iqoJyAbZ0fb0XyLNc7s
 GqoWfvrK1KQzf5/Ya1Mm//0/QTPyFmJwujMJ2eEnMRRER+23bYpGvVBBT8E1sG9s
 gqEJkKjmVCPt9xJTcivm96sLJ7CG36w8+r/axSqpKXdcvDG9ec8G8PRqjlo5pcvh
 gYevmCBIcKny1xuhALwD6Rn2mkPip7araycDx9X9nd5z1qCxBaU=
 =FR2l
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add a FRU (Field Replaceable Unit) memory poison manager which
   collects and manages previously encountered hw errors in order to
   save them to persistent storage across reboots. Previously recorded
   errors are "replayed" upon reboot in order to poison memory which has
   caused said errors in the past.

   The main use case is stacked, on-chip memory which cannot simply be
   replaced so poisoning faulty areas of it and thus making them
   inaccessible is the only strategy to prolong its lifetime.

 - Add an AMD address translation library glue which converts the
   reported addresses of hw errors into system physical addresses in
   order to be used by other subsystems like memory failure, for
   example. Add support for MI300 accelerators to that library.

 - igen6: Add support for Alder Lake-N SoC

 - i10nm: Add Grand Ridge support

 - The usual fixlets and cleanups

* tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/versal: Convert to platform remove callback returning void
  RAS/AMD/FMPM: Fix off by one when unwinding on error
  RAS/AMD/FMPM: Add debugfs interface to print record entries
  RAS/AMD/FMPM: Save SPA values
  RAS: Export helper to get ras_debugfs_dir
  RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
  RAS: Introduce a FRU memory poison manager
  RAS/AMD/ATL: Add MI300 row retirement support
  Documentation: Move RAS section to admin-guide
  EDAC/versal: Make the bit position of injected errors configurable
  EDAC/i10nm: Add Intel Grand Ridge micro-server support
  EDAC/igen6: Add one more Intel Alder Lake-N SoC support
  RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
  RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
  RAS/AMD/ATL: Add MI300 support
  Documentation: RAS: Add index and address translation section
  EDAC/amd64: Use new AMD Address Translation Library
  RAS: Introduce AMD Address Translation Library
  EDAC/synopsys: Convert to devm_platform_ioremap_resource()
2024-03-11 18:14:06 -07:00
Jakub Kicinski
5f20e6ab1f for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmXvm7IACgkQ6rmadz2v
 bTqdMA//VMHNHVLb4oROoXyQD9fw2mCmIUEKzP88RXfqcxsfEX7HF+k8B5ZTk0ro
 CHXTAnc79+Qqg0j24bkQKxup/fKBQVw9D+Ia4b3ytlm1I2MtyU/16xNEzVhAPU2D
 iKk6mVBsEdCbt/GjpWORy/VVnZlZpC7BOpZLxsbbxgXOndnCegyjXzSnLGJGxdvi
 zkrQTn2SrFzLi6aNpVLqrv6Nks6HJusfCKsIrtlbkQ85dulasHOtwK9s6GF60nte
 aaho+MPx3L+lWEgapsm8rR779pHaYIB/GbZUgEPxE/xUJ/V8BzDgFNLMzEiIBRMN
 a0zZam11BkBzCfcO9gkvDRByaei/dZz2jdqfU4GlHklFj1WFfz8Q7fRLEPINksvj
 WXLgJADGY5mtGbjG21FScThxzj+Ruqwx0a13ddlyI/W+P3y5yzSWsLwJG5F9p0oU
 6nlkJ4U8yg+9E1ie5ae0TibqvRJzXPjfOERZGwYDSVvfQGzv1z+DGSOPMmgNcWYM
 dIaO+A/+NS3zdbk8+1PP2SBbhHPk6kWyCUByWc7wMzCPTiwriFGY/DD2sN+Fsufo
 zorzfikUQOlTfzzD5jbmT49U8hUQUf6QIWsu7BijSiHaaC7am4S8QB2O6ibJMqdv
 yNiwvuX+ThgVIY3QKrLLqL0KPGeKMR5mtfq6rrwSpfp/b4g27FE=
 =eFgA
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2024-03-11

We've added 59 non-merge commits during the last 9 day(s) which contain
a total of 88 files changed, 4181 insertions(+), 590 deletions(-).

The main changes are:

1) Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce
   VM_SPARSE kind and vm_area_[un]map_pages to be used in bpf_arena,
   from Alexei.

2) Introduce bpf_arena which is sparse shared memory region between bpf
   program and user space where structures inside the arena can have
   pointers to other areas of the arena, and pointers work seamlessly for
   both user-space programs and bpf programs, from Alexei and Andrii.

3) Introduce may_goto instruction that is a contract between the verifier
   and the program. The verifier allows the program to loop assuming it's
   behaving well, but reserves the right to terminate it, from Alexei.

4) Use IETF format for field definitions in the BPF standard
   document, from Dave.

5) Extend struct_ops libbpf APIs to allow specify version suffixes for
   stuct_ops map types, share the same BPF program between several map
   definitions, and other improvements, from Eduard.

6) Enable struct_ops support for more than one page in trampolines,
   from Kui-Feng.

7) Support kCFI + BPF on riscv64, from Puranjay.

8) Use bpf_prog_pack for arm64 bpf trampoline, from Puranjay.

9) Fix roundup_pow_of_two undefined behavior on 32-bit archs, from Toke.
====================

Link: https://lore.kernel.org/r/20240312003646.8692-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 18:06:04 -07:00
Linus Torvalds
1f75619a72 - Fix a wrong check in the function reporting whether a CPU executes (or
not) a NMI handler
 
 - Ratelimit unknown NMIs messages in order to not potentially slow down
   the machine
 
 - Other fixlets
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXvN0wACgkQEsHwGGHe
 VUqZLg//fo0puvI2XVjcyW2aNZXNyCWUID5J0HvIZqLveQQQzOopfuX4NLfgKSRR
 GUX3k/jlfO9pku+gz6rQRYi8kaTlY8rScf9XpbUBgZZg3Pz2/ySel5uhPpHatgZ7
 Zj455XALGVLA3T4bFKfCvUGKmRVmSTyXgPg3i/yFpfVzRZ8yhvAyJWJSWxJpFOpC
 Eeg/cXUUPjlb2qOom0Bk9BEjG8Ez76yImAlN5ys/csG2Fe7iE3rU+DQ2IfU/yLfI
 22QNZa8xGJY47c7iP1A/tGsxKGu5Pjsz4I2QvobWhteeiu+03g2NUWUcAaP+3/GN
 6hj2IeiNAkhDcWaJMS9U5vaVAcfDZzTEErkPf896bk6lrR0UY1CRQlJzEQZLz1Vy
 0ZVUuppY2hBcTj3YA9h65a/+sdsxAUG4BdsUJ63jHejJYEPN5YSFvL5wXZlxj3GO
 XVVMsHMs9Lgnz1x+xzAB8SmmoPSj6qdMneY1Xp92cEtV6QQM/EinTfIcTUtvDACZ
 9FJ77Iu6Up4hemftTGOC8eVqr+V0Q8M5x2Xs8NQAwlq9dnFVQCIwd/LjdRDyJ3Gw
 ksFrq6Cv94Fi4bqmQi4CY04GH3kc5ua9sDeTM7rkBMm6RRSTO2NBgIOqHcBbrlOT
 B3kSUqoUB6BEqlRRqP/YZ8YSOL5FWk2A2WDKtp8+ThkDYixGy1M=
 =Jt9B
 -----END PGP SIGNATURE-----

Merge tag 'x86_misc_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Borislav Petkov:

 - Fix a wrong check in the function reporting whether a CPU executes
   (or not) a NMI handler

 - Ratelimit unknown NMIs messages in order to not potentially slow down
   the machine

 - Other fixlets

* tag 'x86_misc_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Fix the inverse "in NMI handler" check
  Documentation/maintainer-tip: Add C++ tail comments exception
  Documentation/maintainer-tip: Add Closes tag
  x86/nmi: Rate limit unknown NMI messages
  Documentation/kernel-parameters: Add spec_rstack_overflow to mitigations=off
2024-03-11 18:02:44 -07:00
Linus Torvalds
38b334fc76 - Add the x86 part of the SEV-SNP host support. This will allow the
kernel to be used as a KVM hypervisor capable of running SNP (Secure
   Nested Paging) guests. Roughly speaking, SEV-SNP is the ultimate goal
   of the AMD confidential computing side, providing the most
   comprehensive confidential computing environment up to date.
 
   This is the x86 part and there is a KVM part which did not get ready
   in time for the merge window so latter will be forthcoming in the next
   cycle.
 
 - Rework the early code's position-dependent SEV variable references in
   order to allow building the kernel with clang and -fPIE/-fPIC and
   -mcmodel=kernel
 
 - The usual set of fixes, cleanups and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXvH0wACgkQEsHwGGHe
 VUrzmA//VS/n6dhHRnm/nAGngr4PeegkgV1OhyKYFfiZ272rT6P9QvblQrgcY0dc
 Ij1DOhEKlke51pTHvMOQ33B3P4Fuc0mx3dpCLY0up5V26kzQiKCjRKEkC4U1bcw8
 W4GqMejaR89bE14bYibmwpSib9T/uVsV65eM3xf1iF5UvsnoUaTziymDoy+nb43a
 B1pdd5vcl4mBNqXeEvt0qjg+xkMLpWUI9tJDB8mbMl/cnIFGgMZzBaY8oktHSROK
 QpuUnKegOgp1RXpfLbNjmZ2Q4Rkk4MNazzDzWq3EIxaRjXL3Qp507ePK7yeA2qa0
 J3jCBQc9E2j7lfrIkUgNIzOWhMAXM2YH5bvH6UrIcMi1qsWJYDmkp2MF1nUedjdf
 Wj16/pJbeEw1aKKIywJGwsmViSQju158vY3SzXG83U/A/Iz7zZRHFmC/ALoxZptY
 Bi7VhfcOSpz98PE3axnG8CvvxRDWMfzBr2FY1VmQbg6VBNo1Xl1aP/IH1I8iQNKg
 /laBYl/qP+1286TygF1lthYROb1lfEIJprgi2xfO6jVYUqPb7/zq2sm78qZRfm7l
 25PN/oHnuidfVfI/H3hzcGubjOG9Zwra8WWYBB2EEmelf21rT0OLqq+eS4T6pxFb
 GNVfc0AzG77UmqbrpkAMuPqL7LrGaSee4NdU3hkEdSphlx1/YTo=
 =c1ps
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Add the x86 part of the SEV-SNP host support.

   This will allow the kernel to be used as a KVM hypervisor capable of
   running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP
   is the ultimate goal of the AMD confidential computing side,
   providing the most comprehensive confidential computing environment
   up to date.

   This is the x86 part and there is a KVM part which did not get ready
   in time for the merge window so latter will be forthcoming in the
   next cycle.

 - Rework the early code's position-dependent SEV variable references in
   order to allow building the kernel with clang and -fPIE/-fPIC and
   -mcmodel=kernel

 - The usual set of fixes, cleanups and improvements all over the place

* tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/sev: Disable KMSAN for memory encryption TUs
  x86/sev: Dump SEV_STATUS
  crypto: ccp - Have it depend on AMD_IOMMU
  iommu/amd: Fix failure return from snp_lookup_rmpentry()
  x86/sev: Fix position dependent variable references in startup code
  crypto: ccp: Make snp_range_list static
  x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
  Documentation: virt: Fix up pre-formatted text block for SEV ioctls
  crypto: ccp: Add the SNP_SET_CONFIG command
  crypto: ccp: Add the SNP_COMMIT command
  crypto: ccp: Add the SNP_PLATFORM_STATUS command
  x86/cpufeatures: Enable/unmask SEV-SNP CPU feature
  KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
  iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown
  crypto: ccp: Handle legacy SEV commands when SNP is enabled
  crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
  crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  x86/sev: Introduce an SNP leaked pages list
  crypto: ccp: Provide an API to issue SEV and SNP commands
  ...
2024-03-11 17:44:11 -07:00
Linus Torvalds
720c857907 Support for x86 Fast Return and Event Delivery (FRED):
FRED is a replacement for IDT event delivery on x86 and addresses most of
 the technical nightmares which IDT exposes:
 
  1) Exception cause registers like CR2 need to be manually preserved in
     nested exception scenarios.
 
  2) Hardware interrupt stack switching is suboptimal for nested exceptions
     as the interrupt stack mechanism rewinds the stack on each entry which
     requires a massive effort in the low level entry of #NMI code to handle
     this.
 
  3) No hardware distinction between entry from kernel or from user which
     makes establishing kernel context more complex than it needs to be
     especially for unconditionally nestable exceptions like NMI.
 
  4) NMI nesting caused by IRET unconditionally reenabling NMIs, which is a
     problem when the perf NMI takes a fault when collecting a stack trace.
 
  5) Partial restore of ESP when returning to a 16-bit segment
 
  6) Limitation of the vector space which can cause vector exhaustion on
     large systems.
 
  7) Inability to differentiate NMI sources
 
 FRED addresses these shortcomings by:
 
  1) An extended exception stack frame which the CPU uses to save exception
     cause registers. This ensures that the meta information for each
     exception is preserved on stack and avoids the extra complexity of
     preserving it in software.
 
  2) Hardware interrupt stack switching is non-rewinding if a nested
     exception uses the currently interrupt stack.
 
  3) The entry points for kernel and user context are separate and GS BASE
     handling which is required to establish kernel context for per CPU
     variable access is done in hardware.
 
  4) NMIs are now nesting protected. They are only reenabled on the return
     from NMI.
 
  5) FRED guarantees full restore of ESP
 
  6) FRED does not put a limitation on the vector space by design because it
     uses a central entry points for kernel and user space and the CPUstores
     the entry type (exception, trap, interrupt, syscall) on the entry stack
     along with the vector number. The entry code has to demultiplex this
     information, but this removes the vector space restriction.
 
     The first hardware implementations will still have the current
     restricted vector space because lifting this limitation requires
     further changes to the local APIC.
 
  7) FRED stores the vector number and meta information on stack which
     allows having more than one NMI vector in future hardware when the
     required local APIC changes are in place.
 
 The series implements the initial FRED support by:
 
  - Reworking the existing entry and IDT handling infrastructure to
    accomodate for the alternative entry mechanism.
 
  - Expanding the stack frame to accomodate for the extra 16 bytes FRED
    requires to store context and meta information
 
  - Providing FRED specific C entry points for events which have information
    pushed to the extended stack frame, e.g. #PF and #DB.
 
  - Providing FRED specific C entry points for #NMI and #MCE
 
  - Implementing the FRED specific ASM entry points and the C code to
    demultiplex the events
 
  - Providing detection and initialization mechanisms and the necessary
    tweaks in context switching, GS BASE handling etc.
 
 The FRED integration aims for maximum code reuse vs. the existing IDT
 implementation to the extent possible and the deviation in hot paths like
 context switching are handled with alternatives to minimalize the
 impact. The low level entry and exit paths are seperate due to the extended
 stack frame and the hardware based GS BASE swichting and therefore have no
 impact on IDT based systems.
 
 It has been extensively tested on existing systems and on the FRED
 simulation and as of now there are know outstanding problems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXuKPgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWyUEACevJMHU+Ot9zqBPizSWxByM1uunHbp
 bjQXhaFeskd3mt7k7HU6GsPRSmC3q4lliP1Y9ypfbU0DvYSI2h/PhMWizjhmot2y
 nIvFpl51r/NsI+JHx1oXcFetz0eGHEqBui/4YQ/swgOCMymYgfqgHhazXTdldV3g
 KpH9/8W3AeGvw79uzXFH9tjBzTkbvywpam3v0LYNDJWTCuDkilyo8PjhsgRZD4x3
 V9f1nLD7nSHZW8XLoktdJJ38bKwI2Lhao91NQ0ErwopekA4/9WphZEKsDpidUSXJ
 sn1O148oQ8X92IO2OaQje8XC5pLGr5GqQBGPWzRH56P/Vd3+WOwBxaFoU6Drxc5s
 tIe23ZjkVcpA8EEG7BQBZV1Un/NX7XaCCnMniOt0RauXw+1NaslX7t/tnUAh5F1V
 TWCH4D0I0oJ0qJ7kNliGn2BP3agYXOVg81xVEUjT6KfHcYU4ImUrwi+BkeNXuXtL
 Ch5ADnbYAcUjWLFnAmEmaRtfmfNGY5T7PeGFHW2RRkaOJ88v5g14Voo6gPJaDUPn
 wMQ0nLq1xN4xZWF6ZgfRqAhArvh20k38ZujRku5vXEqnhOugQ76TF2UYiFEwOXbQ
 8jcM+yEBLGgBz7tGMwmIAml6kfxaFF1KPpdrtcPxNkGlbE6KTSuIolLx2YGUvlSU
 6/O8nwZy49ckmQ==
 =Ib7w
 -----END PGP SIGNATURE-----

Merge tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FRED support from Thomas Gleixner:
 "Support for x86 Fast Return and Event Delivery (FRED).

  FRED is a replacement for IDT event delivery on x86 and addresses most
  of the technical nightmares which IDT exposes:

   1) Exception cause registers like CR2 need to be manually preserved
      in nested exception scenarios.

   2) Hardware interrupt stack switching is suboptimal for nested
      exceptions as the interrupt stack mechanism rewinds the stack on
      each entry which requires a massive effort in the low level entry
      of #NMI code to handle this.

   3) No hardware distinction between entry from kernel or from user
      which makes establishing kernel context more complex than it needs
      to be especially for unconditionally nestable exceptions like NMI.

   4) NMI nesting caused by IRET unconditionally reenabling NMIs, which
      is a problem when the perf NMI takes a fault when collecting a
      stack trace.

   5) Partial restore of ESP when returning to a 16-bit segment

   6) Limitation of the vector space which can cause vector exhaustion
      on large systems.

   7) Inability to differentiate NMI sources

  FRED addresses these shortcomings by:

   1) An extended exception stack frame which the CPU uses to save
      exception cause registers. This ensures that the meta information
      for each exception is preserved on stack and avoids the extra
      complexity of preserving it in software.

   2) Hardware interrupt stack switching is non-rewinding if a nested
      exception uses the currently interrupt stack.

   3) The entry points for kernel and user context are separate and GS
      BASE handling which is required to establish kernel context for
      per CPU variable access is done in hardware.

   4) NMIs are now nesting protected. They are only reenabled on the
      return from NMI.

   5) FRED guarantees full restore of ESP

   6) FRED does not put a limitation on the vector space by design
      because it uses a central entry points for kernel and user space
      and the CPUstores the entry type (exception, trap, interrupt,
      syscall) on the entry stack along with the vector number. The
      entry code has to demultiplex this information, but this removes
      the vector space restriction.

      The first hardware implementations will still have the current
      restricted vector space because lifting this limitation requires
      further changes to the local APIC.

   7) FRED stores the vector number and meta information on stack which
      allows having more than one NMI vector in future hardware when the
      required local APIC changes are in place.

  The series implements the initial FRED support by:

   - Reworking the existing entry and IDT handling infrastructure to
     accomodate for the alternative entry mechanism.

   - Expanding the stack frame to accomodate for the extra 16 bytes FRED
     requires to store context and meta information

   - Providing FRED specific C entry points for events which have
     information pushed to the extended stack frame, e.g. #PF and #DB.

   - Providing FRED specific C entry points for #NMI and #MCE

   - Implementing the FRED specific ASM entry points and the C code to
     demultiplex the events

   - Providing detection and initialization mechanisms and the necessary
     tweaks in context switching, GS BASE handling etc.

  The FRED integration aims for maximum code reuse vs the existing IDT
  implementation to the extent possible and the deviation in hot paths
  like context switching are handled with alternatives to minimalize the
  impact. The low level entry and exit paths are seperate due to the
  extended stack frame and the hardware based GS BASE swichting and
  therefore have no impact on IDT based systems.

  It has been extensively tested on existing systems and on the FRED
  simulation and as of now there are no outstanding problems"

* tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/fred: Fix init_task thread stack pointer initialization
  MAINTAINERS: Add a maintainer entry for FRED
  x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly
  x86/fred: Invoke FRED initialization code to enable FRED
  x86/fred: Add FRED initialization functions
  x86/syscall: Split IDT syscall setup code into idt_syscall_init()
  KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling
  x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI
  x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code
  x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
  x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled
  x86/traps: Add sysvec_install() to install a system interrupt handler
  x86/fred: FRED entry/exit and dispatch code
  x86/fred: Add a machine check entry stub for FRED
  x86/fred: Add a NMI entry stub for FRED
  x86/fred: Add a debug fault entry stub for FRED
  x86/idtentry: Incorporate definitions/declarations of the FRED entries
  x86/fred: Make exc_page_fault() work for FRED
  x86/fred: Allow single-step trap and NMI when starting a new task
  x86/fred: No ESPFIX needed when FRED is enabled
  ...
2024-03-11 16:00:17 -07:00
Linus Torvalds
ca7e917769 Rework of APIC enumeration and topology evaluation:
The current implementation has a couple of shortcomings:
 
   - It fails to handle hybrid systems correctly.
 
   - The APIC registration code which handles CPU number assignents is in
     the middle of the APIC code and detached from the topology evaluation.
 
   - The various mechanisms which enumerate APICs, ACPI, MPPARSE and guest
     specific ones, tweak global variables as they see fit or in case of
     XENPV just hack around the generic mechanisms completely.
 
   - The CPUID topology evaluation code is sprinkled all over the vendor
     code and reevaluates global variables on every hotplug operation.
 
   - There is no way to analyze topology on the boot CPU before bringing up
     the APs. This causes problems for infrastructure like PERF which needs
     to size certain aspects upfront or could be simplified if that would be
     possible.
 
   - The APIC admission and CPU number association logic is incomprehensible
     and overly complex and needs to be kept around after boot instead of
     completing this right after the APIC enumeration.
 
 This update addresses these shortcomings with the following changes:
 
   - Rework the CPUID evaluation code so it is common for all vendors and
     provides information about the APIC ID segments in a uniform way
     independent of the number of segments (Thread, Core, Module, ..., Die,
     Package) so that this information can be computed instead of rewriting
     global variables of dubious value over and over.
 
   - A few cleanups and simplifcations of the APIC, IO/APIC and related
     interfaces to prepare for the topology evaluation changes.
 
   - Seperation of the parser stages so the early evaluation which tries to
     find the APIC address can be seperately overridden from the late
     evaluation which enumerates and registers the local APIC as further
     preparation for sanitizing the topology evaluation.
 
   - A new registration and admission logic which
 
      - encapsulates the inner workings so that parsers and guest logic
        cannot longer fiddle in it
 
      - uses the APIC ID segments to build topology bitmaps at registration
        time
 
      - provides a sane admission logic
 
      - allows to detect the crash kernel case, where CPU0 does not run on
        the real BSP, automatically. This is required to prevent sending
        INIT/SIPI sequences to the real BSP which would reset the whole
        machine. This was so far handled by a tedious command line
        parameter, which does not even work in nested crash scenarios.
 
      - Associates CPU number after the enumeration completed and prevents
        the late registration of APICs, which was somehow tolerated before.
 
   - Converting all parsers and guest enumeration mechanisms over to the
     new interfaces.
 
     This allows to get rid of all global variable tweaking from the parsers
     and enumeration mechanisms and sanitizes the XEN[PV] handling so it can
     use CPUID evaluation for the first time.
 
   - Mopping up existing sins by taking the information from the APIC ID
     segment bitmaps.
 
     This evaluates hybrid systems correctly on the boot CPU and allows for
     cleanups and fixes in the related drivers, e.g. PERF.
 
 The series has been extensively tested and the minimal late fallout due to
 a broken ACPI/MADT table has been addressed by tightening the admission
 logic further.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXuDawTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobE7EACngItF+UOTCoCV6och2lL6HVoIdZD1
 Y5oaAgD+WzQSz/lBkH6b9kZSyvjlMo6O9GlnGX+ii+VUnijDp4VrspnxbJDaKEq3
 gOfsSg2Tk+ps50HqMcZawjjBYJb/TmvKwEV2XuzIBPOONSWLNjvN7nBSzLl1eF9/
 8uCE39/8aB5K3GXryRyXdo2uLu6eHTVC0aYFu/kLX1/BbVqF5NMD3sz9E9w8+D/U
 MIIMEMXy4Fn+P2o0vVH+gjUlwI76mJbB1WqCX/sqbVacXrjl3KfNJRiisTFIOOYV
 8o+rIV0ef5X9xmZqtOXAdyZQzj++Gwmz9+4TU1M4YHtS7UkYn6AluOjvVekCc+gc
 qXE3WhqKfCK2/carRMLQxAMxNeRylkZG+Wuv1Qtyjpe9JX2dTqtems0f4DMp9DKf
 b7InO3z39kJanpqcUG2Sx+GWanetfnX+0Ho2Moqu6Xi+2ATr1PfMG/Wyr5/WWOfV
 qApaHSTwa+J43mSzP6BsXngEv085EHSGM5tPe7u46MCYFqB21+bMl+qH82KjMkOe
 c6uZovFQMmX2WBlqJSYGVCH+Jhgvqq8HFeRs19Hd4enOt3e6LE3E74RBVD1AyfLV
 1b/m8tYB/o871ZlEZwDCGVrV/LNnA7PxmFpq5ZHLpUt39g2/V0RH1puBVz1e97pU
 YsTT7hBCUYzgjQ==
 =/5oR
 -----END PGP SIGNATURE-----

Merge tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 APIC updates from Thomas Gleixner:
 "Rework of APIC enumeration and topology evaluation.

  The current implementation has a couple of shortcomings:

   - It fails to handle hybrid systems correctly.

   - The APIC registration code which handles CPU number assignents is
     in the middle of the APIC code and detached from the topology
     evaluation.

   - The various mechanisms which enumerate APICs, ACPI, MPPARSE and
     guest specific ones, tweak global variables as they see fit or in
     case of XENPV just hack around the generic mechanisms completely.

   - The CPUID topology evaluation code is sprinkled all over the vendor
     code and reevaluates global variables on every hotplug operation.

   - There is no way to analyze topology on the boot CPU before bringing
     up the APs. This causes problems for infrastructure like PERF which
     needs to size certain aspects upfront or could be simplified if
     that would be possible.

   - The APIC admission and CPU number association logic is
     incomprehensible and overly complex and needs to be kept around
     after boot instead of completing this right after the APIC
     enumeration.

  This update addresses these shortcomings with the following changes:

   - Rework the CPUID evaluation code so it is common for all vendors
     and provides information about the APIC ID segments in a uniform
     way independent of the number of segments (Thread, Core, Module,
     ..., Die, Package) so that this information can be computed instead
     of rewriting global variables of dubious value over and over.

   - A few cleanups and simplifcations of the APIC, IO/APIC and related
     interfaces to prepare for the topology evaluation changes.

   - Seperation of the parser stages so the early evaluation which tries
     to find the APIC address can be seperately overridden from the late
     evaluation which enumerates and registers the local APIC as further
     preparation for sanitizing the topology evaluation.

   - A new registration and admission logic which

       - encapsulates the inner workings so that parsers and guest logic
         cannot longer fiddle in it

       - uses the APIC ID segments to build topology bitmaps at
         registration time

       - provides a sane admission logic

       - allows to detect the crash kernel case, where CPU0 does not run
         on the real BSP, automatically. This is required to prevent
         sending INIT/SIPI sequences to the real BSP which would reset
         the whole machine. This was so far handled by a tedious command
         line parameter, which does not even work in nested crash
         scenarios.

       - Associates CPU number after the enumeration completed and
         prevents the late registration of APICs, which was somehow
         tolerated before.

   - Converting all parsers and guest enumeration mechanisms over to the
     new interfaces.

     This allows to get rid of all global variable tweaking from the
     parsers and enumeration mechanisms and sanitizes the XEN[PV]
     handling so it can use CPUID evaluation for the first time.

   - Mopping up existing sins by taking the information from the APIC ID
     segment bitmaps.

     This evaluates hybrid systems correctly on the boot CPU and allows
     for cleanups and fixes in the related drivers, e.g. PERF.

  The series has been extensively tested and the minimal late fallout
  due to a broken ACPI/MADT table has been addressed by tightening the
  admission logic further"

* tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
  x86/topology: Ignore non-present APIC IDs in a present package
  x86/apic: Build the x86 topology enumeration functions on UP APIC builds too
  smp: Provide 'setup_max_cpus' definition on UP too
  smp: Avoid 'setup_max_cpus' namespace collision/shadowing
  x86/bugs: Use fixed addressing for VERW operand
  x86/cpu/topology: Get rid of cpuinfo::x86_max_cores
  x86/cpu/topology: Provide __num_[cores|threads]_per_package
  x86/cpu/topology: Rename topology_max_die_per_package()
  x86/cpu/topology: Rename smp_num_siblings
  x86/cpu/topology: Retrieve cores per package from topology bitmaps
  x86/cpu/topology: Use topology logical mapping mechanism
  x86/cpu/topology: Provide logical pkg/die mapping
  x86/cpu/topology: Simplify cpu_mark_primary_thread()
  x86/cpu/topology: Mop up primary thread mask handling
  x86/cpu/topology: Use topology bitmaps for sizing
  x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT
  x86/xen/smp_pv: Count number of vCPUs early
  x86/cpu/topology: Assign hotpluggable CPUIDs during init
  x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug
  x86/topology: Add a mechanism to track topology via APIC IDs
  ...
2024-03-11 15:45:55 -07:00
Jakub Kicinski
ba980f8dff netlink: specs: support generating code for genl socket priv
The family struct is auto-generated for new families, support
use of the sock_priv_* mechanism added in commit a731132424ad
("genetlink: introduce per-sock family private storage").

For example if the family wants to use struct sk_buff as its
private struct (unrealistic but just for illustration), it would
add to its spec:

  kernel-family:
    headers: [ "linux/skbuff.h" ]
    sock-priv: struct sk_buff

ynl-gen-c will declare the appropriate priv size and hook
in function prototypes to be implemented by the family.

Link: https://lore.kernel.org/r/20240308190319.2523704-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:15:42 -07:00
Linus Torvalds
d08c407f71 A large set of updates and features for timers and timekeeping:
- The hierarchical timer pull model
 
     When timer wheel timers are armed they are placed into the timer wheel
     of a CPU which is likely to be busy at the time of expiry. This is done
     to avoid wakeups on potentially idle CPUs.
 
     This is wrong in several aspects:
 
      1) The heuristics to select the target CPU are wrong by
         definition as the chance to get the prediction right is close
         to zero.
 
      2) Due to #1 it is possible that timers are accumulated on a
         single target CPU
 
      3) The required computation in the enqueue path is just overhead for
      	dubious value especially under the consideration that the vast
      	majority of timer wheel timers are either canceled or rearmed
      	before they expire.
 
     The timer pull model avoids the above by removing the target
     computation on enqueue and queueing timers always on the CPU on which
     they get armed.
 
     This is achieved by having separate wheels for CPU pinned timers and
     global timers which do not care about where they expire.
 
     As long as a CPU is busy it handles both the pinned and the global
     timers which are queued on the CPU local timer wheels.
 
     When a CPU goes idle it evaluates its own timer wheels:
 
       - If the first expiring timer is a pinned timer, then the global
       	timers can be ignored as the CPU will wake up before they expire.
 
       - If the first expiring timer is a global timer, then the expiry time
         is propagated into the timer pull hierarchy and the CPU makes sure
         to wake up for the first pinned timer.
 
     The timer pull hierarchy organizes CPUs in groups of eight at the
     lowest level and at the next levels groups of eight groups up to the
     point where no further aggregation of groups is required, i.e. the
     number of levels is log8(NR_CPUS). The magic number of eight has been
     established by experimention, but can be adjusted if needed.
 
     In each group one busy CPU acts as the migrator. It's only one CPU to
     avoid lock contention on remote timer wheels.
 
     The migrator CPU checks in its own timer wheel handling whether there
     are other CPUs in the group which have gone idle and have global timers
     to expire. If there are global timers to expire, the migrator locks the
     remote CPU timer wheel and handles the expiry.
 
     Depending on the group level in the hierarchy this handling can require
     to walk the hierarchy downwards to the CPU level.
 
     Special care is taken when the last CPU goes idle. At this point the
     CPU is the systemwide migrator at the top of the hierarchy and it
     therefore cannot delegate to the hierarchy. It needs to arm its own
     timer device to expire either at the first expiring timer in the
     hierarchy or at the first CPU local timer, which ever expires first.
 
     This completely removes the overhead from the enqueue path, which is
     e.g. for networking a true hotpath and trades it for a slightly more
     complex idle path.
 
     This has been in development for a couple of years and the final series
     has been extensively tested by various teams from silicon vendors and
     ran through extensive CI.
 
     There have been slight performance improvements observed on network
     centric workloads and an Intel team confirmed that this allows them to
     power down a die completely on a mult-die socket for the first time in
     a mostly idle scenario.
 
     There is only one outstanding ~1.5% regression on a specific overloaded
     netperf test which is currently investigated, but the rest is either
     positive or neutral performance wise and positive on the power
     management side.
 
   - Fixes for the timekeeping interpolation code for cross-timestamps:
 
     cross-timestamps are used for PTP to get snapshots from hardware timers
     and interpolated them back to clock MONOTONIC. The changes address a
     few corner cases in the interpolation code which got the math and logic
     wrong.
 
   - Simplifcation of the clocksource watchdog retry logic to automatically
     adjust to handle larger systems correctly instead of having more
     incomprehensible command line parameters.
 
   - Treewide consolidation of the VDSO data structures.
 
   - The usual small improvements and cleanups all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXuAN0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVKXEADIR45rjR1Xtz32js7B53Y65O4WNoOQ
 6/ycWcswuGzg/h4QUpPSJ6gOGVmKSWwZi4n0P/VadCiXGSPPm0aUKsoRUt9DZsPY
 mtj2wjCSXKXiyhTl9OtrZME86ZAIGO1dQXa/sOHsiP5PCjgQkD0b5CYi1+B6eHDt
 1/Uo2Tb9g8VAPppq20V5Uo93GrPf642oyi3FCFrR1M112Uuak5DmqHJYiDpreNcG
 D5SgI+ykSiaUaVyHifvqijoJk0rYXkqEC6evl02477lJ/X0vVo2/M8XPS95BxHST
 s5Iruo4rP+qeAy8QvhZpoPX59fO0m/AgA7cf77XXAtOpVdLH+bs4ILsEbouAIOtv
 lsmRkcYt+TpvrZFHPAxks+6g3afuROiDtxD5sXXpVWxvofi8FwWqubdlqdsbw9MP
 ZCTNyzNyKL47QeDwBfSynYUL1RSyqsphtIwk4oeQklH9rwMAnW21hi30z15hQ0pQ
 FOVkmcwi79JNvl/G+jRkDzw7r8/zcHshWdSjyUM04CDjjnCDjQOFWSIjEPwbQjjz
 S4HXpJKJW963dBgs9Z84/Ctw1GwoBk1qedDWDJE1257Qvmo/Wpe/7GddWcazOGnN
 RRFMzGPbOqBDbjtErOKGU+iCisgNEvz2XK+TI16uRjWde7DxZpiTVYgNDrZ+/Pyh
 rQ23UBms6ZRR+A==
 =iQlu
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "A large set of updates and features for timers and timekeeping:

   - The hierarchical timer pull model

     When timer wheel timers are armed they are placed into the timer
     wheel of a CPU which is likely to be busy at the time of expiry.
     This is done to avoid wakeups on potentially idle CPUs.

     This is wrong in several aspects:

       1) The heuristics to select the target CPU are wrong by
          definition as the chance to get the prediction right is
          close to zero.

       2) Due to #1 it is possible that timers are accumulated on
          a single target CPU

       3) The required computation in the enqueue path is just overhead
          for dubious value especially under the consideration that the
          vast majority of timer wheel timers are either canceled or
          rearmed before they expire.

     The timer pull model avoids the above by removing the target
     computation on enqueue and queueing timers always on the CPU on
     which they get armed.

     This is achieved by having separate wheels for CPU pinned timers
     and global timers which do not care about where they expire.

     As long as a CPU is busy it handles both the pinned and the global
     timers which are queued on the CPU local timer wheels.

     When a CPU goes idle it evaluates its own timer wheels:

       - If the first expiring timer is a pinned timer, then the global
         timers can be ignored as the CPU will wake up before they
         expire.

       - If the first expiring timer is a global timer, then the expiry
         time is propagated into the timer pull hierarchy and the CPU
         makes sure to wake up for the first pinned timer.

     The timer pull hierarchy organizes CPUs in groups of eight at the
     lowest level and at the next levels groups of eight groups up to
     the point where no further aggregation of groups is required, i.e.
     the number of levels is log8(NR_CPUS). The magic number of eight
     has been established by experimention, but can be adjusted if
     needed.

     In each group one busy CPU acts as the migrator. It's only one CPU
     to avoid lock contention on remote timer wheels.

     The migrator CPU checks in its own timer wheel handling whether
     there are other CPUs in the group which have gone idle and have
     global timers to expire. If there are global timers to expire, the
     migrator locks the remote CPU timer wheel and handles the expiry.

     Depending on the group level in the hierarchy this handling can
     require to walk the hierarchy downwards to the CPU level.

     Special care is taken when the last CPU goes idle. At this point
     the CPU is the systemwide migrator at the top of the hierarchy and
     it therefore cannot delegate to the hierarchy. It needs to arm its
     own timer device to expire either at the first expiring timer in
     the hierarchy or at the first CPU local timer, which ever expires
     first.

     This completely removes the overhead from the enqueue path, which
     is e.g. for networking a true hotpath and trades it for a slightly
     more complex idle path.

     This has been in development for a couple of years and the final
     series has been extensively tested by various teams from silicon
     vendors and ran through extensive CI.

     There have been slight performance improvements observed on network
     centric workloads and an Intel team confirmed that this allows them
     to power down a die completely on a mult-die socket for the first
     time in a mostly idle scenario.

     There is only one outstanding ~1.5% regression on a specific
     overloaded netperf test which is currently investigated, but the
     rest is either positive or neutral performance wise and positive on
     the power management side.

   - Fixes for the timekeeping interpolation code for cross-timestamps:

     cross-timestamps are used for PTP to get snapshots from hardware
     timers and interpolated them back to clock MONOTONIC. The changes
     address a few corner cases in the interpolation code which got the
     math and logic wrong.

   - Simplifcation of the clocksource watchdog retry logic to
     automatically adjust to handle larger systems correctly instead of
     having more incomprehensible command line parameters.

   - Treewide consolidation of the VDSO data structures.

   - The usual small improvements and cleanups all over the place"

* tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  timer/migration: Fix quick check reporting late expiry
  tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n
  vdso/datapage: Quick fix - use asm/page-def.h for ARM64
  timers: Assert no next dyntick timer look-up while CPU is offline
  tick: Assume timekeeping is correctly handed over upon last offline idle call
  tick: Shut down low-res tick from dying CPU
  tick: Split nohz and highres features from nohz_mode
  tick: Move individual bit features to debuggable mask accesses
  tick: Move got_idle_tick away from common flags
  tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
  tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
  tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
  tick: Start centralizing tick related CPU hotplug operations
  tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
  tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
  tick: Use IS_ENABLED() whenever possible
  tick/sched: Remove useless oneshot ifdeffery
  tick/nohz: Remove duplicate between lowres and highres handlers
  tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
  hrtimer: Select housekeeping CPU during migration
  ...
2024-03-11 14:38:26 -07:00
Matthew Wood
2b39535859 net: netconsole: Add continuation line prefix to userdata messages
Add a space (' ') prefix to every userdata line to match docs for
dev-kmsg. To account for this extra character in each userdata entry,
reduce userdata entry names (directory name) from 54 characters to 53.

According to the dev-kmsg docs, a space is used for subsequent lines to
mark them as continuation lines.

> A line starting with ' ', is a continuation line, adding
> key/value pairs to the log message, which provide the machine
> readable context of the message, for reliable processing in
> userspace.

Testing for this patch::

 cd /sys/kernel/config/netconsole && mkdir cmdline0
 cd cmdline0
 mkdir userdata/test && echo "hello" > userdata/test/value
 mkdir userdata/test2 && echo "hello2" > userdata/test2/value
 echo "message" > /dev/kmsg

Outputs::

 6.8.0-rc5-virtme,12,493,231373579,-;message
  test=hello
  test2=hello2

And I confirmed all testing works as expected from the original patchset

Fixes: df03f830d099 ("net: netconsole: cache userdata formatted string in netconsole_target")
Signed-off-by: Matthew Wood <thepacketgeek@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240308002525.248672-1-thepacketgeek@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 14:07:57 -07:00
Linus Torvalds
02d4df78c5 Updates for the interrupt subsystem:
- Core:
 
    - Make affinity changes immediately effective for interrupt
      threads. This reduces the impact on isolated CPUs as it pulls over the
      thread right away instead of doing it after the next hardware
      interrupt arrived.
 
    - Cleanup and improvements for the interrupt chip simulator
 
    - Deduplication of the interrupt descriptor initialization code so the
      sparse and non-sparse mode share more code.
 
  - Drivers:
 
    - A set of conversions to platform_drivers::remove_new() which gets rid
      of the pointless return value.
 
    - A new driver for the Starfive JH8100 SoC
 
    - Support for Amlogic-T7 SoCs
 
    - Improvement for the interrupt handling and EOI management for the
      loongson interrupt controller.
 
    - The usual fixes and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXt6RUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRahEACenZz//vEy+n5t94UCNoYEBsqL4qsl
 eHb2LPkOwJdzy0I0et8sSRfmjFgfmiB5vmcOtuTjbA+pAASMU16M5nU38dD4Qw7V
 lwfutv3wb0XT7INslvrsEF4SvhapoiSBtzdK4IEVJysaHek/bbvZg8rot2tXTjCR
 3sK4sMuWLXxB+MzcaYEXSZlIlsrXcARHYNVCbudsEqL2Rt7mGtBJBMIPAYXaWLMn
 Y1B15huDNcj+Z9s/rbX218oSajEYJv24NE7JW/eYhG8Rv3yc+1zMTIARq35V77/3
 KIV15XqKozkR4G8BEzQ1hUp6l1cggOjMslkwjyKnXTddkHQnQs5928/48y1qs4W0
 IDpJqpPL30ckfzg/fUKfUU98t95qB4X55jmK3LuiWfdS8cfd65gq4Ro2bIszM1NQ
 SYhcTvZRRcNJqlbO3rQfFAmVU0bvVyR3DlmrLzVl2tH5touwNBBQ/3D3o7CRGEns
 37c07zjVZnir+HFmrtTKOiENTay+fHrtIw5dFf7FMqREpE4kL/nsgZfN0wgZPUHj
 QGFExV/kJNSMvqwCz77uvHt6c5uoVZGn2j8iYAdqWVKYRcWCMids2gVEkc8QK4gQ
 eWsIEAClIEjArPqpQzPE2v3a9puCmOpbHWRmU7VDtNka9/ur8qoU2KMXMJBySaL4
 UKXfWYE+43RVbQ==
 =AbVv
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Core:

   - Make affinity changes take effect immediately for interrupt
     threads. This reduces the impact on isolated CPUs as it pulls over
     the thread right away instead of doing it after the next hardware
     interrupt arrived.

   - Cleanup and improvements for the interrupt chip simulator

   - Deduplication of the interrupt descriptor initialization code so
     the sparse and non-sparse mode share more code.

  Drivers:

   - A set of conversions to platform_drivers::remove_new() which gets
     rid of the pointless return value.

   - A new driver for the Starfive JH8100 SoC

   - Support for Amlogic-T7 SoCs

   - Improvement for the interrupt handling and EOI management for the
     loongson interrupt controller.

   - The usual fixes and improvements all over the place"

* tag 'irq-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  irqchip/ts4800: Convert to platform_driver::remove_new() callback
  irqchip/stm32-exti: Convert to platform_driver::remove_new() callback
  irqchip/renesas-rza1: Convert to platform_driver::remove_new() callback
  irqchip/renesas-irqc: Convert to platform_driver::remove_new() callback
  irqchip/renesas-intc-irqpin: Convert to platform_driver::remove_new() callback
  irqchip/pruss-intc: Convert to platform_driver::remove_new() callback
  irqchip/mvebu-pic: Convert to platform_driver::remove_new() callback
  irqchip/madera: Convert to platform_driver::remove_new() callback
  irqchip/ls-scfg-msi: Convert to platform_driver::remove_new() callback
  irqchip/keystone: Convert to platform_driver::remove_new() callback
  irqchip/imx-irqsteer: Convert to platform_driver::remove_new() callback
  irqchip/imx-intmux: Convert to platform_driver::remove_new() callback
  irqchip/imgpdc: Convert to platform_driver::remove_new() callback
  irqchip: Add StarFive external interrupt controller
  dt-bindings: interrupt-controller: Add starfive,jh8100-intc
  arm64: dts: Add gpio_intc node for Amlogic-T7 SoCs
  irqchip/meson-gpio: Add support for Amlogic-T7 SoCs
  dt-bindings: interrupt-controller: Add support for Amlogic-T7 SoCs
  irqchip/vic: Fix a kernel-doc warning
  genirq: Wake interrupt threads immediately when changing affinity
  ...
2024-03-11 13:50:30 -07:00
William Tu
8f4cd89bf1 devlink: Fix length of eswitch inline-mode
Set eswitch inline-mode to be u8, not u16. Otherwise, errors below

$ devlink dev eswitch set pci/0000:08:00.0 mode switchdev \
  inline-mode network
    Error: Attribute failed policy validation.
    kernel answers: Numerical result out of rang
    netlink: 'devlink': attribute type 26 has an invalid length.

Fixes: f2f9dd164db0 ("netlink: specs: devlink: add the remaining command to generate complete split_ops")
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240310164547.35219-1-witu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 13:13:53 -07:00
Pawan Gupta
8076fcde01 x86/rfds: Mitigate Register File Data Sampling (RFDS)
RFDS is a CPU vulnerability that may allow userspace to infer kernel
stale data previously used in floating point registers, vector registers
and integer registers. RFDS only affects certain Intel Atom processors.

Intel released a microcode update that uses VERW instruction to clear
the affected CPU buffers. Unlike MDS, none of the affected cores support
SMT.

Add RFDS bug infrastructure and enable the VERW based mitigation by
default, that clears the affected buffers just before exiting to
userspace. Also add sysfs reporting and cmdline parameter
"reg_file_data_sampling" to control the mitigation.

For details see:
Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-03-11 13:13:48 -07:00
Pawan Gupta
4e42765d1b Documentation/hw-vuln: Add documentation for RFDS
Add the documentation for transient execution vulnerability Register
File Data Sampling (RFDS) that affects Intel Atom CPUs.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-03-11 13:13:46 -07:00
Linus Torvalds
045395d86a cgroup: Changes for 6.9
A quiet cycle. One trivial doc update patch. Two patches to drop now defunct
 memory_spread_slab feature from cgroup1 cpuset.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZe7MVQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGR59APwO8h/GCRH0KovpemkjsIHxicWMlvfHVleIdS4l
 FY7lLgD+JGucXcxd4YM/ZAZkj9pSUvrEm46n+Jrst7GFH8lfUQ0=
 =YY0C
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:
 "A quiet cycle. One trivial doc update patch. Two patches to drop the
  now defunct memory_spread_slab feature from cgroup1 cpuset"

* tag 'cgroup-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Mark memory_spread_slab as obsolete
  cgroup/cpuset: Remove cpuset_do_slab_mem_spread()
  docs: cgroup-v1: add missing code-block tags
2024-03-11 13:13:22 -07:00
Hangbin Liu
44208f5936 netlink: specs: support unterminated-ok
ynl-gen-c.py supports check unterminated-ok, but the yaml schemas don't
have this key. Add this to the yaml files.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240308081239.3281710-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 13:09:15 -07:00
Hangbin Liu
8d0c314c30 tools: ynl-gen: support using pre-defined values in attr checks
Support using pre-defined values in checks so we don't need to use hard
code number for the string, binary length. e.g. we have a definition like

 #define TEAM_STRING_MAX_LEN 32

Which defined in yaml like:

 definitions:
   -
     name: string-max-len
     type: const
     value: 32

It can be used in the attribute-sets like

attribute-sets:
  -
    name: attr-option
    name-prefix: team-attr-option-
    attributes:
      -
        name: name
        type: string
        checks:
          len: string-max-len

With this patch it will be converted to

[TEAM_ATTR_OPTION_NAME] = { .type = NLA_STRING, .len = TEAM_STRING_MAX_LEN, }

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240311140727.109562-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 13:07:48 -07:00
Linus Torvalds
ff887eb07c workqueue: Changes for v6.9
This cycle, a lot of workqueue changes including some that are significant
 and invasive.
 
 - During v6.6 cycle, unbound workqueues were updated so that they are more
   topology aware and flexible, which among other things improved workqueue
   behavior on modern multi-L3 CPUs. In the process, 636b927eba5b
   ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
   switched unbound workqueues to use per-CPU frontend pool_workqueues as a
   part of increasing front-back mapping flexibility.
 
   An unwelcome side effect of this change was that this made max concurrency
   enforcement per-CPU blowing up the maximum number of allowed concurrent
   executions. I incorrectly assumed that this wouldn't cause practical
   problems as most unbound workqueue users are self-regulate max
   concurrency; however, there definitely are which don't (e.g. on IO paths)
   and the drastic increase in the allowed max concurrency led to noticeable
   perf regressions in some use cases.
 
   This is now addressed by separating out max concurrency enforcement to a
   separate struct - wq_node_nr_active - which makes @max_active consistently
   mean system-wide max concurrency regardless of the number of CPUs or
   (finally) NUMA nodes. This is a rather invasive and, in places, a bit
   clunky; however, the clunkiness rises from the the inherent requirement to
   handle the disagreement between the execution locality domain and max
   concurrency enforcement domain on some modern machines. See 5797b1c18919
   ("workqueue: Implement system-wide nr_active enforcement for unbound
   workqueues") for more details.
 
 - BH workqueue support is added. They are similar to per-CPU workqueues but
   execute work items in the softirq context. This is expected to replace
   tasklet. However, currently, it's missing the ability to disable and
   enable work items which is needed to convert many tasklet users. To avoid
   crowding this merge window too much, this will be included in the next
   merge window. A separate pull request will be sent for the couple
   conversion patches that are currently pending.
 
 - Waiman plugged a long-standing hole in workqueue CPU isolation where
   ordered workqueues didn't follow wq_unbound_cpumask updates. Ordered
   workqueues now follow the same rules as other unbound workqueues.
 
 - More CPU isolation improvements: Juri fixed another deficit in workqueue
   isolation where unbound rescuers don't respect wq_unbound_cpumask.
   Leonardo fixed delayed_work timers firing on isolated CPUs.
 
 - Other misc changes.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZe7JCQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGcnqAP9UP8zEM1la19cilhboDumxmRWyRpV/egFOqsMP
 Y5PuoAEAtsBJtQWtm5w46+y+fk3nK2ugXlQio2gH0qQcxX6SdgQ=
 =/ovv
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:
 "This cycle, a lot of workqueue changes including some that are
  significant and invasive.

   - During v6.6 cycle, unbound workqueues were updated so that they are
     more topology aware and flexible, which among other things improved
     workqueue behavior on modern multi-L3 CPUs. In the process, commit
     636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu
     pool_workqueues") switched unbound workqueues to use per-CPU
     frontend pool_workqueues as a part of increasing front-back mapping
     flexibility.

     An unwelcome side effect of this change was that this made max
     concurrency enforcement per-CPU blowing up the maximum number of
     allowed concurrent executions. I incorrectly assumed that this
     wouldn't cause practical problems as most unbound workqueue users
     are self-regulate max concurrency; however, there definitely are
     which don't (e.g. on IO paths) and the drastic increase in the
     allowed max concurrency led to noticeable perf regressions in some
     use cases.

     This is now addressed by separating out max concurrency enforcement
     to a separate struct - wq_node_nr_active - which makes @max_active
     consistently mean system-wide max concurrency regardless of the
     number of CPUs or (finally) NUMA nodes. This is a rather invasive
     and, in places, a bit clunky; however, the clunkiness rises from
     the the inherent requirement to handle the disagreement between the
     execution locality domain and max concurrency enforcement domain on
     some modern machines.

     See commit 5797b1c18919 ("workqueue: Implement system-wide
     nr_active enforcement for unbound workqueues") for more details.

   - BH workqueue support is added.

     They are similar to per-CPU workqueues but execute work items in
     the softirq context. This is expected to replace tasklet. However,
     currently, it's missing the ability to disable and enable work
     items which is needed to convert many tasklet users. To avoid
     crowding this merge window too much, this will be included in the
     next merge window. A separate pull request will be sent for the
     couple conversion patches that are currently pending.

   - Waiman plugged a long-standing hole in workqueue CPU isolation
     where ordered workqueues didn't follow wq_unbound_cpumask updates.
     Ordered workqueues now follow the same rules as other unbound
     workqueues.

   - More CPU isolation improvements: Juri fixed another deficit in
     workqueue isolation where unbound rescuers don't respect
     wq_unbound_cpumask. Leonardo fixed delayed_work timers firing on
     isolated CPUs.

   - Other misc changes"

* tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (54 commits)
  workqueue: Drain BH work items on hot-unplugged CPUs
  workqueue: Introduce from_work() helper for cleaner callback declarations
  workqueue: Control intensive warning threshold through cmdline
  workqueue: Make @flags handling consistent across set_work_data() and friends
  workqueue: Remove clear_work_data()
  workqueue: Factor out work_grab_pending() from __cancel_work_sync()
  workqueue: Clean up enum work_bits and related constants
  workqueue: Introduce work_cancel_flags
  workqueue: Use variable name irq_flags for saving local irq flags
  workqueue: Reorganize flush and cancel[_sync] functions
  workqueue: Rename __cancel_work_timer() to __cancel_timer_sync()
  workqueue: Use rcu_read_lock_any_held() instead of rcu_read_lock_held()
  workqueue: Cosmetic changes
  workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK
  workqueue: Fix queue_work_on() with BH workqueues
  async: Use a dedicated unbound workqueue with raised min_active
  workqueue: Implement workqueue_set_min_active()
  workqueue: Fix kernel-doc comment of unplug_oldest_pwq()
  workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask
  kernel/workqueue: Let rescuers follow unbound wq cpumask changes
  ...
2024-03-11 12:50:42 -07:00
Linus Torvalds
8ede842f66 Rust changes for v6.9
Another routine one in terms of features. We got two version upgrades
 this time, but in terms of lines, 'alloc' changes are not very large.
 
 Toolchain and infrastructure:
 
  - Upgrade to Rust 1.76.0.
 
    This time around, due to how the kernel and Rust schedules have
    aligned, there are two upgrades in fact. These allow us to remove two
    more unstable features ('const_maybe_uninit_zeroed' and
    'ptr_metadata') from the list, among other improvements.
 
  - Mark 'rustc' (and others) invocations as recursive, which fixes a new
    warning and prepares us for the future in case we eventually take
    advantage of the Make jobserver.
 
 'kernel' crate:
 
  - Add the 'container_of!' macro.
 
  - Stop using the unstable 'ptr_metadata' feature by employing the now
    stable 'byte_sub' method to implement 'Arc::from_raw()'.
 
  - Add the 'time' module with a 'msecs_to_jiffies()' conversion function
    to begin with, to be used by Rust Binder.
 
  - Add 'notify_sync()' and 'wait_interruptible_timeout()' methods to
    'CondVar', to be used by Rust Binder.
 
  - Update integer types for 'CondVar'.
 
  - Rename 'wait_list' field to 'wait_queue_head' in 'CondVar'.
 
  - Implement 'Display' and 'Debug' for 'BStr'.
 
  - Add the 'try_from_foreign()' method to the 'ForeignOwnable' trait.
 
  - Add reexports for macros so that they can be used from the right
    module (in addition to the root).
 
  - A series of code documentation improvements, including adding
    intra-doc links, consistency improvements, typo fixes...
 
 'macros' crate:
 
  - Place generated 'init_module()' function in '.init.text'.
 
 Documentation:
 
  - Add documentation on Rust doctests and how they work.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmXsZXsACgkQGXyLc2ht
 IW26LQ//QdJvnkqwrhijfFchTZSc1SuPPb88yeUveVv9Ve568EObkGuvlFKo+OLB
 vt16h+0/LFIW32ZbJ1GeXYsmztOjc3xfyUoSi0Le9jDcffiO+km1DRFAkTVTlYha
 18h01bJs/55JuIjU7UkKrxav6pNqBoNGOkkUvWdlitzqdw+kG0ad/7XiUomoAOI3
 AEibG2Vltr0DmazW2sZLs4Ae9ytOBPuyMeRoh8WaxiFWz/Rtq3qCNN9ww5Et9RKl
 7nhjoc6r2nweavE0oCilYhoFDl6fblhRUSGBCpF1nBOdG6KyrJswdAlv3xpncC/u
 TSZ+6N1BMn+xgPP4ftv0kG8TXm/AcInjiOlbOfnx+UX/R3laxfNrTrjpDzftc4Qm
 f+ygKefMClBCMHPlXu4OXCpL5C52p1GLK7q+q5PqF60P4qGoW6M3Vx6S8h9jT1oE
 kta+p0Rh3tz0YKwxPHcESuFdimkGh5+9zgAmbc3lKJ/uJ0AIdeEscriQn1S3xLF2
 De57l2iGO7OpMzV8T9hf4rQImTVOvd9zpoyPF0aMRymoxiy3kQtG8WVIkVixIDPW
 LKkoQif0Eh4r28rHBZ2Hvt5tC9ZYTSZP1MgDl8dQGi+5h4fmcN7WvcdciCYOPBRc
 em8ifLgCB77DuRhA6AWV5p0IgDDC0aHL6UAF8qm5vSyb6HcoGhE=
 =BAGo
 -----END PGP SIGNATURE-----

Merge tag 'rust-6.9' of https://github.com/Rust-for-Linux/linux

Pull Rust updates from Miguel Ojeda:
 "Another routine one in terms of features. We got two version upgrades
  this time, but in terms of lines, 'alloc' changes are not very large.

  Toolchain and infrastructure:

   - Upgrade to Rust 1.76.0

     This time around, due to how the kernel and Rust schedules have
     aligned, there are two upgrades in fact. These allow us to remove
     two more unstable features ('const_maybe_uninit_zeroed' and
     'ptr_metadata') from the list, among other improvements

   - Mark 'rustc' (and others) invocations as recursive, which fixes a
     new warning and prepares us for the future in case we eventually
     take advantage of the Make jobserver

  'kernel' crate:

   - Add the 'container_of!' macro

   - Stop using the unstable 'ptr_metadata' feature by employing the now
     stable 'byte_sub' method to implement 'Arc::from_raw()'

   - Add the 'time' module with a 'msecs_to_jiffies()' conversion
     function to begin with, to be used by Rust Binder

   - Add 'notify_sync()' and 'wait_interruptible_timeout()' methods to
     'CondVar', to be used by Rust Binder

   - Update integer types for 'CondVar'

   - Rename 'wait_list' field to 'wait_queue_head' in 'CondVar'

   - Implement 'Display' and 'Debug' for 'BStr'

   - Add the 'try_from_foreign()' method to the 'ForeignOwnable' trait

   - Add reexports for macros so that they can be used from the right
     module (in addition to the root)

   - A series of code documentation improvements, including adding
     intra-doc links, consistency improvements, typo fixes...

  'macros' crate:

   - Place generated 'init_module()' function in '.init.text'

  Documentation:

   - Add documentation on Rust doctests and how they work"

* tag 'rust-6.9' of https://github.com/Rust-for-Linux/linux: (29 commits)
  rust: upgrade to Rust 1.76.0
  kbuild: mark `rustc` (and others) invocations as recursive
  rust: add `container_of!` macro
  rust: str: implement `Display` and `Debug` for `BStr`
  rust: module: place generated init_module() function in .init.text
  rust: types: add `try_from_foreign()` method
  docs: rust: Add description of Rust documentation test as KUnit ones
  docs: rust: Move testing to a separate page
  rust: kernel: stop using ptr_metadata feature
  rust: kernel: add reexports for macros
  rust: locked_by: shorten doclink preview
  rust: kernel: remove unneeded doclink targets
  rust: kernel: add doclinks
  rust: kernel: add blank lines in front of code blocks
  rust: kernel: mark code fragments in docs with backticks
  rust: kernel: unify spelling of refcount in docs
  rust: str: move SAFETY comment in front of unsafe block
  rust: str: use `NUL` instead of 0 in doc comments
  rust: kernel: add srctree-relative doclinks
  rust: ioctl: end top-level module docs with full stop
  ...
2024-03-11 12:31:28 -07:00
Linus Torvalds
e5a3878c94 RCU pull request for v6.9
This pull request contains the following branches:
 
 rcu-doc.2024.02.14a: Documentation updates.
 
 rcu-nocb.2024.02.14a: RCU NOCB updates, code cleanups, unnecessary
         barrier removals and minor bug fixes.
 
 rcu-exp.2024.02.14a: RCU exp, fixing a circular dependency between
         workqueue and RCU expedited callback handling.
 
 rcu-tasks.2024.02.26a: RCU tasks, avoiding deadlocks in do_exit() when
         calling synchronize_rcu_task() with a mutex hold, maintaining
 	real-time response in rcu_tasks_postscan() and a minor
         fix for tasks trace quiescence check.
 
 rcu-misc.2024.02.14a: Misc updates, comments and readibility
 	improvement, boot time parameter for lazy RCU and rcutorture
 	improvement.
 -----BEGIN PGP SIGNATURE-----
 
 iQFJBAABCAAzFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmXev80VHGJvcXVuLmZl
 bmdAZ21haWwuY29tAAoJEEl56MO1B/q4UYgH/3CQF495sAS58M3tsy/HCMbq8DUb
 9AoIKCdzqvN2xzjYxHHs59jA+MdEIOGbSIx1yWk0KZSqRSfxwd9nGbxO5EHbz6L3
 gdZdOHbpZHPmtcUbdOfXDyhy4JaF+EBuRp9FOnsJ+w4/a0lFWMinaic4BweMEESS
 y+gD5fcMzzCthedXn/HeQpeYUKOQ8Jpth5K5s4CkeaehEbdRVLFxjwFgQYd8Oeqn
 0SfjNMRdBubDxydi4Rx1Ado7mKnfBHoot+9l0PHi6T2Rq89H0AUn/Dj3YOEkW7QT
 aKRSVpPJnG3EFHUUzwprODAoQGOC6EpTVpxSqnpO2ewHnnMPhz/IXzRT86w=
 =gypc
 -----END PGP SIGNATURE-----

Merge tag 'rcu.next.v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux

Pull RCU updates from Boqun Feng:

 - Eliminate deadlocks involving do_exit() and RCU tasks, by Paul:
   Instead of SRCU read side critical sections, now a percpu list is
   used in do_exit() for scaning yet-to-exit tasks

 - Fix a deadlock due to the dependency between workqueue and RCU
   expedited grace period, reported by Anna-Maria Behnsen and Thomas
   Gleixner and fixed by Frederic: Now RCU expedited always uses its own
   kthread worker instead of a workqueue

 - RCU NOCB updates, code cleanups, unnecessary barrier removals and
   minor bug fixes

 - Maintain real-time response in rcu_tasks_postscan() and a minor fix
   for tasks trace quiescence check

 - Misc updates, comments and readibility improvement, boot time
   parameter for lazy RCU and rcutorture improvement

 - Documentation updates

* tag 'rcu.next.v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux: (34 commits)
  rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
  rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
  rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
  rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
  rcu-tasks: Initialize callback lists at rcu_init() time
  rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
  rcu-tasks: Repair RCU Tasks Trace quiescence check
  rcu/sync: remove un-used rcu_sync_enter_start function
  rcutorture: Suppress rtort_pipe_count warnings until after stalls
  srcu: Improve comments about acceleration leak
  rcu: Provide a boot time parameter to control lazy RCU
  rcu: Rename jiffies_till_flush to jiffies_lazy_flush
  doc: Update checklist.rst discussion of callback execution
  doc: Clarify use of slab constructors and SLAB_TYPESAFE_BY_RCU
  context_tracking: Fix kerneldoc headers for __ct_user_{enter,exit}()
  doc: Add EARLY flag to early-parsed kernel boot parameters
  doc: Add CONFIG_RCU_STRICT_GRACE_PERIOD to checklist.rst
  doc: Make checklist.rst note that spinlocks are implied RCU readers
  doc: Make whatisRCU.rst note that spinlocks are RCU readers
  doc: Spinlocks are implied RCU readers
  ...
2024-03-11 12:02:50 -07:00
Linus Torvalds
0f1a876682 vfs-6.9.uuid
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem5LwAKCRCRxhvAZXjc
 onZsAQCjMNabNWAty2VBAQrNIpGkZ+AMA2DxEajPldaPiJH5zQEA9ea7feB3T47i
 NUrXXfMQ5DSop+k5Y65pPkEpbX4rhQo=
 =NZgd
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs uuid updates from Christian Brauner:
 "This adds two new ioctl()s for getting the filesystem uuid and
  retrieving the sysfs path based on the path of a mounted filesystem.
  Getting the filesystem uuid has been implemented in filesystem
  specific code for a while it's now lifted as a generic ioctl"

* tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  xfs: add support for FS_IOC_GETFSSYSFSPATH
  fs: add FS_IOC_GETFSSYSFSPATH
  fat: Hook up sb->s_uuid
  fs: FS_IOC_GETUUID
  ovl: convert to super_set_uuid()
  fs: super_set_uuid()
2024-03-11 11:02:06 -07:00
Linus Torvalds
77417942e4 vfs-6.9.ntfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem42QAKCRCRxhvAZXjc
 opOtAQDUkiJNaOu3fR6ENLvDZSFmaI2jQXIL8ulHYpEiFrXmKwD9EZQ8bmEYU7uO
 WN4VM8p8UwQ7BmIV9b+jvwciF8Qi8QI=
 =T03q
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull ntfs update from Christian Brauner:
 "This removes the old ntfs driver. The new ntfs3 driver is a full
  replacement that was merged over two years ago. We've went through
  various userspace and either they use ntfs3 or they use the fuse
  version of ntfs and thus build neither ntfs nor ntfs3. I think that's
  a clear sign that we should risk removing the legacy ntfs driver.

  Quoting from Arch Linux and Debian:

   - Debian does neither build the legacy ntfs nor the new ntfs3:

     "Not currently built with Debian's kernel packages, 'ntfs' has been
      symlinked to 'ntfs-3g' as it relates to fstab and mount commands.

      Debian kernels are built without support of the ntfs3 driver
      developed by Paragon Software."  (cf. [2])

   - Archlinux provides ntfs3 as their default since 5.15:

     "All officially supported kernels with versions 5.15 or newer are
      built with CONFIG_NTFS3_FS=m and thus support it. Before 5.15,
      NTFS read and write support is provided by the NTFS-3G FUSE file
      system."  (cf. [1]).

  It's unmaintained apart from various odd fixes as well. Worst case we
  have to reintroduce it if someone really has a valid dependency on it.
  But it's worth trying to see whether we can remove it"

Link: https://wiki.archlinux.org/title/NTFS [1]
Link: https://wiki.debian.org/NTFS [2]

* tag 'vfs-6.9.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: remove NTFS classic from docum. index
  fs: Remove NTFS classic
2024-03-11 09:55:17 -07:00
Linus Torvalds
7ea65c89d8 vfs-6.9.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem3wQAKCRCRxhvAZXjc
 otRMAQDeo8qsuuIAcS2KUicKqZR5yMVvrY9r4sQzf7YRcJo5HQD+NQXkKwQuv1VO
 OUeScsic/+I+136AgdjWnlEYO5dp0go=
 =4WKU
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Misc features, cleanups, and fixes for vfs and individual filesystems.

  Features:

   - Support idmapped mounts for hugetlbfs.

   - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug
     where the passed offset is ignored if the file is O_APPEND. The new
     flag allows a caller to enforce that the offset is honored to
     conform to posix even if the file was opened in append mode.

   - Move i_mmap_rwsem in struct address_space to avoid false sharing
     between i_mmap and i_mmap_rwsem.

   - Convert efs, qnx4, and coda to use the new mount api.

   - Add a generic is_dot_dotdot() helper that's used by various
     filesystems and the VFS code instead of open-coding it multiple
     times.

   - Recently we've added stable offsets which allows stable ordering
     when iterating directories exported through NFS on e.g., tmpfs
     filesystems. Originally an xarray was used for the offset map but
     that caused slab fragmentation issues over time. This switches the
     offset map to the maple tree which has a dense mode that handles
     this scenario a lot better. Includes tests.

   - Finally merge the case-insensitive improvement series Gabriel has
     been working on for a long time. This cleanly propagates case
     insensitive operations through ->s_d_op which in turn allows us to
     remove the quite ugly generic_set_encrypted_ci_d_ops() operations.
     It also improves performance by trying a case-sensitive comparison
     first and then fallback to case-insensitive lookup if that fails.
     This also fixes a bug where overlayfs would be able to be mounted
     over a case insensitive directory which would lead to all sort of
     odd behaviors.

  Cleanups:

   - Make file_dentry() a simple accessor now that ->d_real() is
     simplified because of the backing file work we did the last two
     cycles.

   - Use the dedicated file_mnt_idmap helper in ntfs3.

   - Use smp_load_acquire/store_release() in the i_size_read/write
     helpers and thus remove the hack to handle i_size reads in the
     filemap code.

   - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in
     fs/

   - It's no longer necessary to perform a second built-in initramfs
     unpack call because we retain the contents of the previous
     extraction. Remove it.

   - Now that we have removed various allocators kfree_rcu() always
     works with kmem caches and kmalloc(). So simplify various places
     that only use an rcu callback in order to handle the kmem cache
     case.

   - Convert the pipe code to use a lockdep comparison function instead
     of open-coding the nesting making lockdep validation easier.

   - Move code into fs-writeback.c that was located in a header but can
     be made static as it's only used in that one file.

   - Rewrite the alignment checking iterators for iovec and bvec to be
     easier to read, and also significantly more compact in terms of
     generated code. This saves 270 bytes of text on x86-64 (with
     clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also
     saves a bit of time for the same workload.

   - Switch various places to use KMEM_CACHE instead of
     kmem_cache_create().

   - Use inode_set_ctime_to_ts() in inode_set_ctime_current()

   - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak.

   - Various smaller cleanups for eventfds.

  Fixes:

   - Fix various comments and typos, and unneeded initializations.

   - Fix stack allocation hack for clang in the select code.

   - Improve dump_mapping() debug code on a best-effort basis.

   - Fix build errors in various selftests.

   - Avoid wrap-around instrumentation in various places.

   - Don't allow user namespaces without an idmapping to be used for
     idmapped mounts.

   - Fix sysv sb_read() call.

   - Fix fallback implementation of the get_name() export operation"

* tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits)
  hugetlbfs: support idmapped mounts
  qnx4: convert qnx4 to use the new mount api
  fs: use inode_set_ctime_to_ts to set inode ctime to current time
  libfs: Drop generic_set_encrypted_ci_d_ops
  ubifs: Configure dentry operations at dentry-creation time
  f2fs: Configure dentry operations at dentry-creation time
  ext4: Configure dentry operations at dentry-creation time
  libfs: Add helper to choose dentry operations at mount-time
  libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
  fscrypt: Drop d_revalidate once the key is added
  fscrypt: Drop d_revalidate for valid dentries during lookup
  fscrypt: Factor out a helper to configure the lookup dentry
  ovl: Always reject mounting over case-insensitive directories
  libfs: Attempt exact-match comparison first during casefolded lookup
  efs: remove SLAB_MEM_SPREAD flag usage
  jfs: remove SLAB_MEM_SPREAD flag usage
  minix: remove SLAB_MEM_SPREAD flag usage
  openpromfs: remove SLAB_MEM_SPREAD flag usage
  proc: remove SLAB_MEM_SPREAD flag usage
  qnx6: remove SLAB_MEM_SPREAD flag usage
  ...
2024-03-11 09:38:17 -07:00
Linus Torvalds
d451b075f7 linux_kselftest-next-6.9-rc1
This kselftest next update for Linux 6.9-rc1 consists of:
 
 -- livepatch restructuring to move the module out of lib to be
    built as a out-of-tree modules during kselftest build. This
    change makes it easier change, debug and rebuild the tests by
    running make on the selftests/livepatch directory, which is not
    currently possible since the modules on lib/livepatch are build
    and installed using the main makefile modules target.
 
 -- livepatch restructuring fixes for problems found by kernel test
    robot. The change skips the test if kernel-devel isn't installed
    (default value of KDIR), or if KDIR variable passed doesn't exists.
 
 -- resctrl test restructuring and new non-contiguous CBMs CAT test
 
 -- new ktap_helpers to print diagnostic messages, pass/fail tests
    based on exit code, abort test, and finish the test.
 
 -- a new test verify power supply properties.
 
 -- a new ftrace to exercise function tracer across cpu hotplug.
 
 -- timeout increase for mqueue test to allow the test to run on
    i3.metal AWS instances.
 
 -- minor spelling corrections in several tests.
 
 -- missing gitignore files and changes to existing gitignore files.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmXo/kUACgkQCwJExA0N
 Qxy0aBAAk0SLA1ZIAdlNjo5B13C7GC7rFRrtaai9ReXSvU/X5TX9sD5T9DIULKdj
 Mcqi+oaP88GPSUZS+bn7DyVxKyuvHg/f4jWQwqZ34WxK4K1K+yt+3YhTnHZx7ezU
 6WIbUsD1Zs7tXXI2v76riHFbD3pfxZ+AXQaf/1cXDi4SpIpLkiqyeYWoWN5Z2rtJ
 BwMzrI2RBiLMox4g8F3Ey4BX+bOIYiiJq5bdl7gJVKcp74VdU3S7IyOuXFbSdcFR
 xxmFMxWGFOgRzexW0fmDWLudD2dII0XQAExSsl5xMnR/lmSh+lHWheoNgphQl050
 VcLmrPugWVJSioe0fHEgmDQXe3lPqDtepUg921tIlWvCmtR3Ur6+GpILTbSvQ4qp
 SK+2pt7nGSAT2UkRO/6/TYFG3mELADvj6tglj0b1SkIXmNiF+7OZ+hJ2XqyM7peo
 Z7gtmSmpbAotxp64Jj8HsNZLpCX0xdaxoTMEWPoG09fwTXY7Hy03yoWDKBKB4MZ9
 jBtNXDolhpEQ/ppSGFnRPzXuNVapYX28UY0cwBBVgke5jwB8SUnBEr2dbNnVU1q0
 y5uxtj/EFQzxSynB3eM1us2OuXvr5TfAWmKVpyE/cNC3WreHeA+Y2kN1dzv8hgpw
 o4NbltdF8F+a9qQF9B1XvjVhqa5By1esS1jOg96cJgGseAVWiQs=
 =G+DO
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest update from Shuah Khan:

 - livepatch restructuring to move the module out of lib to be built as
   a out-of-tree modules during kselftest build. This makes it easier
   change, debug and rebuild the tests by running make on the
   selftests/livepatch directory, which is not currently possible since
   the modules on lib/livepatch are build and installed using the main
   makefile modules target.

 - livepatch restructuring fixes for problems found by kernel test
   robot. The change skips the test if kernel-devel isn't installed
   (default value of KDIR), or if KDIR variable passed doesn't exists.

 - resctrl test restructuring and new non-contiguous CBMs CAT test

 - new ktap_helpers to print diagnostic messages, pass/fail tests based
   on exit code, abort test, and finish the test.

 - a new test verify power supply properties.

 - a new ftrace to exercise function tracer across cpu hotplug.

 - timeout increase for mqueue test to allow the test to run on i3.metal
   AWS instances.

 - minor spelling corrections in several tests.

 - missing gitignore files and changes to existing gitignore files.

* tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (57 commits)
  kselftest: Add basic test for probing the rust sample modules
  selftests: lib.mk: Do not process TEST_GEN_MODS_DIR
  selftests: livepatch: Avoid running the tests if kernel-devel is missing
  selftests: livepatch: Add initial .gitignore
  selftests/resctrl: Add non-contiguous CBMs CAT test
  selftests/resctrl: Add resource_info_file_exists()
  selftests/resctrl: Split validate_resctrl_feature_request()
  selftests/resctrl: Add a helper for the non-contiguous test
  selftests/resctrl: Add test groups and name L3 CAT test L3_CAT
  selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy"
  selftests/mqueue: Set timeout to 180 seconds
  selftests/ftrace: Add test to exercize function tracer across cpu hotplug
  selftest: ftrace: fix minor typo in log
  selftests: thermal: intel: workload_hint: add missing gitignore
  selftests: thermal: intel: power_floor: add missing gitignore
  selftests: uevent: add missing gitignore
  selftests: Add test to verify power supply properties
  selftests: ktap_helpers: Add a helper to finish the test
  selftests: ktap_helpers: Add a helper to abort the test
  selftests: ktap_helpers: Add helper to pass/fail test based on exit code
  ...
2024-03-11 09:25:33 -07:00
David S. Miller
f541fd7adf Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
ethtool: ice: Support for RSS settings to GTP

Takeru Hayasaka enables RSS functionality for GTP packets on ice driver
with ethtool.

A user can include TEID and make RSS work for GTP-U over IPv4 by doing the
following:`ethtool -N ens3 rx-flow-hash gtpu4 sde`

In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e,
gtpu(4|6)u, and gtpu(4|6)d.

gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does
not include a TEID.
gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that
includes a TEID.
gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios.
gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6.
gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended
header includes Uplink, applicable to both IPv4 and IPv6.
gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink,
for both IPv4 and IPv6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-11 09:33:01 +00:00
Linus Torvalds
137e0ec05a KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
   avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not writable
   from userspace, so there would be no way to write to a read-only
   guest_memfd).
 
 - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
   clear that such VMs are purely for development and testing.
 
 - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
   is to support confidential VMs with deterministic private memory (SNP
   and TDX) only in the TDP MMU.
 
 - Fix a bug in a GUEST_MEMFD dirty logging test that caused false passes.
 
 x86 fixes:
 
 - Fix missing marking of a guest page as dirty when emulating an atomic access.
 
 - Check for mmu_notifier invalidation events before faulting in the pfn,
   and before acquiring mmu_lock, to avoid unnecessary work and lock
   contention with preemptible kernels (including CONFIG_PREEMPT_DYNAMIC
   in non-preemptible mode).
 
 - Disable AMD DebugSwap by default, it breaks VMSA signing and will be
   re-enabled with a better VM creation API in 6.10.
 
 - Do the cache flush of converted pages in svm_register_enc_region() before
   dropping kvm->lock, to avoid a race with unregistering of the same region
   and the consequent use-after-free issue.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmXskdYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroN1TAf/SUGf4QuYG7nnfgWDR+goFO6Gx7NE
 pJr3kAwv6d2f+qTlURfGjnX929pgZDLgoTkXTNeZquN6LjgownxMjBIpymVobvAD
 AKvqJS/ECpryuehXbeqlxJxJn+TrxJ5r4QeNILMHc3AOZoiUqM6xl3zFfXWDNWVo
 IazwT8P3d8wxiHAxv1eG6OVWHxbcg31068FVKRX3f/bWPbVwROJrPkCopmz2BJvU
 6KYdYcn2rkpDTEM3ouDC/6gxJ9vpSY3+nW7Q7dNtGtOH2+BddfSA6I0rphCQWCNs
 uXOxd5bDrC+KmkiULTPostuvwBgIm1k9wC2kW9A4P2VEf6Ay+ZHEdAOBJQ==
 =+MT/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "KVM GUEST_MEMFD fixes for 6.8:

   - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
     to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not
     writable from userspace, so there would be no way to write to a
     read-only guest_memfd).

   - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
     clear that such VMs are purely for development and testing.

   - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term
     plan is to support confidential VMs with deterministic private
     memory (SNP and TDX) only in the TDP MMU.

   - Fix a bug in a GUEST_MEMFD dirty logging test that caused false
     passes.

  x86 fixes:

   - Fix missing marking of a guest page as dirty when emulating an
     atomic access.

   - Check for mmu_notifier invalidation events before faulting in the
     pfn, and before acquiring mmu_lock, to avoid unnecessary work and
     lock contention with preemptible kernels (including
     CONFIG_PREEMPT_DYNAMIC in non-preemptible mode).

   - Disable AMD DebugSwap by default, it breaks VMSA signing and will
     be re-enabled with a better VM creation API in 6.10.

   - Do the cache flush of converted pages in svm_register_enc_region()
     before dropping kvm->lock, to avoid a race with unregistering of
     the same region and the consequent use-after-free issue"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  SEV: disable SEV-ES DebugSwap by default
  KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
  KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
  KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive
  KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
  KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU
  KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP
  KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
  KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-03-10 09:27:39 -07:00
Jakub Kicinski
d7e14e5344 Support Multi-PF netdev (Socket Direct)
This series adds support for combining multiple devices (PFs) of the
 same port under one netdev instance. Passing traffic through different
 devices belonging to different NUMA sockets saves cross-numa traffic and
 allows apps running on the same netdev from different numas to still
 feel a sense of proximity to the device and achieve improved
 performance.
 
 We achieve this by grouping PFs together, and creating the netdev only
 once all group members are probed. Symmetrically, we destroy the netdev
 once any of the PFs is removed.
 
 The channels are distributed between all devices, a proper configuration
 would utilize the correct close numa when working on a certain app/cpu.
 
 We pick one device to be a primary (leader), and it fills a special
 role.  The other devices (secondaries) are disconnected from the network
 in the chip level (set to silent mode). All RX/TX traffic is steered
 through the primary to/from the secondaries.
 
 Currently, we limit the support to PFs only, and up to two devices
 (sockets).
 
 V6:
 - Address documentation comments from Jakub.
 
 V5:
  - Address documentation comments from Przemek Kitszel.
 
 V4:
  - Improve documentation for better user observability and understanding
    of the feature, in terms of queues and their expected NUMA/CPU/IRQ
    affinity.
 
 V3:
  - Fix documentation per Jakubs feedback.
  - Fix typos
  - Link new documentation in the networking index.rst
 
 V2:
  - Add documentation in a new patch.
  - Add debugfs in a new patch.
  - Add mlx5_ifc bit for MPIR cap check and use it before query.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmXpfYgACgkQSD+KveBX
 +j5jIAf/VGIX/UQttq74MzK9pWgJNKtf7l8aSYtZuKXx68pmpr+25DfsxbKEeVfy
 KzjvGFx5peoKisWILyaljQXSn7snmSqOsQf/IwDzmsmF/2ZTDyf6NPC6gND0bIjJ
 Uu6cJ2T6Sa9ktg+ANz/gLDvGBBfPqSYTYIXrJnNQKsnW6nV8mDvy4WVf6etvCxOi
 rMjfcqwNijf3GPTJd/qkaWhwneDG2AFWd5HzdORpNh6iuv8Cbc9aNhWgAPh18o7v
 VWuAiFraTgaz6jj2H/NfziAk4ZrtVsCqhaFjJe3eLO+MCk/bZ/SizsAcR61JLkjL
 pFqh5wqxA6v+5YJm4zVatZqPLIt4gQ==
 =GZBa
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Support Multi-PF netdev (Socket Direct)

This series adds support for combining multiple devices (PFs) of the
same port under one netdev instance. Passing traffic through different
devices belonging to different NUMA sockets saves cross-numa traffic and
allows apps running on the same netdev from different numas to still
feel a sense of proximity to the device and achieve improved
performance.

We achieve this by grouping PFs together, and creating the netdev only
once all group members are probed. Symmetrically, we destroy the netdev
once any of the PFs is removed.

The channels are distributed between all devices, a proper configuration
would utilize the correct close numa when working on a certain app/cpu.

We pick one device to be a primary (leader), and it fills a special
role.  The other devices (secondaries) are disconnected from the network
in the chip level (set to silent mode). All RX/TX traffic is steered
through the primary to/from the secondaries.

Currently, we limit the support to PFs only, and up to two devices
(sockets).

* tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  Documentation: networking: Add description for multi-pf netdev
  net/mlx5: Enable SD feature
  net/mlx5e: Block TLS device offload on combined SD netdev
  net/mlx5e: Support per-mdev queue counter
  net/mlx5e: Support cross-vhca RSS
  net/mlx5e: Let channels be SD-aware
  net/mlx5e: Create EN core HW resources for all secondary devices
  net/mlx5e: Create single netdev per SD group
  net/mlx5: SD, Add debugfs
  net/mlx5: SD, Add informative prints in kernel log
  net/mlx5: SD, Implement steering for primary and secondaries
  net/mlx5: SD, Implement devcom communication and primary election
  net/mlx5: SD, Implement basic query and instantiation
  net/mlx5: SD, Introduce SD lib
  net/mlx5: Add MPIR bit in mcam_access_reg
====================

Link: https://lore.kernel.org/r/20240307084229.500776-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08 20:45:17 -08:00
Linus Torvalds
6dfeb04c46 sound fixes for 6.8-final
A collection of small fixes.  A half of them are HD-audio quirks while
 the rest are various device-specific ASoC fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmXqx+YOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE8zIw//cuFpJBmqj/Qkk40BBkPTVIGF42K5IE+zYuxr
 kO4z4Rq6uI/s1T9pgCGU0c8rLsBfkeW9k6M6fglDusLD4zl7A4NjEOVCl3/rpAwg
 rQoglgEfBue1PZ/3yVGI1PxBaCGOdoKxyBDNy3dwAzZBe+PJ3cbAsvflAsAK/XqD
 vjV9SMMgxIfOqWJTAYXKTnk2VSoyFdKulK5n9Eb3941Bj43YOpo0TPmD0YXfmf1b
 sCvDzGkfUdDM3hkDLlI/uY9T/7vFLYMN9ktF+BEdDmqeVZBwMEnqW7neF3t8uFjn
 6OsWCwlU6jHxe3texMEeGwyXDETnK3YSCiYPZQClEuDG6rkOu9XEzsTCuJLK5yQR
 Q9iY9/R/3UOCa/ykyISAi0oZtL8HgASo2S8FyBiw8bYV7dGw2oXCx5ieLVntuE0R
 ktSUm8/F0esQ8D3EPdQ4H+St5xSWUIz6vX53T7zJGZ5EOr0Bv9W38L7jXHbmxzd0
 GodPCNGOSzNgLxg+pDjj0smJEury14ASTyF5wGGd2SF00cHB71QPlORbcBD44X6q
 PxFSZ3R56PWVyrsD6IONrP88af9M9zM/tR458pYi+eoRFyYTungFZDvSeWzdGA37
 7i/z396TzYQrkXSoTfLef0R0vIt2s816ZUvNluikBWQJnvDxrUvoNFCwV0Qa8bKn
 jh7EDfM=
 =TMCk
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes. Half of them are HD-audio quirks while
  the rest are various device-specific ASoC fixes"

* tag 'sound-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: wm8962: Fix up incorrect error message in wm8962_set_fll
  ASoC: wm8962: Enable both SPKOUTR_ENA and SPKOUTL_ENA in mono mode
  ASoC: wm8962: Enable oscillator if selecting WM8962_FLL_OSC
  ASoC: dt-bindings: nvidia: Fix 'lge' vendor prefix
  ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook
  ASoC: amd: yc: Add HP Pavilion Aero Laptop 13-be2xxx(8BD6) into DMI quirk table
  ASoC: rcar: adg: correct TIMSEL setting for SSI9
  ALSA: hda: cs35l41: Overwrite CS35L41 configuration for ASUS UM5302LA
  ALSA: hda/realtek: Add quirks for Lenovo Thinkbook 16P laptops
  ALSA: hda: cs35l41: Support Lenovo Thinkbook 16P
  ALSA: hda/realtek - Add Headset Mic supported Acer NB platform
  ALSA: hda: optimize the probe codec process
  ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
  ASoC: Intel: bytcr_rt5640: Add an extra entry for the Chuwi Vi8 tablet
  ASoC: madera: Fix typo in madera_set_fll_clks shift value
2024-03-08 13:01:16 -08:00
Jakub Kicinski
75c2946db3 wireless-next patches for v6.9
The fourth "new features" pull request for v6.9 with changes both in
 stack and in drivers. The theme in this pull request is to fix sparse
 warnings but we still have some left in wireless subsystem. Otherwise
 quite normal.
 
 Major changes:
 
 rtw89
 
 * NL80211_EXT_FEATURE_SCAN_RANDOM_SN support
 
 * NL80211_EXT_FEATURE_SET_SCAN_DWELL support
 
 rtw88
 
 * support for more rtw8811cu and rtw8821cu devices
 
 mt76
 
 * mt76x2u: add Netgear WNDA3100v3 USB
 
 * mt7915: newer ADIE version support
 
 * mt7925: radio temperature sensor support
 
 * mt7996: remove GCMP IGTK offload
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmXq4hARHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZtOawf9Gf2FAi56zA/4vKJPE/mZzRvNodj/u9WL
 mEX3KERw744IEmWY0yXEAyvzKkkNqUUtmdUbbsnXnnEtzsVZ2oRmOZdXsvEW3vOD
 IEsjWY/405MBWyuBttAa6orBSgelr99k86HzoLN86s52HmliVDhr2EUnYIf2O++9
 SVhHFKE4BMVCO6hlyEg419K9M2VhWtBDNYweoXAfn8Y1byAw6Pt6WunjRuGwJG5n
 qvcrZcFCFSa3daPpx0uIA/yiSjZlq0hwVC3r/PnoX/r1FDR8tS2ecvC2rP3MaZJ+
 1x3IcNvwC97D80wvdW+f+qKtV4OXZefsZpzJJpvREH8FbAgYLDef0Q==
 =gln7
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.9

The fourth "new features" pull request for v6.9 with changes both in
stack and in drivers. The theme in this pull request is to fix sparse
warnings but we still have some left in wireless subsystem. Otherwise
quite normal.

Major changes:

rtw89
 * NL80211_EXT_FEATURE_SCAN_RANDOM_SN support
 * NL80211_EXT_FEATURE_SET_SCAN_DWELL support

rtw88
 * support for more rtw8811cu and rtw8821cu devices

mt76
 * mt76x2u: add Netgear WNDA3100v3 USB
 * mt7915: newer ADIE version support
 * mt7925: radio temperature sensor support
 * mt7996: remove GCMP IGTK offload

* tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (125 commits)
  wifi: rtw89: wow: move release offload packet earlier for WoWLAN mode
  wifi: rtw89: wow: set security engine options for 802.11ax chips only
  wifi: rtw89: update suspend/resume for different generation
  wifi: rtw89: wow: update config mac function with different generation
  wifi: rtw89: update DMA function with different generation
  wifi: rtw89: wow: update WoWLAN status register for different generation
  wifi: rtw89: wow: update WoWLAN reason register for different chips
  wifi: brcm80211: handle pmk_op allocation failure
  wifi: rtw89: coex: Add coexistence policy to decrease WiFi packet CRC-ERR
  wifi: rtw89: coex: When Bluetooth not available don't set power/gain
  wifi: rtw89: coex: add return value to ensure H2C command is success or not
  wifi: rtw89: coex: Reorder H2C command index to align with firmware
  wifi: rtw89: coex: add BTC ctrl_info version 7 and related logic
  wifi: rtw89: coex: add init_info H2C command format version 7
  wifi: rtw89: 8922a: add coexistence helpers of SW grant
  wifi: rtw89: mac: add coexistence helpers {cfg/get}_plt
  wifi: cw1200: restore endian swapping
  wifi: wlcore: sdio: Rate limit wl12xx_sdio_raw_{read,write}() failures warns
  wifi: rtlwifi: Remove rtl_intf_ops.read_efuse_byte
  wifi: rtw88: 8821c: Fix false alarm count
  ...
====================

Link: https://lore.kernel.org/r/20240308100429.B8EA2C433F1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08 09:05:49 -08:00
Jakub Kicinski
6025b9135f net: dqs: add NIC stall detector based on BQL
softnet_data->time_squeeze is sometimes used as a proxy for
host overload or indication of scheduling problems. In practice
this statistic is very noisy and has hard to grasp units -
e.g. is 10 squeezes a second to be expected, or high?

Delaying network (NAPI) processing leads to drops on NIC queues
but also RTT bloat, impacting pacing and CA decisions.
Stalls are a little hard to detect on the Rx side, because
there may simply have not been any packets received in given
period of time. Packet timestamps help a little bit, but
again we don't know if packets are stale because we're
not keeping up or because someone (*cough* cgroups)
disabled IRQs for a long time.

We can, however, use Tx as a proxy for Rx stalls. Most drivers
use combined Rx+Tx NAPIs so if Tx gets starved so will Rx.
On the Tx side we know exactly when packets get queued,
and completed, so there is no uncertainty.

This patch adds stall checks to BQL. Why BQL? Because
it's a convenient place to add such checks, already
called by most drivers, and it has copious free space
in its structures (this patch adds no extra cache
references or dirtying to the fast path).

The algorithm takes one parameter - max delay AKA stall
threshold and increments a counter whenever NAPI got delayed
for at least that amount of time. It also records the length
of the longest stall.

To be precise every time NAPI has not polled for at least
stall thrs we check if there were any Tx packets queued
between last NAPI run and now - stall_thrs/2.

Unlike the classic Tx watchdog this mechanism does not
ignore stalls caused by Tx being disabled, or loss of link.
I don't think the check is worth the complexity, and
stall is a stall, whether due to host overload, flow
control, link down... doesn't matter much to the application.

We have been running this detector in production at Meta
for 2 years, with the threshold of 8ms. It's the lowest
value where false positives become rare. There's still
a constant stream of reported stalls (especially without
the ksoftirqd deferral patches reverted), those who like
their stall metrics to be 0 may prefer higher value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:23:26 +00:00
Jakub Kicinski
92f8b1f5ca netdev: add queue stat for alloc failures
Rx alloc failures are commonly counted by drivers.
Support reporting those via netdev-genl queue stats.

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:13:26 -08:00
Jakub Kicinski
ab63a2387c netdev: add per-queue statistics
The ethtool-nl family does a good job exposing various protocol
related and IEEE/IETF statistics which used to get dumped under
ethtool -S, with creative names. Queue stats don't have a netlink
API, yet, and remain a lion's share of ethtool -S output for new
drivers. Not only is that bad because the names differ driver to
driver but it's also bug-prone. Intuitively drivers try to report
only the stats for active queues, but querying ethtool stats
involves multiple system calls, and the number of stats is
read separately from the stats themselves. Worse still when user
space asks for values of the stats, it doesn't inform the kernel
how big the buffer is. If number of stats increases in the meantime
kernel will overflow user buffer.

Add a netlink API for dumping queue stats. Queue information is
exposed via the netdev-genl family, so add the stats there.
Support per-queue and sum-for-device dumps. Latter will be useful
when subsequent patches add more interesting common stats than
just bytes and packets.

The API does not currently distinguish between HW and SW stats.
The expectation is that the source of the stats will either not
matter much (good packets) or be obvious (skb alloc errors).

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:13:25 -08:00
Jiri Pirko
5c497a6482 dpll: spec: use proper enum for pin capabilities attribute
The enum is defined, however the pin capabilities attribute does
refer to it. Add this missing enum field.

This fixes ynl cli output:

Example current output:
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml --do pin-get --json '{"id": 0}'
{'capabilities': 4,
 ...
Example new output:
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml --do pin-get --json '{"id": 0}'
{'capabilities': {'state-can-change'},
 ...

Fixes: 3badff3a25d8 ("dpll: spec: Add Netlink spec in YAML")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240306120739.1447621-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:51:00 -08:00
Donald Hunter
768e044a5f doc/netlink/specs: Add spec for nlctrl netlink family
Add a spec for the nlctrl family.

Example usage:

./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/nlctrl.yaml \
    --do getfamily --json '{"family-name": "nlctrl"}'

./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/nlctrl.yaml \
    --dump getpolicy --json '{"family-name": "nlctrl"}'

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240306231046.97158-7-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:28:38 -08:00
Donald Hunter
bc52b39309 doc/netlink: Allow empty enum-name in ynl specs
Update the ynl schemas to allow the specification of empty enum names
for all enum code generation.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240306231046.97158-6-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:28:35 -08:00
Jérémie Dautheribes
b72413211b dt-bindings: net: dp83822: change ti,rmii-mode description
Drop reference to the 25MHz clock as it has nothing to do with connecting
the PHY and the MAC.
Add info about the reference clock direction between the PHY and the MAC
as it depends on the selected rmii mode.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240305141309.127669-1-jeremie.dautheribes@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:25:28 -08:00
Jakub Kicinski
e3afe5dd3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/page_pool_user.c
  0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors")
  429679dcf7d9 ("page_pool: fix netlink dump stop/resume")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 10:29:36 -08:00
Linus Torvalds
df4793505a Including fixes from bpf, ipsec and netfilter.
No solution yet for the stmmac issue mentioned in the last PR,
 but it proved to be a lockdep false positive, not a blocker.
 
 Current release - regressions:
 
   - dpll: move all dpll<>netdev helpers to dpll code, fix build
     regression with old compilers
 
 Current release - new code bugs:
 
   - page_pool: fix netlink dump stop/resume
 
 Previous releases - regressions:
 
   - bpf: fix verifier to check bpf_func_state->callback_depth when pruning
        states as otherwise unsafe programs could get accepted
 
   - ipv6: avoid possible UAF in ip6_route_mpath_notify()
 
   - ice: reconfig host after changing MSI-X on VF
 
   - mlx5:
     - e-switch, change flow rule destination checking
     - add a memory barrier to prevent a possible null-ptr-deref
     - switch to using _bh variant of of spinlock where needed
 
 Previous releases - always broken:
 
   - netfilter: nf_conntrack_h323: add protection for bmp length out of range
 
   - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP
 	program in CPU map which led to random xdp_md fields
 
   - xfrm: fix UDP encapsulation in TX packet offload
 
   - netrom: fix data-races around sysctls
 
   - ice:
     - fix potential NULL pointer dereference in ice_bridge_setlink()
     - fix uninitialized dplls mutex usage
 
   - igc: avoid returning frame twice in XDP_REDIRECT
 
   - i40e: disable NAPI right after disabling irqs when handling xsk_pool
 
   - geneve: make sure to pull inner header in geneve_rx()
 
   - sparx5: fix use after free inside sparx5_del_mact_entry
 
   - dsa: microchip: fix register write order in ksz8_ind_write8()
 
 Misc:
 
   -  selftests: mptcp: fixes for diag.sh
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmXptoYSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkK3IP+QGe1Q37l75YM8IPpihjNYvBTiP6VWv0
 3cKoI0kz2EF5zmt3RAPK1M/ea1GY1L4Fsa/tdV0b9BzP9xC3si7IdFLZLqXh5tUX
 tW5m1LIoPqYLXE2i7qtOS5omMuCqKm2gM7TURarJA0XsAGyu645bYiJeT5dybnZQ
 AuAsXKj9RM3AkcLiqB4PZjdDuG9vIQLi2wSIybP4KFGqY7UMRlkRKFYlu2rpF29s
 XPlR671chaX90sP4bNwf+qVr81Ebu9APmDA0a9tVFDkgEqhPezpRDGHr2Kj+W25s
 j3XXwoygL6gIpJKzRgHsugAaZjla82DpCuygPOcmtTEEtHmF6fn8mBebjY/QDL6w
 ibbcOYJpzPFccRfMyHiiwzjqcaj+Zc58DktFf3H4EnKJULPralhKyMoyPngiAo1Y
 wNIGlWR8SNLhJzyZMeFPMKsz3RnLiC5vMdXMFfZdyH1RHHib5L+8AVogya+SaVkF
 1J1DrrShOEddvlrbZbM8c/03WHkAJXSRD34oHW9c3PkZscSzHmB1xqI1bER6sc5U
 5FjuDnsQDQ61pa6pip2Ug71UOw6ZAwZJs6AgestI49caDvUpSKI7jg/F6Dle6wNT
 p2KVUWFoz5BQBXG8Ut7yWpWvoEmaHe0cEn03rqZSYFnltWgkNvWMRMhkzuroOHWO
 UmOnuVIQH9Vh
 =0bH0
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf, ipsec and netfilter.

  No solution yet for the stmmac issue mentioned in the last PR, but it
  proved to be a lockdep false positive, not a blocker.

  Current release - regressions:

   - dpll: move all dpll<>netdev helpers to dpll code, fix build
     regression with old compilers

  Current release - new code bugs:

   - page_pool: fix netlink dump stop/resume

  Previous releases - regressions:

   - bpf: fix verifier to check bpf_func_state->callback_depth when
     pruning states as otherwise unsafe programs could get accepted

   - ipv6: avoid possible UAF in ip6_route_mpath_notify()

   - ice: reconfig host after changing MSI-X on VF

   - mlx5:
       - e-switch, change flow rule destination checking
       - add a memory barrier to prevent a possible null-ptr-deref
       - switch to using _bh variant of of spinlock where needed

  Previous releases - always broken:

   - netfilter: nf_conntrack_h323: add protection for bmp length out of
     range

   - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP
     program in CPU map which led to random xdp_md fields

   - xfrm: fix UDP encapsulation in TX packet offload

   - netrom: fix data-races around sysctls

   - ice:
       - fix potential NULL pointer dereference in ice_bridge_setlink()
       - fix uninitialized dplls mutex usage

   - igc: avoid returning frame twice in XDP_REDIRECT

   - i40e: disable NAPI right after disabling irqs when handling
     xsk_pool

   - geneve: make sure to pull inner header in geneve_rx()

   - sparx5: fix use after free inside sparx5_del_mact_entry

   - dsa: microchip: fix register write order in ksz8_ind_write8()

  Misc:

   - selftests: mptcp: fixes for diag.sh"

* tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  net: pds_core: Fix possible double free in error handling path
  netrom: Fix data-races around sysctl_net_busy_read
  netrom: Fix a data-race around sysctl_netrom_link_fails_count
  netrom: Fix a data-race around sysctl_netrom_routing_control
  netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
  netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
  netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
  netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
  netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
  netrom: Fix a data-race around sysctl_netrom_transport_timeout
  netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
  netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
  netrom: Fix a data-race around sysctl_netrom_default_path_quality
  netfilter: nf_conntrack_h323: Add protection for bmp length out of range
  netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
  netfilter: nft_ct: fix l3num expectations with inet pseudo family
  netfilter: nf_tables: reject constant set with timeout
  netfilter: nf_tables: disallow anonymous set with timeout flag
  net/rds: fix WARNING in rds_conn_connect_if_down
  net: dsa: microchip: fix register write order in ksz8_ind_write8()
  ...
2024-03-07 09:23:33 -08:00