This patch integrates read/write page helper functions as MTK phy lib.
They are basically the same in mtk-ge.c & mtk-ge-soc.c.
Signed-off-by: SkyLake.Huang <skylake.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes parens around TRIGGER_NETDEV_RX/TRIGGER_NETDEV_TX in
mtk_phy_led_hw_ctrl_set(), which improves readability.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: SkyLake.Huang <skylake.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch creates mtk-phy-lib.c & mtk-phy.h and integrates mtk-ge-soc.c's
LED helper functions so that we can use those helper functions in other
MTK's ethernet phy driver.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: SkyLake.Huang <skylake.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Re-organize MediaTek ethernet phy driver files and get ready to integrate
some common functions and add new 2.5G phy driver.
mtk-ge.c: MT7530 Gphy on MT7621 & MT7531 Gphy
mtk-ge-soc.c: Built-in Gphy on MT7981 & Built-in switch Gphy on MT7988
mtk-2p5ge.c: Planned for built-in 2.5G phy on MT7988
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: SkyLake.Huang <skylake.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya says:
====================
Introduce RVU representors
This series adds representor support for each rvu devices.
When switchdev mode is enabled, representor netdev is registered
for each rvu device. In implementation of representor model,
one NIX HW LF with multiple SQ and RQ is reserved, where each
RQ and SQ of the LF are mapped to a representor. A loopback channel
is reserved to support packet path between representors and VFs.
CN10K silicon supports 2 types of MACs, RPM and SDP. This
patch set adds representor support for both RPM and SDP MAC
interfaces.
- Patch 1: Implements basic representor driver.
- Patch 2: Add devlink support to create representor netdevs that
can be used to manage VFs.
- Patch 3: Implements basec netdev_ndo_ops.
- Patch 4: Installs tcam rules to route packets between representor and
VFs.
- Patch 5: Enables fetching VF stats via representor interface
- Patch 6: Adds support to sync link state between representors and VFs .
- Patch 7: Enables configuring VF MTU via representor netdevs.
- Patch 8: Adds representors for sdp MAC.
- Patch 9: Adds devlink port support.
- Patch 10: Implements offload stats.
- Patch 11: Implements tc offload support.
- patch 12: Adds documentation for rvu port representor.
pci/0002:1c:00.0
Command to create PF/VF representor
Rpf1vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether f6:43:83:ee:26:21 brd ff:ff:ff:ff:ff:ff
Rpf1vf1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 12:b2:54:0e:24:54 brd ff:ff:ff:ff:ff:ff
Rpf1vf2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 4a:12:c4:4c:32:62 brd ff:ff:ff:ff:ff:ff
Rpf1vf3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether ca:cb:68:0e:e2:6e brd ff:ff:ff:ff:ff:ff
Rpf2vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 06:cc:ad:b4:f0:93 brd ff:ff:ff:ff:ff:ff
~# devlink port
pci/0002:1c:00.0/0: type eth netdev Rpf1vf0 flavour physical port 0 splittable false
pci/0002:1c:00.0/1: type eth netdev Rpf1vf1 flavour pcivf controller 0 pfnum 1 vfnum 1 external false splittable false
pci/0002:1c:00.0/2: type eth netdev Rpf1vf2 flavour pcivf controller 0 pfnum 1 vfnum 2 external false splittable false
pci/0002:1c:00.0/3: type eth netdev Rpf1vf3 flavour pcivf controller 0 pfnum 1 vfnum 3 external false splittable false
-----------
v11:v1:
- Submitted refactoring changes as a separate patch set.
https://lore.kernel.org/netdev/20241023161843.15543-1-gakula@marvell.com/T/
- Moved documentation to a separate patch.
- patch 9: Added code changes to forward updated mac address to VF.
- Implemented TC offload support.
v10-v11:
- As suggested by "Jiri Pirko" adjusted the documentation.
- Added more commit description to patch1.
v9-v10:
- Fixed build warning w.r.t documentation.
v8-v9:
- Updated the documentation.
v7-v8:
- Implemented offload stats ndo.
- Added documentation.
v6-v7:
- Rebased on top net-next branch.
v5-v6:
- Addressed review comments provided by "Simon Horman".
- Added review tag.
v4-v5:
- Patch 3: Removed devm_* usage in rvu_rep_create()
- Patch 3: Fixed build warnings.
v3-v4:
- Patch 2 & 3: Fixed coccinelle reported warnings.
- Patch 10: Added devlink port support.
v2-v3:
- Used extack for error messages.
- As suggested reworked commit messages.
- Fixed sparse warning.
v1-v2:
-Fixed build warnings.
-Address review comments provided by "Kalesh Anakkur Purayil".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds documentation for creating and configuring rvu port representors
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements tc offload support for rvu representors.
Usage example:
- Add tc rule to drop packets with vlan id 3 using port
representor(Rpf1vf0).
# tc filter add dev Rpf1vf0 protocol 802.1Q parent ffff: flower
vlan_id 3 vlan_ethtype ipv4 skip_sw action drop
- Redirect packets with vlan id 5 and IPv4 packets to eth1,
after stripping vlan header.
# tc filter add dev Rpf1vf0 ingress protocol 802.1Q flower vlan_id 5
vlan_ethtype ipv4 skip_sw action vlan pop action mirred ingress
redirect dev eth1
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the offload stat ndo by fetching the HW stats
of rx/tx queues attached to the representor.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register devlink port for the rvu representors.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hardware supports different types of MACs eg RPM, SDP, LBK.
LBK is for internal Tx->Rx HW loopback path. RPM and SDP MACs support
ingress/egress pkt IO on interfaces with different set of capabilities
like interface modes. At the time of netdev driver registration PF will
seek MAC related information from Admin function driver
'drivers/net/ethernet/marvell/octeontx2/af' and sets up ingress/egress
queues etc such that pkt IO on the channels of these different MACs is
possible. This patch add representors for SDP MAC.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support to manage the mtu configuration for VF through representor.
On update of representor mtu a mbox notification is send
to VF to update its mtu.
This feature is implemented based on the "Network Function Representors"
kernel documentation.
"
Setting an MTU on the representor should cause that same MTU
to be reported to the representee.
"
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements the below requirement mentioned
in the representors documentation.
"
The representee's link state is controlled through the
representor. Setting the representor administratively UP
or DOWN should cause carrier ON or OFF at the representee.
"
This patch enables
- Reflecting the link state of representor based on the VF state and
link state of VF based on representor.
- On VF interface up/down a notification is sent via mbox to representor
to update the link state.
eg: ip link set eth0 up/down will disable carrier on/off
of the corresponding representor(r0p1) interface.
- On representor interface up/down will cause the link state update of VF.
eg: ip link set r0p1 up/down will disable carrier on/off
of the corresponding representee(eth0) interface.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support to export VF port statistics via representor
netdev. Defines new mbox "NIX_LF_STATS" to fetch VF hw stats.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current HW, do not support in-built switch which will forward pkts
between representee and representor. When representor is put under
a bridge and pkts needs to be sent to representee, then pkts from
representor are sent on a HW internal loopback channel, which again
will be punted to ingress pkt parser. Now the rules that this patch
installs are the MCAM filters/rules which will match against these
pkts and forward them to representee.
The rules that this patch installs are for basic
representor <=> representee path similar to Tun/TAP between VM and
Host.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements basic set of net_device_ops.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds initial devlink support to set/get the switchdev mode.
Representor netdevs are created for each rvu devices when
the switch mode is set to 'switchdev'. These netdevs are
be used to control and configure VFs.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds basic driver for the RVU representor.
Driver on probe does pci specific initialization and
does hw resources configuration. Introduces RVU_ESWITCH
kernel config to enable/disable the driver. Representor
and NIC shares the code but representors netdev support
subset of NIC functionality. Hence "otx2_rep_dev" API
helps to skip the features initialization that are not
supported by the representors.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0f6deac3a079 ("net: page_pool: add page allocation stats for
two fast page allocate path") added increments for "fast path"
allocation to page frag alloc. It mentions performance degradation
analysis but the details are unclear. Could be that the author
was simply surprised by the alloc stats not matching packet count.
In my experience the key metric for page pool is the recycling rate.
Page return stats, however, count returned _pages_ not frags.
This makes it impossible to calculate recycling rate for drivers
using the frag API. Here is example output of the page-pool
YNL sample for a driver allocating 1200B frags (4k pages)
with nearly perfect recycling:
$ ./page-pool
eth0[2] page pools: 32 (zombies: 0)
refs: 291648 bytes: 1194590208 (refs: 0 bytes: 0)
recycling: 33.3% (alloc: 4557:2256365862 recycle: 200476245:551541893)
The recycling rate is reported as 33.3% because we give out
4096 // 1200 = 3 frags for every recycled page.
Effectively revert the aforementioned commit. This also aligns
with the stats we would see for drivers which do the fragmentation
themselves, although that's not a strong reason in itself.
On the (very unlikely) path where we can reuse the current page
let's bump the "cached" stat. The fact that we don't put the page
in the cache is just an optimization.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://patch.msgid.link/20241109023303.3366500-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Testing small size RPCs (300B-400B) on a large AMD system suggests
that page pool recycling is very useful even for just the head frags.
With this patch (and copy break disabled) I see a 30% performance
improvement (82Gbps -> 106Gbps).
Convert bnxt from normal page frags to page pool frags for head buffers.
On systems with small page size we can use the same pool as for TPA
pages. On systems with large pages the frag allocation logic of the
page pool is already used to split a large page into TPA chunks.
TPA chunks are much larger than heads (8k or 64k, AFAICT vs 1kB)
and we always allocate the same sized chunks. Mixing allocation
of TPA and head pages would lead to sub-optimal memory use.
Plus Taehee's work on zero-copy / devmem will need to differentiate
between TPA and non-TPA page pool, anyway. Conditionally allocate
a new page pool for heads.
Link: https://patch.msgid.link/20241109035119.3391864-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To generate hnode handles (in gen_new_htid()), u32 uses IDR and
encodes the returned small integer into a structured 32-bit
word. Unfortunately, at disposal time, the needed decoding
is not done. As a result, idr_remove() fails, and the IDR
fills up. Since its size is 2048, the following script ends up
with "Filter already exists":
tc filter add dev myve $FILTER1
tc filter add dev myve $FILTER2
for i in {1..2048}
do
echo $i
tc filter del dev myve $FILTER2
tc filter add dev myve $FILTER2
done
This patch adds the missing decoding logic for handles that
deserve it.
Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20241110172836.331319-1-alexandre.ferrieux@orange.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
qca8k_phy_eth_command() is used to probe the child MDIO bus while the
parent MDIO is locked. This causes lockdep splat, reporting a possible
deadlock. It is not an actually deadlock, because different locks are
used. By making use of mutex_lock_nested() we can avoid this false
positive.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241110175955.3053664-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Removing full driver sections also removed mailing list entries, causing
submitters of future patches to forget CCing these mailing lists.
Hence re-add the sections for the Renesas Ethernet AVB, R-Car SATA, and
SuperH Ethernet drivers. Add people who volunteered to maintain these
drivers (thanks a lot!), and mark all of them as supported.
Fixes: 6e90b675cf942e50 ("MAINTAINERS: Remove some entries due to various compliance requirements.")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Niklas Cassel <cassel@kernel.org>
Acked-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Link: https://patch.msgid.link/4b2105332edca277f07ffa195796975e9ddce994.1731319098.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In 'mptcp_reset_tout_timer', promote 'probe_timestamp' to unsigned long
to avoid possible integer overflow. Compile tested only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Dmitry Kandybka <d.kandybka@gmail.com>
Link: https://patch.msgid.link/20241107103657.1560536-1-d.kandybka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 338c4d3902feb5be49bfda530a72c7ab860e2c9f.
Sebastian noticed the ISR indirectly acquires spin_locks, which are
sleeping locks under PREEMPT_RT, which leads to kernel splats.
Fixes: 338c4d3902feb ("igb: Disable threaded IRQ for igb_msix_other")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241106111427.7272-1-wander@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- btintel: Direct exception event to bluetooth stack
- hci_core: Fix calling mgmt_device_connected
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmczlaIZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKTBYD/9guawZa/20oiagUDOutuT/
i1KCPlVNrMpbeyDyK2sC2ConWOMHdpBToo0/vEwdKbAB/kCbWDh9DjWvaawSUngX
XPMTnk279WdWOLh6JUb87af1Q4wt8faro63g5gwTmXQrsIED5MlpMQJ2pZAkEmBe
pQU3QZJjz2BtnFHnVXHLXe53E3P0kWqlrqcAvdeJWRew+0rm++9f187pn9F/kUd0
F9f4YZgZAmlk56nT5kdv3NSi/cscm5xajlJSG9PlR40n7Un/T6RZXGzl0KeJ+hJw
DeyMOYBpBnGDOUe/7coqeZH6AulZWzHHIm5UXmqmVMM7KyT0mL/bxSDyXJnv2e6F
lXBEFNu6o/15N1S8uU6677+wcnbJ1BXwtDSk8iGOXECBN9hoB52NiIx1HPOI5mEX
dflH8FLe5hZx4b+yktTVBWWcBOd9cMonOxqOWPgfZ4ZbhnEe1SlVf4Qnh5Amq0yt
ZixbEP7G4k4uWhHvTdwVWIXPxGeBSmn8sQXG1ZSutwLaU9TQYL5W7m0DSnB4xdQB
h8J2/tdX63Fjm2tpkabb/oRvns9ekjq98QqNGlA2GP7jaqndJmg5ixg8Jhjn9uPF
OjG9z6OX4yrFFpJP4SXKAl7W3sg2g0yFGLDjoj2h9zVPuIcvbZG1NTzLgytNNXlG
JcFpsADEZAZcuUotyfFytg==
=7vO+
-----END PGP SIGNATURE-----
Merge tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btintel: Direct exception event to bluetooth stack
- hci_core: Fix calling mgmt_device_connected
* tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
====================
Link: https://patch.msgid.link/20241112175326.930800-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A last minute mlx5 bugfix
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmcz5NMPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRptJcIAMuzl2PX/HzYwMBGI7TrRObAaZo8j6Zub9Be
TVw6H33OKK86y2MoBz1hTj1Z32KA+qAGJEui03ckrFpHSVkzRvNGXJEI2rbtY5sX
bmP3ch/9Yr4aEw1eF1cpcQlTMyFFFoeqbTLf5qBItsZ+qMfqiknAeSRL31YDBteK
uOWaTPHMW8nNyy6wQaI9dEdP84Dluhx+B/IxcGcl8FySpSl+faA/uHr5YJP9kTO4
e7PxFYa0oBeCqu7varkVRHuaoMaPk4OCrjeZWZAY9dp9LOGtfgh/YbYj7wsNfkXH
mvKy8lRu+o1/Fh6bRc0TxNmtvOPpB1Myto6wj4ntEtNQun3khOo=
=uDap
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"A last minute mlx5 bugfix"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Fix PA offset with unaligned starting iotlb map
When calculating the physical address range based on the iotlb and mr
[start,end) ranges, the offset of mr->start relative to map->start
is not taken into account. This leads to some incorrect and duplicate
mappings.
For the case when mr->start < map->start the code is already correct:
the range in [mr->start, map->start) was handled by a different
iteration.
Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
Cc: stable@vger.kernel.org
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20241021134040.975221-2-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
x86:
- When emulating a guest TLB flush for a nested guest, flush vpid01, not
vpid02, if L2 is active but VPID is disabled in vmcs12, i.e. if L2 and
L1 are sharing VPID '0' (from L1's perspective).
- Fix a bug in the SNP initialization flow where KVM would return '0' to
userspace instead of -errno on failure.
- Move the Intel PT virtualization (i.e. outputting host trace to host
buffer and guest trace to guest buffer) behind CONFIG_BROKEN.
- Fix memory leak on failure of KVM_SEV_SNP_LAUNCH_START
- Fix a bug where KVM fails to inject an interrupt from the IRR after
KVM_SET_LAPIC.
Selftests:
- Increase the timeout for the memslot performance selftest to avoid false
failures on arm64 and nested x86 platforms.
- Fix a goof in the guest_memfd selftest where a for-loop initialized a
bit mask to zero instead of BIT(0).
- Disable strict aliasing when building KVM selftests to prevent the
compiler from treating things like "u64 *" to "uint64_t *" cases as
undefined behavior, which can lead to nasty, hard to debug failures.
- Force -march=x86-64-v2 for KVM x86 selftests if and only if the uarch
is supported by the compiler.
- Fix broken compilation of kvm selftests after a header sync in tools/
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmczm1QUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOLKwf+IjkJHZ/LS95HuP/0QLM17Sc4MmiZ
Pk5gLd5un7BBSLA98RvALR/YPnsA7emEJ34bE/8lQ6R5VSZ5PrIzF+29f60HzRFe
EDi1/24dqnzdWn50na5nk7A2QhFpfnLQQTl7vMqPFsrU7gfLuHQI6ABp9kloEwP/
xnjAT683IWNX9v0N2A8kNemy9NNMGssJk1ssDTGzNflSyRNL8cLPGlPkZqAIMsM6
fHjkDRg0UxasUDkL5CjwnTSdBGoz+/Myyz4unFlYGJB9D3+ev2qDlMqATO4Jfik/
peJMZ65i8/8/7MgKCTn8qQuT0FLLEvxTuzDHUSGzjMZl0DGaZi2BPETNqg==
=nW8/
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86 and selftests fixes.
x86:
- When emulating a guest TLB flush for a nested guest, flush vpid01,
not vpid02, if L2 is active but VPID is disabled in vmcs12, i.e. if
L2 and L1 are sharing VPID '0' (from L1's perspective).
- Fix a bug in the SNP initialization flow where KVM would return '0'
to userspace instead of -errno on failure.
- Move the Intel PT virtualization (i.e. outputting host trace to
host buffer and guest trace to guest buffer) behind CONFIG_BROKEN.
- Fix memory leak on failure of KVM_SEV_SNP_LAUNCH_START
- Fix a bug where KVM fails to inject an interrupt from the IRR after
KVM_SET_LAPIC.
Selftests:
- Increase the timeout for the memslot performance selftest to avoid
false failures on arm64 and nested x86 platforms.
- Fix a goof in the guest_memfd selftest where a for-loop initialized
a bit mask to zero instead of BIT(0).
- Disable strict aliasing when building KVM selftests to prevent the
compiler from treating things like "u64 *" to "uint64_t *" cases as
undefined behavior, which can lead to nasty, hard to debug
failures.
- Force -march=x86-64-v2 for KVM x86 selftests if and only if the
uarch is supported by the compiler.
- Fix broken compilation of kvm selftests after a header sync in
tools/"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: Bury Intel PT virtualization (guest/host mode) behind CONFIG_BROKEN
KVM: x86: Unconditionally set irr_pending when updating APICv state
kvm: svm: Fix gctx page leak on invalid inputs
KVM: selftests: use X86_MEMTYPE_WB instead of VMX_BASIC_MEM_TYPE_WB
KVM: SVM: Propagate error from snp_guest_req_init() to userspace
KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled
KVM: selftests: Don't force -march=x86-64-v2 if it's unsupported
KVM: selftests: Disable strict aliasing
KVM: selftests: fix unintentional noop test in guest_memfd_test.c
KVM: selftests: memslot_perf_test: increase guest sync timeout
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZzNj9BQcem9oYXJAbGlu
dXguaWJtLmNvbQAKCRDLwZzRsCrn5QKDAQCkbTcWVTnMrdz/0hV9JVmoLCFs6GWZ
cTjaBApOQge1pgD/bTQGJ0fYP6sWEzMPSTMXr6uJaJtlmpsGdPNoOmKUTQU=
=+K7B
-----END PGP SIGNATURE-----
Merge tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fixes from Mimi Zohar:
"One bug fix, one performance improvement, and the use of
static_assert:
- The bug fix addresses "only a cosmetic change" commit, which didn't
take into account the original 'ima' template definition.
- The performance improvement limits the atomic_read()"
* tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: Use static_assert() to check struct sizes
evm: stop avoidably reading i_writecount in evm_file_release
ima: fix buffer overrun in ima_eventdigest_init_common
This reverts commit 02b682d54598f61cbb7dbb14d98ec1801112b878.
Alf reports that this commit causes the connection to eventually die on
iwl4965. The reason is that rx_status.flag is zeroed after
RX_FLAG_FAILED_FCS_CRC is set and mac80211 doesn't know the received frame is
corrupted.
Fixes: 02b682d54598 ("wifi: iwlegacy: do not skip frames with bad FCS")
Reported-by: Alf Marius <post@alfmarius.net>
Closes: https://lore.kernel.org/r/60f752e8-787e-44a8-92ae-48bdfc9b43e7@app.fastmail.com/
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://patch.msgid.link/20241112142419.1023743-1-kvalo@kernel.org
This test verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping. To test this, we read the
count of free hugepages before and after the mmap, DIO, and munmap
operations, then check if the free hugepage count is the same.
Reading free hugepages before the test was removed by commit 0268d4579901
('selftests: hugetlb_dio: check for initial conditions to skip at the
start'), causing the test to always fail.
This patch adds back reading the free hugepages before starting the test.
With this patch, the tests are now passing.
Test results without this patch:
./tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 1 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 2 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 3 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 4 : Huge pages not freed!
# Totals: pass:0 fail:4 xfail:0 xpass:0 skip:0 error:0
Test results with this patch:
/tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 1 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 2 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 3 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 4 : Huge pages freed successfully !
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Link: https://lkml.kernel.org/r/20241110064903.23626-1-donettom@linux.ibm.com
Fixes: 0268d4579901 ("selftests: hugetlb_dio: check for initial conditions to skip in the start")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Though even more elusive than before, list_del corruption has still been
seen on THP's deferred split queue.
The idea in commit e66f3185fa04 was right, but its implementation wrong.
The context omitted an important comment just before the critical test:
"split_folio() removes folio from list on success." In ignoring that
comment, when a THP split succeeded, the code went on to release the
preceding safe folio, preserving instead an irrelevant (formerly head)
folio: which gives no safety because it's not on the list. Fix the logic.
Link: https://lkml.kernel.org/r/3c995a30-31ce-0998-1b9f-3a2cb9354c91@google.com
Fixes: e66f3185fa04 ("mm/thp: fix deferred split queue not partially_mapped")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit 53ba78de064b ("mm/gup: introduce
check_and_migrate_movable_folios()") created a new constraint on the
pin_user_pages*() API family: a potentially large internal allocation must
now occur, for FOLL_LONGTERM cases.
A user-visible consequence has now appeared: user space can no longer pin
more than 2GB of memory anymore on x86_64. That's because, on a 4KB
PAGE_SIZE system, when user space tries to (indirectly, via a device
driver that calls pin_user_pages()) pin 2GB, this requires an allocation
of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for
kmalloc().
In addition to the directly visible effect described above, there is also
the problem of adding an unnecessary allocation. The **pages array
argument has already been allocated, and there is no need for a redundant
**folios array allocation in this case.
Fix this by avoiding the new allocation entirely. This is done by
referring to either the original page[i] within **pages, or to the
associated folio. Thanks to David Hildenbrand for suggesting this
approach and for providing the initial implementation (which I've tested
and adjusted slightly) as well.
[jhubbard@nvidia.com: whitespace tweak, per David]
Link: https://lkml.kernel.org/r/131cf9c8-ebc0-4cbb-b722-22fa8527bf3c@nvidia.com
[jhubbard@nvidia.com: bypass pofs_get_folio(), per Oscar]
Link: https://lkml.kernel.org/r/c1587c7f-9155-45be-bd62-1e36c0dd6923@nvidia.com
Link: https://lkml.kernel.org/r/20241105032944.141488-2-jhubbard@nvidia.com
Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Have exception event part of HCI traces which helps for debug.
snoop traces:
> HCI Event: Vendor (0xff) plen 79
Vendor Prefix (0x8780)
Intel Extended Telemetry (0x03)
Unknown extended telemetry event type (0xde)
01 01 de
Unknown extended subevent 0x07
01 01 de 07 01 de 06 1c ef be ad de ef be ad de
ef be ad de ef be ad de ef be ad de ef be ad de
ef be ad de 05 14 ef be ad de ef be ad de ef be
ad de ef be ad de ef be ad de 43 10 ef be ad de
ef be ad de ef be ad de ef be ad de
Fixes: af395330abed ("Bluetooth: btintel: Add Intel devcoredump support")
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Since 61a939c68ee0 ("Bluetooth: Queue incoming ACL data until
BT_CONNECTED state is reached") there is no long the need to call
mgmt_device_connected as ACL data will be queued until BT_CONNECTED
state.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219458
Link: https://github.com/bluez/bluez/issues/1014
Fixes: 333b4fd11e89 ("Bluetooth: L2CAP: Fix uaf in l2cap_connect")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Support EHT 1024 aggregation size in TX
The 1024 agg size for RX is supported but not for TX.
This patch adds this support and refactors common parsing logics for
addbaext in both process_addba_resp and process_addba_req into a
function.
Reviewed-by: Shayne Chen <shayne.chen@mediatek.com>
Reviewed-by: Money Wang <money.wang@mediatek.com>
Co-developed-by: Peter Chiu <chui-hao.chiu@mediatek.com>
Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com>
Signed-off-by: MeiChia Chiu <MeiChia.Chiu@mediatek.com>
Link: https://patch.msgid.link/20241112083846.32063-1-MeiChia.Chiu@mediatek.com
[pass elems/len instead of mgmt/len/is_req]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We run into an exhaustion problem with the kernel-allocated filter IDs.
Our allocation problem can be fixed on the user space side,
but the error message in this case was quite misleading:
"Filter with specified priority/protocol not found" (EINVAL)
Specifically when we can't allocate a _new_ ID because filter with
lowest ID already _exists_, saying "filter not found", is confusing.
Kernel allocates IDs in range of 0xc0000 -> 0x8000, giving out ID one
lower than lowest existing in that range. The error message makes sense
when tcf_chain_tp_find() gets called for GET and DEL but for NEW we
need to provide more specific error messages for all three cases:
- user wants the ID to be auto-allocated but filter with ID 0x8000
already exists
- filter already exists and can be replaced, but user asked
for a protocol change
- filter doesn't exist
Caller of tcf_chain_tp_insert_unique() doesn't set extack today,
so don't bother plumbing it in.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20241108010254.2995438-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal Luczaj says:
====================
virtio/vsock: Fix memory leaks
Short series fixing some memory leaks that I've stumbled upon while toying
with the selftests.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
====================
Link: https://patch.msgid.link/20241107-vsock-mem-leaks-v2-0-4e21bfcfc818@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As the final stages of socket destruction may be delayed, it is possible
that virtio_transport_recv_listen() will be called after the accept_queue
has been flushed, but before the SOCK_DONE flag has been set. As a result,
sockets enqueued after the flush would remain unremoved, leading to a
memory leak.
vsock_release
__vsock_release
lock
virtio_transport_release
virtio_transport_close
schedule_delayed_work(close_work)
sk_shutdown = SHUTDOWN_MASK
(!) flush accept_queue
release
virtio_transport_recv_pkt
vsock_find_bound_socket
lock
if flag(SOCK_DONE) return
virtio_transport_recv_listen
child = vsock_create_connected
(!) vsock_enqueue_accept(child)
release
close_work
lock
virtio_transport_do_close
set_flag(SOCK_DONE)
virtio_transport_remove_sock
vsock_remove_sock
vsock_remove_bound
release
Introduce a sk_shutdown check to disallow vsock_enqueue_accept() during
socket destruction.
unreferenced object 0xffff888109e3f800 (size 2040):
comm "kworker/5:2", pid 371, jiffies 4294940105
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
28 00 0b 40 00 00 00 00 00 00 00 00 00 00 00 00 (..@............
backtrace (crc 9e5f4e84):
[<ffffffff81418ff1>] kmem_cache_alloc_noprof+0x2c1/0x360
[<ffffffff81d27aa0>] sk_prot_alloc+0x30/0x120
[<ffffffff81d2b54c>] sk_alloc+0x2c/0x4b0
[<ffffffff81fe049a>] __vsock_create.constprop.0+0x2a/0x310
[<ffffffff81fe6d6c>] virtio_transport_recv_pkt+0x4dc/0x9a0
[<ffffffff81fe745d>] vsock_loopback_work+0xfd/0x140
[<ffffffff810fc6ac>] process_one_work+0x20c/0x570
[<ffffffff810fce3f>] worker_thread+0x1bf/0x3a0
[<ffffffff811070dd>] kthread+0xdd/0x110
[<ffffffff81044fdd>] ret_from_fork+0x2d/0x50
[<ffffffff8100785a>] ret_from_fork_asm+0x1a/0x30
Fixes: 3fe356d58efa ("vsock/virtio: discard packets only when socket is really closed")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Introduce a fault injection mechanism to force skb reallocation. The
primary goal is to catch bugs related to pointer invalidation after
potential skb reallocation.
The fault injection mechanism aims to identify scenarios where callers
retain pointers to various headers in the skb but fail to reload these
pointers after calling a function that may reallocate the data. This
type of bug can lead to memory corruption or crashes if the old,
now-invalid pointers are used.
By forcing reallocation through fault injection, we can stress-test code
paths and ensure proper pointer management after potential skb
reallocations.
Add a hook for fault injection in the following functions:
* pskb_trim_rcsum()
* pskb_may_pull_reason()
* pskb_trim()
As the other fault injection mechanism, protect it under a debug Kconfig
called CONFIG_FAIL_SKB_REALLOC.
This patch was *heavily* inspired by Jakub's proposal from:
https://lore.kernel.org/all/20240719174140.47a868e6@kernel.org/
CC: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20241107-fault_v6-v6-1-1b82cb6ecacd@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Menglong Dong says:
====================
net: ip: add drop reasons to input route
In this series, we mainly add some skb drop reasons to the input path of
ip routing, and we make the following functions return drop reasons:
fib_validate_source()
ip_route_input_mc()
ip_mc_validate_source()
ip_route_input_slow()
ip_route_input_rcu()
ip_route_input_noref()
ip_route_input()
ip_mkroute_input()
__mkroute_input()
ip_route_use_hint()
And following new skb drop reasons are added:
SKB_DROP_REASON_IP_LOCAL_SOURCE
SKB_DROP_REASON_IP_INVALID_SOURCE
SKB_DROP_REASON_IP_LOCALNET
SKB_DROP_REASON_IP_INVALID_DEST
Changes since v4:
- in the 6th patch: remove the unneeded "else" in ip_expire()
- in the 8th patch: delete the unneeded comment in __mkroute_input()
- in the 9th patch: replace "return 0" with "return SKB_NOT_DROPPED_YET"
in ip_route_use_hint()
Changes since v3:
- don't refactor fib_validate_source/__fib_validate_source, and introduce
a wrapper for fib_validate_source() instead in the 1st patch.
- some small adjustment in the 4-7 patches
Changes since v2:
- refactor fib_validate_source and __fib_validate_source to make
fib_validate_source return drop reasons
- add the 9th and 10th patches to make this series cover the input route
code path
Changes since v1:
- make ip_route_input_noref/ip_route_input_rcu/ip_route_input_slow return
drop reasons, instead of passing a local variable to their function
arguments.
====================
Link: https://patch.msgid.link/20241107125601.1076814-1-dongml2@chinatelecom.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In this commit, we make ip_route_use_hint() return drop reasons. The
drop reasons that we return are similar to what we do in
ip_route_input_slow(), and no drop reasons are added in this commit.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In this commit, we make ip_mkroute_input() and __mkroute_input() return
drop reasons.
The drop reason "SKB_DROP_REASON_ARP_PVLAN_DISABLE" is introduced for
the case: the packet which is not IP is forwarded to the in_dev, and
the proxy_arp_pvlan is not enabled.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>