mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2024-12-29 17:25:38 +00:00
7a17308c17
1323635 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Iulia Tanasescu
|
7a17308c17 |
Bluetooth: iso: Fix circular lock in iso_conn_big_sync
This fixes the circular locking dependency warning below, by reworking
iso_sock_recvmsg, to ensure that the socket lock is always released
before calling a function that locks hdev.
[ 561.670344] ======================================================
[ 561.670346] WARNING: possible circular locking dependency detected
[ 561.670349] 6.12.0-rc6+ #26 Not tainted
[ 561.670351] ------------------------------------------------------
[ 561.670353] iso-tester/3289 is trying to acquire lock:
[ 561.670355] ffff88811f600078 (&hdev->lock){+.+.}-{3:3},
at: iso_conn_big_sync+0x73/0x260 [bluetooth]
[ 561.670405]
but task is already holding lock:
[ 561.670407] ffff88815af58258 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0},
at: iso_sock_recvmsg+0xbf/0x500 [bluetooth]
[ 561.670450]
which lock already depends on the new lock.
[ 561.670452]
the existing dependency chain (in reverse order) is:
[ 561.670453]
-> #2 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0}:
[ 561.670458] lock_acquire+0x7c/0xc0
[ 561.670463] lock_sock_nested+0x3b/0xf0
[ 561.670467] bt_accept_dequeue+0x1a5/0x4d0 [bluetooth]
[ 561.670510] iso_sock_accept+0x271/0x830 [bluetooth]
[ 561.670547] do_accept+0x3dd/0x610
[ 561.670550] __sys_accept4+0xd8/0x170
[ 561.670553] __x64_sys_accept+0x74/0xc0
[ 561.670556] x64_sys_call+0x17d6/0x25f0
[ 561.670559] do_syscall_64+0x87/0x150
[ 561.670563] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 561.670567]
-> #1 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
[ 561.670571] lock_acquire+0x7c/0xc0
[ 561.670574] lock_sock_nested+0x3b/0xf0
[ 561.670577] iso_sock_listen+0x2de/0xf30 [bluetooth]
[ 561.670617] __sys_listen_socket+0xef/0x130
[ 561.670620] __x64_sys_listen+0xe1/0x190
[ 561.670623] x64_sys_call+0x2517/0x25f0
[ 561.670626] do_syscall_64+0x87/0x150
[ 561.670629] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 561.670632]
-> #0 (&hdev->lock){+.+.}-{3:3}:
[ 561.670636] __lock_acquire+0x32ad/0x6ab0
[ 561.670639] lock_acquire.part.0+0x118/0x360
[ 561.670642] lock_acquire+0x7c/0xc0
[ 561.670644] __mutex_lock+0x18d/0x12f0
[ 561.670647] mutex_lock_nested+0x1b/0x30
[ 561.670651] iso_conn_big_sync+0x73/0x260 [bluetooth]
[ 561.670687] iso_sock_recvmsg+0x3e9/0x500 [bluetooth]
[ 561.670722] sock_recvmsg+0x1d5/0x240
[ 561.670725] sock_read_iter+0x27d/0x470
[ 561.670727] vfs_read+0x9a0/0xd30
[ 561.670731] ksys_read+0x1a8/0x250
[ 561.670733] __x64_sys_read+0x72/0xc0
[ 561.670736] x64_sys_call+0x1b12/0x25f0
[ 561.670738] do_syscall_64+0x87/0x150
[ 561.670741] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 561.670744]
other info that might help us debug this:
[ 561.670745] Chain exists of:
&hdev->lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> sk_lock-AF_BLUETOOTH
[ 561.670751] Possible unsafe locking scenario:
[ 561.670753] CPU0 CPU1
[ 561.670754] ---- ----
[ 561.670756] lock(sk_lock-AF_BLUETOOTH);
[ 561.670758] lock(sk_lock
AF_BLUETOOTH-BTPROTO_ISO);
[ 561.670761] lock(sk_lock-AF_BLUETOOTH);
[ 561.670764] lock(&hdev->lock);
[ 561.670767]
*** DEADLOCK ***
Fixes:
|
||
Iulia Tanasescu
|
168e28305b |
Bluetooth: iso: Fix circular lock in iso_listen_bis
This fixes the circular locking dependency warning below, by
releasing the socket lock before enterning iso_listen_bis, to
avoid any potential deadlock with hdev lock.
[ 75.307983] ======================================================
[ 75.307984] WARNING: possible circular locking dependency detected
[ 75.307985] 6.12.0-rc6+ #22 Not tainted
[ 75.307987] ------------------------------------------------------
[ 75.307987] kworker/u81:2/2623 is trying to acquire lock:
[ 75.307988] ffff8fde1769da58 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO)
at: iso_connect_cfm+0x253/0x840 [bluetooth]
[ 75.308021]
but task is already holding lock:
[ 75.308022] ffff8fdd61a10078 (&hdev->lock)
at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[ 75.308053]
which lock already depends on the new lock.
[ 75.308054]
the existing dependency chain (in reverse order) is:
[ 75.308055]
-> #1 (&hdev->lock){+.+.}-{3:3}:
[ 75.308057] __mutex_lock+0xad/0xc50
[ 75.308061] mutex_lock_nested+0x1b/0x30
[ 75.308063] iso_sock_listen+0x143/0x5c0 [bluetooth]
[ 75.308085] __sys_listen_socket+0x49/0x60
[ 75.308088] __x64_sys_listen+0x4c/0x90
[ 75.308090] x64_sys_call+0x2517/0x25f0
[ 75.308092] do_syscall_64+0x87/0x150
[ 75.308095] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 75.308098]
-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
[ 75.308100] __lock_acquire+0x155e/0x25f0
[ 75.308103] lock_acquire+0xc9/0x300
[ 75.308105] lock_sock_nested+0x32/0x90
[ 75.308107] iso_connect_cfm+0x253/0x840 [bluetooth]
[ 75.308128] hci_connect_cfm+0x6c/0x190 [bluetooth]
[ 75.308155] hci_le_per_adv_report_evt+0x27b/0x2f0 [bluetooth]
[ 75.308180] hci_le_meta_evt+0xe7/0x200 [bluetooth]
[ 75.308206] hci_event_packet+0x21f/0x5c0 [bluetooth]
[ 75.308230] hci_rx_work+0x3ae/0xb10 [bluetooth]
[ 75.308254] process_one_work+0x212/0x740
[ 75.308256] worker_thread+0x1bd/0x3a0
[ 75.308258] kthread+0xe4/0x120
[ 75.308259] ret_from_fork+0x44/0x70
[ 75.308261] ret_from_fork_asm+0x1a/0x30
[ 75.308263]
other info that might help us debug this:
[ 75.308264] Possible unsafe locking scenario:
[ 75.308264] CPU0 CPU1
[ 75.308265] ---- ----
[ 75.308265] lock(&hdev->lock);
[ 75.308267] lock(sk_lock-
AF_BLUETOOTH-BTPROTO_ISO);
[ 75.308268] lock(&hdev->lock);
[ 75.308269] lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
[ 75.308270]
*** DEADLOCK ***
[ 75.308271] 4 locks held by kworker/u81:2/2623:
[ 75.308272] #0: ffff8fdd66e52148 ((wq_completion)hci0#2){+.+.}-{0:0},
at: process_one_work+0x443/0x740
[ 75.308276] #1: ffffafb488b7fe48 ((work_completion)(&hdev->rx_work)),
at: process_one_work+0x1ce/0x740
[ 75.308280] #2: ffff8fdd61a10078 (&hdev->lock){+.+.}-{3:3}
at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[ 75.308304] #3: ffffffffb6ba4900 (rcu_read_lock){....}-{1:2},
at: hci_connect_cfm+0x29/0x190 [bluetooth]
Fixes:
|
||
Frédéric Danis
|
29a651451e |
Bluetooth: SCO: Add support for 16 bits transparent voice setting
The voice setting is used by sco_connect() or sco_conn_defer_accept()
after being set by sco_sock_setsockopt().
The PCM part of the voice setting is used for offload mode through PCM
chipset port.
This commits add support for mSBC 16 bits offloading, i.e. audio data
not transported over HCI.
The BCM4349B1 supports 16 bits transparent data on its I2S port.
If BT_VOICE_TRANSPARENT is used when accepting a SCO connection, this
gives only garbage audio while using BT_VOICE_TRANSPARENT_16BIT gives
correct audio.
This has been tested with connection to iPhone 14 and Samsung S24.
Fixes:
|
||
Iulia Tanasescu
|
9bde7c3b3a |
Bluetooth: iso: Fix recursive locking warning
This updates iso_sock_accept to use nested locking for the parent
socket, to avoid lockdep warnings caused because the parent and
child sockets are locked by the same thread:
[ 41.585683] ============================================
[ 41.585688] WARNING: possible recursive locking detected
[ 41.585694] 6.12.0-rc6+ #22 Not tainted
[ 41.585701] --------------------------------------------
[ 41.585705] iso-tester/3139 is trying to acquire lock:
[ 41.585711] ffff988b29530a58 (sk_lock-AF_BLUETOOTH)
at: bt_accept_dequeue+0xe3/0x280 [bluetooth]
[ 41.585905]
but task is already holding lock:
[ 41.585909] ffff988b29533a58 (sk_lock-AF_BLUETOOTH)
at: iso_sock_accept+0x61/0x2d0 [bluetooth]
[ 41.586064]
other info that might help us debug this:
[ 41.586069] Possible unsafe locking scenario:
[ 41.586072] CPU0
[ 41.586076] ----
[ 41.586079] lock(sk_lock-AF_BLUETOOTH);
[ 41.586086] lock(sk_lock-AF_BLUETOOTH);
[ 41.586093]
*** DEADLOCK ***
[ 41.586097] May be due to missing lock nesting notation
[ 41.586101] 1 lock held by iso-tester/3139:
[ 41.586107] #0: ffff988b29533a58 (sk_lock-AF_BLUETOOTH)
at: iso_sock_accept+0x61/0x2d0 [bluetooth]
Fixes:
|
||
Iulia Tanasescu
|
9c76fff747 |
Bluetooth: iso: Always release hdev at the end of iso_listen_bis
Since hci_get_route holds the device before returning, the hdev
should be released with hci_dev_put at the end of iso_listen_bis
even if the function returns with an error.
Fixes:
|
||
Luiz Augusto von Dentz
|
581dd2dc16 |
Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
The usage of rcu_read_(un)lock while inside list_for_each_entry_rcu is
not safe since for the most part entries fetched this way shall be
treated as rcu_dereference:
Note that the value returned by rcu_dereference() is valid
only within the enclosing RCU read-side critical section [1]_.
For example, the following is **not** legal::
rcu_read_lock();
p = rcu_dereference(head.next);
rcu_read_unlock();
x = p->address; /* BUG!!! */
rcu_read_lock();
y = p->data; /* BUG!!! */
rcu_read_unlock();
Fixes:
|
||
Luiz Augusto von Dentz
|
4d94f05558 |
Bluetooth: hci_core: Fix sleeping function called from invalid context
This reworks hci_cb_list to not use mutex hci_cb_list_lock to avoid bugs like the bellow: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5070, name: kworker/u9:2 preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 4 locks held by kworker/u9:2/5070: #0: ffff888015be3948 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline] #0: ffff888015be3948 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x1770 kernel/workqueue.c:3335 #1: ffffc90003b6fd00 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline] #1: ffffc90003b6fd00 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x1770 kernel/workqueue.c:3335 #2: ffff8880665d0078 (&hdev->lock){+.+.}-{3:3}, at: hci_le_create_big_complete_evt+0xcf/0xae0 net/bluetooth/hci_event.c:6914 #3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: hci_le_create_big_complete_evt+0xdb/0xae0 net/bluetooth/hci_event.c:6915 CPU: 0 PID: 5070 Comm: kworker/u9:2 Not tainted 6.8.0-syzkaller-08073-g480e035fc4c7 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: hci0 hci_rx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 __might_resched+0x5d4/0x780 kernel/sched/core.c:10187 __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0xc1/0xd70 kernel/locking/mutex.c:752 hci_connect_cfm include/net/bluetooth/hci_core.h:2004 [inline] hci_le_create_big_complete_evt+0x3d9/0xae0 net/bluetooth/hci_event.c:6939 hci_event_func net/bluetooth/hci_event.c:7514 [inline] hci_event_packet+0xa53/0x1540 net/bluetooth/hci_event.c:7569 hci_rx_work+0x3e8/0xca0 net/bluetooth/hci_core.c:4171 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> Reported-by: syzbot+2fb0835e0c9cefc34614@syzkaller.appspotmail.com Tested-by: syzbot+2fb0835e0c9cefc34614@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2fb0835e0c9cefc34614 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> |
||
Michal Luczaj
|
3e643e4efa |
Bluetooth: Improve setsockopt() handling of malformed user input
The bt_copy_from_sockptr() return value is being misinterpreted by most users: a non-zero result is mistakenly assumed to represent an error code, but actually indicates the number of bytes that could not be copied. Remove bt_copy_from_sockptr() and adapt callers to use copy_safe_from_sockptr(). For sco_sock_setsockopt() (case BT_CODEC) use copy_struct_from_sockptr() to scrub parts of uninitialized buffer. Opportunistically, rename `len` to `optlen` in hci_sock_setsockopt_old() and hci_sock_setsockopt(). Fixes: |
||
Nikita Yushchenko
|
3dd002f200 |
net: renesas: rswitch: handle stop vs interrupt race
Currently the stop routine of rswitch driver does not immediately
prevent hardware from continuing to update descriptors and requesting
interrupts.
It can happen that when rswitch_stop() executes the masking of
interrupts from the queues of the port being closed, napi poll for
that port is already scheduled or running on a different CPU. When
execution of this napi poll completes, it will unmask the interrupts.
And unmasked interrupt can fire after rswitch_stop() returns from
napi_disable() call. Then, the handler won't mask it, because
napi_schedule_prep() will return false, and interrupt storm will
happen.
This can't be fixed by making rswitch_stop() call napi_disable() before
masking interrupts. In this case, the interrupt storm will happen if
interrupt fires between napi_disable() and masking.
Fix this by checking for priv->opened_ports bit when unmasking
interrupts after napi poll. For that to be consistent, move
priv->opened_ports changes into spinlock-protected areas, and reorder
other operations in rswitch_open() and rswitch_stop() accordingly.
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes:
|
||
Jakub Kicinski
|
93763e68f1 |
Merge branch 'net-renesas-rswitch-several-fixes'
Nikita Yushchenko says: ==================== net: renesas: rswitch: several fixes This series fixes several glitches found in the rswitch driver. Repost of https://lore.kernel.org/20241206190015.4194153-1-nikita.yoush@cogentembedded.com ==================== Link: https://patch.msgid.link/20241208095004.69468-1-nikita.yoush@cogentembedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Nikita Yushchenko
|
66b7e9f85b |
net: renesas: rswitch: avoid use-after-put for a device tree node
The device tree node saved in the rswitch_device structure is used at
several driver locations. So passing this node to of_node_put() after
the first use is wrong.
Move of_node_put() for this node to exit paths.
Fixes:
|
||
Nikita Yushchenko
|
bb617328ba |
net: renesas: rswitch: fix leaked pointer on error path
If error path is taken while filling descriptor for a frame, skb
pointer is left in the entry. Later, on the ring entry reuse, the
same entry could be used as a part of a multi-descriptor frame,
and skb for that new frame could be stored in a different entry.
Then, the stale pointer will reach the completion routine, and passed
to the release operation.
Fix that by clearing the saved skb pointer at the error path.
Fixes:
|
||
Nikita Yushchenko
|
0c9547e6cc |
net: renesas: rswitch: fix race window between tx start and complete
If hardware is already transmitting, it can start handling the
descriptor being written to immediately after it observes updated DT
field, before the queue is kicked by a write to GWTRC.
If the start_xmit() execution is preempted at unfortunate moment, this
transmission can complete, and interrupt handled, before gq->cur gets
updated. With the current implementation of completion, this will cause
the last entry not completed.
Fix that by changing completion loop to check DT values directly, instead
of depending on gq->cur.
Fixes:
|
||
Nikita Yushchenko
|
5cb099902b |
net: renesas: rswitch: fix possible early skb release
When sending frame split into multiple descriptors, hardware processes
descriptors one by one, including writing back DT values. The first
descriptor could be already marked as completed when processing of
next descriptors for the same frame is still in progress.
Although only the last descriptor is configured to generate interrupt,
completion of the first descriptor could be noticed by the driver when
handling interrupt for the previous frame.
Currently, driver stores skb in the entry that corresponds to the first
descriptor. This results into skb could be unmapped and freed when
hardware did not complete the send yet. This opens a window for
corrupting the data being sent.
Fix this by saving skb in the entry that corresponds to the last
descriptor used to send the frame.
Fixes:
|
||
Jakub Kicinski
|
d02af27fa2 |
A small set of fixes:
- avoid CSA warnings during link removal (by changing link bitmap after remove) - fix # of spatial streams initialisation - fix queues getting stuck in some CSA cases and resume failures - fix interface address when switching monitor mode - fix MBSS change flags 32-bit stack corruption - more UBSAN __counted_by "fixes" ... - fix link ID netlink validation -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmdYO1cACgkQ10qiO8sP aAC3Ug//aYzp6YDiEkmS0CWBKBa8sWDAXR64tUmtTAbWKq1W2JETTgpnR4mmSnA/ WfLjwd2cenbyiPbuIveBTD5cFUCmX8z1DM2C2WTmYEWCTMCCZj1YS6ZUXL7Z+1rI oCZy5dGsIHP8nUL85jaLiXXhiNpZYgyAqLnBawZ+JgNQ8V0a1AtTYW+Ysvu8sTwg qXyfTt7ZJGpYqNquDjiyLF2S06yBSgvyT01pXbV4Eny4u7X8d2nBIXmOMdO2CQK0 cFxAs5pXq37ROpjT1ocuvsTNviQ8y74YwhKPOAHENE2I0THcyL7NGk+eHCB5dDAh BijQ30AGdG+0wnBSFxWnZy9vbSXhsS3p8fzOS0th/3K3kuLo16INGhqMIQf4HkYt lNxYA+YL18uVy3dpvPKgDpNA9AKrzP0bCZRBnTUTWYyV/n7o/J9kSk7TuYjAkQZl 260gADBas1XIDE7MSvEWizFRzuurDE7RGIFk6pRLN4m1ueqynoLxNrgsOBEjvui6 4cKmnlnWwcczpUd5FDTTpNHTGDZblP/ruoPnJz4MrBwzJyJ7PNe87n/t9aqRq6SS 6LgOojt+dNZQCtrS51bhyVJhq2FhaEjhCAGtJ/TQkRrfAakX6jCEs7HDVoPkkuOX wt88PIUhDUCYkhmOmHXpIZGkdg3rJlk/d8+lbhyDI1htce3ptpQ= =qo16 -----END PGP SIGNATURE----- Merge tag 'wireless-2024-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== A small set of fixes: - avoid CSA warnings during link removal (by changing link bitmap after remove) - fix # of spatial streams initialisation - fix queues getting stuck in some CSA cases and resume failures - fix interface address when switching monitor mode - fix MBSS change flags 32-bit stack corruption - more UBSAN __counted_by "fixes" ... - fix link ID netlink validation * tag 'wireless-2024-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: cfg80211: sme: init n_channels before channels[] access wifi: mac80211: fix station NSS capability initialization order wifi: mac80211: fix vif addr when switching from monitor to station wifi: mac80211: fix a queue stall in certain cases of CSA wifi: mac80211: wake the queues in case of failure in resume wifi: cfg80211: clear link ID from bitmap during link delete after clean up wifi: mac80211: init cnt before accessing elem in ieee80211_copy_mbssid_beacon wifi: mac80211: fix mbss changed flags corruption on 32 bit systems wifi: nl80211: fix NL80211_ATTR_MLO_LINK_ID off-by-one ==================== Link: https://patch.msgid.link/20241210130145.28618-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
MoYuanhao
|
06d64ab46f |
tcp: check space before adding MPTCP SYN options
Ensure there is enough space before adding MPTCP options in
tcp_syn_options().
Without this check, 'remaining' could underflow, and causes issues. If
there is not enough space, MPTCP should not be used.
Signed-off-by: MoYuanhao <moyuanhao3676@163.com>
Fixes:
|
||
Petr Machata
|
bbe4b41259 |
Documentation: networking: Add a caveat to nexthop_compat_mode sysctl
net.ipv4.nexthop_compat_mode was added when nexthop objects were added to
provide the view of nexthop objects through the usual lens of the route
UAPI. As nexthop objects evolved, the information provided through this
lens became incomplete. For example, details of resilient nexthop groups
are obviously omitted.
Now that 16-bit nexthop group weights are a thing, the 8-bit UAPI cannot
convey the >8-bit weight accurately. Instead of inventing workarounds for
an obsolete interface, just document the expectations of inaccuracy.
Fixes:
|
||
Michael Chan
|
24c6843b73 |
bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips
The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of
the previous generation (5750X or P5). However, the aggregation ID
fields in the completion structures on P7 have been redefined from
16 bits to 12 bits. The freed up 4 bits are redefined for part of the
metadata such as the VLAN ID. The aggregation ID mask was not modified
when adding support for P7 chips. Including the extra 4 bits for the
aggregation ID can potentially cause the driver to store or fetch the
packet header of GRO/LRO packets in the wrong TPA buffer. It may hit
the BUG() condition in __skb_pull() because the SKB contains no valid
packet header:
kernel BUG at include/linux/skbuff.h:2766!
Oops: invalid opcode: 0000 1 PREEMPT SMP NOPTI
CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G OE 6.12.0-rc2+ #7
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. PowerEdge R760/0VRV9X, BIOS 1.0.1 12/27/2022
RIP: 0010:eth_type_trans+0xda/0x140
Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 d0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48
RSP: 0018:ff615003803fcc28 EFLAGS: 00010283
RAX: 00000000000022d2 RBX: 0000000000000003 RCX: ff2e8c25da334040
RDX: 0000000000000040 RSI: ff2e8c25c1ce8000 RDI: ff2e8c25869f9000
RBP: ff2e8c258c31c000 R08: ff2e8c25da334000 R09: 0000000000000001
R10: ff2e8c25da3342c0 R11: ff2e8c25c1ce89c0 R12: ff2e8c258e0990b0
R13: ff2e8c25bb120000 R14: ff2e8c25c1ce89c0 R15: ff2e8c25869f9000
FS: 0000000000000000(0000) GS:ff2e8c34be300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f05317e4c8 CR3: 000000108bac6006 CR4: 0000000000773ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? die+0x33/0x90
? do_trap+0xd9/0x100
? eth_type_trans+0xda/0x140
? do_error_trap+0x65/0x80
? eth_type_trans+0xda/0x140
? exc_invalid_op+0x4e/0x70
? eth_type_trans+0xda/0x140
? asm_exc_invalid_op+0x16/0x20
? eth_type_trans+0xda/0x140
bnxt_tpa_end+0x10b/0x6b0 [bnxt_en]
? bnxt_tpa_start+0x195/0x320 [bnxt_en]
bnxt_rx_pkt+0x902/0xd90 [bnxt_en]
? __bnxt_tx_int.constprop.0+0x89/0x300 [bnxt_en]
? kmem_cache_free+0x343/0x440
? __bnxt_tx_int.constprop.0+0x24f/0x300 [bnxt_en]
__bnxt_poll_work+0x193/0x370 [bnxt_en]
bnxt_poll_p5+0x9a/0x300 [bnxt_en]
? try_to_wake_up+0x209/0x670
__napi_poll+0x29/0x1b0
Fix it by redefining the aggregation ID mask for P5_PLUS chips to be
12 bits. This will work because the maximum aggregation ID is less
than 4096 on all P5_PLUS chips.
Fixes:
|
||
Paolo Abeni
|
51a00be6a0 |
udp: fix l4 hash after reconnect
After the blamed commit below, udp_rehash() is supposed to be called
with both local and remote addresses set.
Currently that is already the case for IPv6 sockets, but for IPv4 the
destination address is updated after rehashing.
Address the issue moving the destination address and port initialization
before rehashing.
Fixes:
|
||
Paolo Abeni
|
00301ef4b2 |
Merge branch 'virtio_net-correct-netdev_tx_reset_queue-invocation-points'
Koichiro Den says: ==================== virtio_net: correct netdev_tx_reset_queue() invocation points When virtnet_close is followed by virtnet_open, some TX completions can possibly remain unconsumed, until they are finally processed during the first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash [1]. Commit |
||
Koichiro Den
|
76a771ec4c |
virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx
virtnet_sq_bind_xsk_pool() flushes tx skbs and then resets tx queue, so
DQL counters need to be reset when flushing has actually occurred, Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_resize()
to handle this.
Fixes:
|
||
Koichiro Den
|
8d2da07c81 |
virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()
When virtqueue_reset() has actually recycled all unused buffers, additional work may be required in some cases. Relying solely on its return status is fragile, so introduce a new function argument 'recycle_done', which is invoked when it really occurs. Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Koichiro Den
|
1480f0f61b |
virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to
be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset() to
handle this.
Fixes:
|
||
Koichiro Den
|
8d6712c892 |
virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
When virtqueue_resize() has actually recycled all unused buffers, additional work may be required in some cases. Relying solely on its return status is fragile, so introduce a new function argument 'recycle_done', which is invoked when the recycle really occurs. Cc: <stable@vger.kernel.org> # v6.11+ Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Koichiro Den
|
4571dc7272 |
virtio_net: replace vq2rxq with vq2txq where appropriate
While not harmful, using vq2rxq where it's always sq appears odd.
Replace it with the more appropriate vq2txq for clarity and correctness.
Fixes:
|
||
Koichiro Den
|
3ddccbefeb |
virtio_net: correct netdev_tx_reset_queue() invocation point
When virtnet_close is followed by virtnet_open, some TX completions can possibly remain unconsumed, until they are finally processed during the first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash [1]. Commit |
||
Geetha sowjanya
|
af47a328e8 |
octeontx2-af: Fix installation of PF multicast rule
Due to target variable is being reassigned in npc_install_flow()
function, PF multicast rules are not getting installed.
This patch addresses the issue by fixing the "IF" condition
checks when rules are installed by AF.
Fixes:
|
||
Jakub Kicinski
|
f136552b7c |
Merge branch 'qca_spi-fix-spi-specific-issues'
Stefan Wahren says: ==================== qca_spi: Fix SPI specific issues This small series address two annoying SPI specific issues of the qca_spi driver. ==================== Link: https://patch.msgid.link/20241206184643.123399-1-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Stefan Wahren
|
becc6399ce |
qca_spi: Make driver probing reliable
The module parameter qcaspi_pluggable controls if QCA7000 signature
should be checked at driver probe (current default) or not. Unfortunately
this could fail in case the chip is temporary in reset, which isn't under
total control by the Linux host. So disable this check per default
in order to avoid unexpected probe failures.
Fixes:
|
||
Stefan Wahren
|
4dba406fac |
qca_spi: Fix clock speed for multiple QCA7000
Storing the maximum clock speed in module parameter qcaspi_clkspeed
has the unintended side effect that the first probed instance
defines the value for all other instances. Fix this issue by storing
it in max_speed_hz of the relevant SPI device.
This fix keeps the priority of the speed parameter (module parameter,
device tree property, driver default). Btw this uses the opportunity
to get the rid of the unused member clkspeed.
Fixes:
|
||
Anumula Murali Mohan Reddy
|
356983f569 |
cxgb4: use port number to set mac addr
t4_set_vf_mac_acl() uses pf to set mac addr, but t4vf_get_vf_mac_acl()
uses port number to get mac addr, this leads to error when an attempt
to set MAC address on VF's of PF2 and PF3.
This patch fixes the issue by using port number to set mac address.
Fixes:
|
||
David S. Miller
|
b4906787d4 |
Merge branch 'net-sparx5-lan969x-fixes'
Daniel Machon says: ==================== net: sparx5: misc fixes for sparx5 and lan969x This series fixes various issues in the Sparx5 and lan969x drivers. Most of the fixes are for new issues introduced by the recent series adding lan969x switch support in the Sparx5 driver. Most notable is patch 1/5 that moves the lan969x dir into the sparx5 dir, in order to address a cyclic dependency issue reported by depmod, when installing modules. Details are in the commit descriptions. To: Andrew Lunn <andrew+netdev@lunn.ch> To: David S. Miller <davem@davemloft.net> To: Eric Dumazet <edumazet@google.com> To: Jakub Kicinski <kuba@kernel.org> To: Paolo Abeni <pabeni@redhat.com> To: Lars Povlsen <lars.povlsen@microchip.com> To: Steen Hegelund <Steen.Hegelund@microchip.com> To: UNGLinuxDriver@microchip.com To: Richard Cochran <richardcochran@gmail.com> To: Bjarni Jonasson <bjarni.jonasson@microchip.com> To: jensemil.schulzostergaard@microchip.com To: horatiu.vultur@microchip.com To: arnd@arndb.de To: jacob.e.keller@intel.com To: Parthiban.Veerasooran@microchip.com Cc: Calvin Owens <calvin@wbinvd.org> Cc: Muhammad Usama Anjum <Usama.Anjum@collabora.com> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org ==================== Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Daniel Machon
|
ddd7ba0060 |
net: sparx5: fix the maximum frame length register
On port initialization, we configure the maximum frame length accepted
by the receive module associated with the port. This value is currently
written to the MAX_LEN field of the DEV10G_MAC_ENA_CFG register, when in
fact, it should be written to the DEV10G_MAC_MAXLEN_CFG register. Fix
this.
Fixes:
|
||
Daniel Machon
|
e4d505fda6 |
net: sparx5: fix default value of monitor ports
When doing port mirroring, the physical port to send the frame to, is
written to the FRMC_PORT_VAL field of the QFWD_FRAME_COPY_CFG register.
This field is 7 bits wide on sparx5 and 6 bits wide on lan969x, and has
a default value of 65 and 30, respectively (the number of front ports).
On mirror deletion, we set the default value of the monitor port to
65 for this field, in case no more ports exists for the mirror. Needless
to say, this will not fit the 6 bits on lan969x.
Fix this by correctly using the n_ports constant instead.
Fixes:
|
||
Daniel Machon
|
f004f2e535 |
net: sparx5: fix FDMA performance issue
The FDMA handler is responsible for scheduling a NAPI poll, which will
eventually fetch RX packets from the FDMA queue. Currently, the FDMA
handler is run in a threaded context. For some reason, this kills
performance. Admittedly, I did not do a thorough investigation to see
exactly what causes the issue, however, I noticed that in the other
driver utilizing the same FDMA engine, we run the FDMA handler in hard
IRQ context.
Fix this performance issue, by running the FDMA handler in hard IRQ
context, not deferring any work to a thread.
Prior to this change, the RX UDP performance was:
Interval Transfer Bitrate Jitter
0.00-10.20 sec 44.6 MBytes 36.7 Mbits/sec 0.027 ms
After this change, the rx UDP performance is:
Interval Transfer Bitrate Jitter
0.00-9.12 sec 1.01 GBytes 953 Mbits/sec 0.020 ms
Fixes:
|
||
Daniel Machon
|
aa5fc88984 |
net: lan969x: fix the use of spin_lock in PTP handler
We are mixing the use of spin_lock() and spin_lock_irqsave() functions
in the PTP handler of lan969x. Fix this by correctly using the _irqsave
variants.
Fixes:
|
||
Daniel Machon
|
1cd7523f4b |
net: lan969x: fix cyclic dependency reported by depmod
Depmod reports a cyclic dependency between modules sparx5-switch.ko and
lan969x-switch.ko:
depmod: ERROR: Cycle detected: lan969x_switch -> sparx5_switch -> lan969x_switch
depmod: ERROR: Found 2 modules in dependency cycles!
make[2]: *** [scripts/Makefile.modinst:132: depmod] Error 1
make: *** [Makefile:224: __sub-make] Error 2
This makes sense, as they both require symbols from each other.
Fix this by compiling lan969x support into the sparx5-switch.ko module.
In order to do this, in a sensible way, we move the lan969x/ dir into
the sparx5/ dir and do some code cleanup of code that is no longer
required.
After this patch, depmod will no longer complain, as lan969x support is
compiled into the sparx5-swicth.ko module, and can no longer be compiled
as a standalone module.
Fixes:
|
||
Dan Carpenter
|
09310cfd4e |
rtnetlink: fix error code in rtnl_newlink()
If rtnl_get_peer_net() fails, then propagate the error code. Don't
return success.
Fixes:
|
||
Jakub Kicinski
|
ab80e715b7 |
Merge branch 'ocelot-ptp-fixes'
Vladimir Oltean says: ==================== Ocelot PTP fixes This is another attempt at "net: mscc: ocelot: be resilient to loss of PTP packets during transmission". https://lore.kernel.org/netdev/20241203164755.16115-1-vladimir.oltean@nxp.com/ The central change is in patch 4/5. It recovers a port from a very reproducible condition where previously, it would become unable to take any hardware TX timestamp. Then we have patches 1/5 and 5/5 which are extra bug fixes. Patches 2/5 and 3/5 are logical changes split out of the monolithic patch previously submitted as v1, so that patch 4/5 is hopefully easier to follow. ==================== Link: https://patch.msgid.link/20241205145519.1236778-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Vladimir Oltean
|
43a4166349 |
net: mscc: ocelot: perform error cleanup in ocelot_hwstamp_set()
An unsupported RX filter will leave the port with TX timestamping still
applied as per the new request, rather than the old setting. When
parsing the tx_type, don't apply it just yet, but delay that until after
we've parsed the rx_filter as well (and potentially returned -ERANGE for
that).
Similarly, copy_to_user() may fail, which is a rare occurrence, but
should still be treated by unwinding what was done.
Fixes:
|
||
Vladimir Oltean
|
b454abfab5 |
net: mscc: ocelot: be resilient to loss of PTP packets during transmission
The Felix DSA driver presents unique challenges that make the simplistic
ocelot PTP TX timestamping procedure unreliable: any transmitted packet
may be lost in hardware before it ever leaves our local system.
This may happen because there is congestion on the DSA conduit, the
switch CPU port or even user port (Qdiscs like taprio may delay packets
indefinitely by design).
The technical problem is that the kernel, i.e. ocelot_port_add_txtstamp_skb(),
runs out of timestamp IDs eventually, because it never detects that
packets are lost, and keeps the IDs of the lost packets on hold
indefinitely. The manifestation of the issue once the entire timestamp
ID range becomes busy looks like this in dmesg:
mscc_felix 0000:00:00.5: port 0 delivering skb without TX timestamp
mscc_felix 0000:00:00.5: port 1 delivering skb without TX timestamp
At the surface level, we need a timeout timer so that the kernel knows a
timestamp ID is available again. But there is a deeper problem with the
implementation, which is the monotonically increasing ocelot_port->ts_id.
In the presence of packet loss, it will be impossible to detect that and
reuse one of the holes created in the range of free timestamp IDs.
What we actually need is a bitmap of 63 timestamp IDs tracking which one
is available. That is able to use up holes caused by packet loss, but
also gives us a unique opportunity to not implement an actual timer_list
for the timeout timer (very complicated in terms of locking).
We could only declare a timestamp ID stale on demand (lazily), aka when
there's no other timestamp ID available. There are pros and cons to this
approach: the implementation is much more simple than per-packet timers
would be, but most of the stale packets would be quasi-leaked - not
really leaked, but blocked in driver memory, since this algorithm sees
no reason to free them.
An improved technique would be to check for stale timestamp IDs every
time we allocate a new one. Assuming a constant flux of PTP packets,
this avoids stale packets being blocked in memory, but of course,
packets lost at the end of the flux are still blocked until the flux
resumes (nobody left to kick them out).
Since implementing per-packet timers is way too complicated, this should
be good enough.
Testing procedure:
Persistently block traffic class 5 and try to run PTP on it:
$ tc qdisc replace dev swp3 parent root taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0xdf 100000 flags 0x2
[ 126.948141] mscc_felix 0000:00:00.5: port 3 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
$ ptp4l -i swp3 -2 -P -m --socket_priority 5 --fault_reset_interval ASAP --logSyncInterval -3
ptp4l[70.351]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[70.354]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[70.358]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
[ 70.394583] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[70.406]: timed out while polling for tx timestamp
ptp4l[70.406]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[70.406]: port 1 (swp3): send peer delay response failed
ptp4l[70.407]: port 1 (swp3): clearing fault immediately
ptp4l[70.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 71.394858] mscc_felix 0000:00:00.5: port 3 timestamp id 1
ptp4l[71.400]: timed out while polling for tx timestamp
ptp4l[71.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[71.401]: port 1 (swp3): send peer delay response failed
ptp4l[71.401]: port 1 (swp3): clearing fault immediately
[ 72.393616] mscc_felix 0000:00:00.5: port 3 timestamp id 2
ptp4l[72.401]: timed out while polling for tx timestamp
ptp4l[72.402]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[72.402]: port 1 (swp3): send peer delay response failed
ptp4l[72.402]: port 1 (swp3): clearing fault immediately
ptp4l[72.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 73.395291] mscc_felix 0000:00:00.5: port 3 timestamp id 3
ptp4l[73.400]: timed out while polling for tx timestamp
ptp4l[73.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[73.400]: port 1 (swp3): send peer delay response failed
ptp4l[73.400]: port 1 (swp3): clearing fault immediately
[ 74.394282] mscc_felix 0000:00:00.5: port 3 timestamp id 4
ptp4l[74.400]: timed out while polling for tx timestamp
ptp4l[74.401]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[74.401]: port 1 (swp3): send peer delay response failed
ptp4l[74.401]: port 1 (swp3): clearing fault immediately
ptp4l[74.953]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 75.396830] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
[ 75.405760] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[75.410]: timed out while polling for tx timestamp
ptp4l[75.411]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[75.411]: port 1 (swp3): send peer delay response failed
ptp4l[75.411]: port 1 (swp3): clearing fault immediately
(...)
Remove the blocking condition and see that the port recovers:
$ same tc command as above, but use "sched-entry S 0xff" instead
$ same ptp4l command as above
ptp4l[99.489]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[99.490]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[99.492]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
[ 100.403768] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
[ 100.412545] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 1 which seems lost
[ 100.421283] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 2 which seems lost
[ 100.430015] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 3 which seems lost
[ 100.438744] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 4 which seems lost
[ 100.447470] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 100.505919] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[100.963]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 101.405077] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 101.507953] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 102.405405] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 102.509391] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 103.406003] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 103.510011] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 104.405601] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 104.510624] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[104.965]: selected best master clock d858d7.fffe.00ca6d
ptp4l[104.966]: port 1 (swp3): assuming the grand master role
ptp4l[104.967]: port 1 (swp3): LISTENING to GRAND_MASTER on RS_GRAND_MASTER
[ 105.106201] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.232420] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.359001] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.405500] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.485356] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.511220] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.610938] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.737237] mscc_felix 0000:00:00.5: port 3 timestamp id 0
(...)
Notice that in this new usage pattern, a non-congested port should
basically use timestamp ID 0 all the time, progressing to higher numbers
only if there are unacknowledged timestamps in flight. Compare this to
the old usage, where the timestamp ID used to monotonically increase
modulo OCELOT_MAX_PTP_ID.
In terms of implementation, this simplifies the bookkeeping of the
ocelot_port :: ts_id and ptp_skbs_in_flight. Since we need to traverse
the list of two-step timestampable skbs for each new packet anyway, the
information can already be computed and does not need to be stored.
Also, ocelot_port->tx_skbs is always accessed under the switch-wide
ocelot->ts_id_lock IRQ-unsafe spinlock, so we don't need the skb queue's
lock and can use the unlocked primitives safely.
This problem was actually detected using the tc-taprio offload, and is
causing trouble in TSN scenarios, which Felix (NXP LS1028A / VSC9959)
supports but Ocelot (VSC7514) does not. Thus, I've selected the commit
to blame as the one adding initial timestamping support for the Felix
switch.
Fixes:
|
||
Vladimir Oltean
|
0c53cdb95e |
net: mscc: ocelot: ocelot->ts_id_lock and ocelot_port->tx_skbs.lock are IRQ-safe
ocelot_get_txtstamp() is a threaded IRQ handler, requested explicitly as such by both ocelot_ptp_rdy_irq_handler() and vsc9959_irq_handler(). As such, it runs with IRQs enabled, and not in hardirq context. Thus, ocelot_port_add_txtstamp_skb() has no reason to turn off IRQs, it cannot be preempted by ocelot_get_txtstamp(). For the same reason, dev_kfree_skb_any_reason() will always evaluate as kfree_skb_reason() in this calling context, so just simplify the dev_kfree_skb_any() call to kfree_skb(). Also, ocelot_port_txtstamp_request() runs from NET_TX softirq context, not with hardirqs enabled. Thus, ocelot_get_txtstamp() which shares the ocelot_port->tx_skbs.lock lock with it, has no reason to disable hardirqs. This is part of a larger rework of the TX timestamping procedure. A logical subportion of the rework has been split into a separate change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241205145519.1236778-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Vladimir Oltean
|
b6fba4b3f0 |
net: mscc: ocelot: improve handling of TX timestamp for unknown skb
This condition, theoretically impossible to trigger, is not really
handled well. By "continuing", we are skipping the write to SYS_PTP_NXT
which advances the timestamp FIFO to the next entry. So we are reading
the same FIFO entry all over again, printing stack traces and eventually
killing the kernel.
No real problem has been observed here. This is part of a larger rework
of the timestamp IRQ procedure, with this logical change split out into
a patch of its own. We will need to "goto next_ts" for other conditions
as well.
Fixes:
|
||
Vladimir Oltean
|
4b01bec25b |
net: mscc: ocelot: fix memory leak on ocelot_port_add_txtstamp_skb()
If ocelot_port_add_txtstamp_skb() fails, for example due to a full PTP
timestamp FIFO, we must undo the skb_clone_sk() call with kfree_skb().
Otherwise, the reference to the skb clone is lost.
Fixes:
|
||
Kuniyuki Iwashima
|
cdd0b9132d |
ip: Return drop reason if in_dev is NULL in ip_route_input_rcu().
syzkaller reported a warning in __sk_skb_reason_drop(). Commit |
||
Russell King (Oracle)
|
4c49f38e20 |
net: stmmac: fix TSO DMA API usage causing oops
Commit |
||
Eric Dumazet
|
0f6ede9fbc |
net: defer final 'struct net' free in netns dismantle
Ilya reported a slab-use-after-free in dst_destroy [1] Issue is in xfrm6_net_init() and xfrm4_net_init() : They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops. But net structure might be freed before all the dst callbacks are called. So when dst_destroy() calls later : if (dst->ops->destroy) dst->ops->destroy(dst); dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed. See a relevant issue fixed in : |
||
Eric Dumazet
|
a6d75ecee2 |
net: lapb: increase LAPB_HEADER_LEN
It is unclear if net/lapb code is supposed to be ready for 8021q.
We can at least avoid crashes like the following :
skbuff: skb_under_panic: text:ffffffff8aabe1f6 len:24 put:20 head:ffff88802824a400 data:ffff88802824a3fe tail:0x16 end:0x140 dev:nr0.2
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 5508 Comm: dhcpcd Not tainted 6.12.0-rc7-syzkaller-00144-g66418447d27b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0d 8d 48 c7 c6 2e 9e 29 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 1a 6f 37 02 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc90002ddf638 EFLAGS: 00010282
RAX: 0000000000000086 RBX: dffffc0000000000 RCX: 7a24750e538ff600
RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
RBP: ffff888034a86650 R08: ffffffff8174b13c R09: 1ffff920005bbe60
R10: dffffc0000000000 R11: fffff520005bbe61 R12: 0000000000000140
R13: ffff88802824a400 R14: ffff88802824a3fe R15: 0000000000000016
FS: 00007f2a5990d740(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000110c2631fd CR3: 0000000029504000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
skb_push+0xe5/0x100 net/core/skbuff.c:2636
nr_header+0x36/0x320 net/netrom/nr_dev.c:69
dev_hard_header include/linux/netdevice.h:3148 [inline]
vlan_dev_hard_header+0x359/0x480 net/8021q/vlan_dev.c:83
dev_hard_header include/linux/netdevice.h:3148 [inline]
lapbeth_data_transmit+0x1f6/0x2a0 drivers/net/wan/lapbether.c:257
lapb_data_transmit+0x91/0xb0 net/lapb/lapb_iface.c:447
lapb_transmit_buffer+0x168/0x1f0 net/lapb/lapb_out.c:149
lapb_establish_data_link+0x84/0xd0
lapb_device_event+0x4e0/0x670
notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93
__dev_notify_flags+0x207/0x400
dev_change_flags+0xf0/0x1a0 net/core/dev.c:8922
devinet_ioctl+0xa4e/0x1aa0 net/ipv4/devinet.c:1188
inet_ioctl+0x3d7/0x4f0 net/ipv4/af_inet.c:1003
sock_do_ioctl+0x158/0x460 net/socket.c:1227
sock_ioctl+0x626/0x8e0 net/socket.c:1346
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
Fixes:
|
||
Jakub Kicinski
|
ff9b305315 |
Merge branch 'bnxt_en-bug-fixes'
Michael Chan says: ==================== bnxt_en: Bug fixes There are 2 bug fixes in this series. This first one fixes the issue of setting the gso_type incorrectly for HW GRO packets on 5750X (Thor) chips. This can cause HW GRO packets to be dropped by the stack if they are re-segmented. The second one fixes a potential division by zero crash when dumping FW log coredump. ==================== Link: https://patch.msgid.link/20241204215918.1692597-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Hongguang Gao
|
fab4b4d2c9 |
bnxt_en: Fix potential crash when dumping FW log coredump
If the FW log context memory is retained after FW reset, the existing
code is not handling the condition correctly and zeroes out the data
structures. This potentially will cause a division by zero crash
when the user runs ethtool -w. The last_type is also not set
correctly when the context memory is retained. This will cause errors
because the last_type signals to the FW that all context memory types
have been configured.
Oops: divide error: 0000 1 PREEMPT SMP NOPTI
CPU: 53 UID: 0 PID: 7019 Comm: ethtool Kdump: loaded Tainted: G OE 6.12.0-rc7+ #1
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Supermicro SYS-621C-TN12R/X13DDW-A, BIOS 1.4 08/10/2023
RIP: 0010:__bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
Code: 0a 31 d2 4c 89 6c 24 10 45 8b a5 fc df ff ff 4c 8b 74 24 20 31 db 66 89 44 24 06 48 63 c5 c1 e5 09 4c 0f af e0 48 8b 44 24 30 <49> f7 f4 4c 89 64 24 08 48 63 c5 4d 89 ec 31 ed 48 89 44 24 18 49
RSP: 0018:ff480591603d78b8 EFLAGS: 00010206
RAX: 0000000000100000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ff23959e46740000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000100000 R09: ff23959e46740000
R10: ff480591603d7a18 R11: 0000000000000010 R12: 0000000000000000
R13: ff23959e46742008 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f04227c1740(0000) GS:ff2395adbf680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f04225b33a5 CR3: 000000108b9a4001 CR4: 0000000000773ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? die+0x33/0x90
? do_trap+0xd9/0x100
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
? do_error_trap+0x65/0x80
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
? exc_divide_error+0x36/0x50
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
? asm_exc_divide_error+0x16/0x20
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0xda/0x160 [bnxt_en]
bnxt_get_ctx_coredump.constprop.0+0x1ed/0x390 [bnxt_en]
? __memcg_slab_post_alloc_hook+0x21c/0x3c0
? __bnxt_get_coredump+0x473/0x4b0 [bnxt_en]
__bnxt_get_coredump+0x473/0x4b0 [bnxt_en]
? security_file_alloc+0x74/0xe0
? cred_has_capability.isra.0+0x78/0x120
bnxt_get_coredump_length+0x4b/0xf0 [bnxt_en]
bnxt_get_dump_flag+0x40/0x60 [bnxt_en]
__dev_ethtool+0x17e4/0x1fc0
? syscall_exit_to_user_mode+0xc/0x1d0
? do_syscall_64+0x85/0x150
? unmap_page_range+0x299/0x4b0
? vma_interval_tree_remove+0x215/0x2c0
? __kmalloc_cache_noprof+0x10a/0x300
dev_ethtool+0xa8/0x170
dev_ioctl+0x1b5/0x580
? sk_ioctl+0x4a/0x110
sock_do_ioctl+0xab/0xf0
sock_ioctl+0x1ca/0x2e0
__x64_sys_ioctl+0x87/0xc0
do_syscall_64+0x79/0x150
Fixes:
|