Use ip4h_dscp() instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use ip4h_dscp()instead of reading ip_hdr()->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Also, remove the comment about the net/ip.h include file, since it's
now required for the ip4h_dscp() helper too.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use ip4h_dscp()instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A recent commit jumped over the dst hash computation and
left the symbol uninitialized. Fix this by explicitly
computing the dst hash before it is used.
Fixes: 0045e3d80613 ("xfrm: Cache used outbound xfrm states at the policy.")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In ethtool_check_max_channel(), the new RX count must not only cover the
max queue indices in RSS indirection tables and RXNFC destinations
separately, but must also, for RXNFC rules with FLOW_RSS, cover the sum
of the destination queue and the maximum index in the associated RSS
context's indirection table, since that is the highest queue that the
rule can actually deliver traffic to.
It could be argued that the max queue across all custom RSS contexts
(ethtool_get_max_rss_ctx_channel()) need no longer be considered, since
any context to which packets can actually be delivered will be targeted
by some RXNFC rule and its max will thus be allowed for by
ethtool_get_max_rxnfc_channel(). For simplicity we keep both checks, so
even RSS contexts unused by any RXNFC rule must fit the channel count.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/43257d375434bef388e36181492aa4c458b88336.1731499022.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ethtool ntuple filters with FLOW_RSS were originally defined as adding
the base queue ID (ring_cookie) to the value from the indirection table,
so that the same table could distribute over more than one set of queues
when used by different filters.
However, some drivers / hardware ignore the ring_cookie, and simply use
the indirection table entries as queue IDs directly. Thus, for drivers
which have not opted in by setting ethtool_ops.cap_rss_rxnfc_adds to
declare that they support the original (addition) semantics, reject in
ethtool_set_rxnfc any filter which combines FLOW_RSS and a nonzero ring.
(For a ring_cookie of zero, both behaviours are equivalent.)
Set the cap bit in sfc, as it is known to support this feature.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/cc3da0844083b0e301a33092a6299e4042b65221.1731499022.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZzZUwAAKCRAraS+Z+3y5
E54pAP9kim6BVXVngcMBmyAKa1Fr0zLGj/Ds1JB+KFfQ/0v80wD/ebVpoIEoKHs9
/Xl/3WfN3JzIi9+mqIauENH6DTUQPAo=
=MWOY
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2024-11-14
We've added 9 non-merge commits during the last 4 day(s) which contain
a total of 3 files changed, 226 insertions(+), 84 deletions(-).
The main changes are:
1) Fixes to bpf_msg_push/pop_data and test_sockmap. The changes has
dependency on the other changes in the bpf-next/net branch,
from Zijian Zhang.
2) Drop netns codes from mptcp test. Reuse the common helpers in
test_progs, from Geliang Tang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
bpf, sockmap: Fix sk_msg_reset_curr
bpf, sockmap: Several fixes to bpf_msg_pop_data
bpf, sockmap: Several fixes to bpf_msg_push_data
selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap
selftests/bpf: Add push/pop checking for msg_verify_data in test_sockmap
selftests/bpf: Fix total_bytes in msg_loop_rx in test_sockmap
selftests/bpf: Fix SENDPAGE data logic in test_sockmap
selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap
selftests/bpf: Drop netns helpers in mptcp
====================
Link: https://patch.msgid.link/20241114202832.3187927-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().
Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/8338a12377c44f698a651d1ce357dd92bdf18120.1731064982.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().
Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/35eacc8955003e434afb1365d404193cc98a9579.1731064982.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds the initial implementation of MGMT_OP_HCI_CMD_SYNC as
documented in mgmt-api (BlueZ tree):
Send HCI command and wait for event Command
===========================================
Command Code: 0x005B
Controller Index: <controller id>
Command Parameters: Opcode (2 Octets)
Event (1 Octet)
Timeout (1 Octet)
Parameter Length (2 Octets)
Parameter (variable)
Return Parameters: Response (1-variable Octets)
This command may be used to send a HCI command and wait for an
(optional) event.
The HCI command is specified by the Opcode, any arbitrary is supported
including vendor commands, but contrary to the like of
Raw/User channel it is run as an HCI command send by the kernel
since it uses its command synchronization thus it is possible to wait
for a specific event as a response.
Setting event to 0x00 will cause the command to wait for either
HCI Command Status or HCI Command Complete.
Timeout is specified in seconds, setting it to 0 will cause the
default timeout to be used.
Possible errors: Failed
Invalid Parameters
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Since 61a939c68ee0 ("Bluetooth: Queue incoming ACL data until
BT_CONNECTED state is reached") there is no long the need to call
mgmt_device_connected as ACL data will be queued until BT_CONNECTED
state.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219458
Link: https://github.com/bluez/bluez/issues/1014
Fixes: 333b4fd11e89 ("Bluetooth: L2CAP: Fix uaf in l2cap_connect")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Before issuing the LE BIG Create Sync command, an available BIG handle
is chosen by iterating through the conn_hash list and finding the first
unused value.
If a BIG is terminated, the associated hcons are removed from the list
and the LE BIG Terminate Sync command is sent via hci_sync queue.
However, a new LE BIG Create sync command might be issued via
hci_send_cmd, before the previous BIG sync was terminated. This
can cause the same BIG handle to be reused and the LE BIG Create Sync
to fail with Command Disallowed.
< HCI Command: LE Broadcast Isochronous Group Create Sync (0x08|0x006b)
BIG Handle: 0x00
BIG Sync Handle: 0x0002
Encryption: Unencrypted (0x00)
Broadcast Code[16]: 00000000000000000000000000000000
Maximum Number Subevents: 0x00
Timeout: 20000 ms (0x07d0)
Number of BIS: 1
BIS ID: 0x01
> HCI Event: Command Status (0x0f) plen 4
LE Broadcast Isochronous Group Create Sync (0x08|0x006b) ncmd 1
Status: Command Disallowed (0x0c)
< HCI Command: LE Broadcast Isochronous Group Terminate Sync (0x08|0x006c)
BIG Handle: 0x00
This commit fixes the ordering of the LE BIG Create Sync/LE BIG Terminate
Sync commands, to make sure that either the previous BIG sync is
terminated before reusing the handle, or that a new handle is chosen
for a new sync.
Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This removes the kzalloc memory allocation inside critical section in
create_pa_sync, fixing the following message that appears when the kernel
is compiled with CONFIG_DEBUG_ATOMIC_SLEEP enabled:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:321
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This make use of kref to keep track of reference of iso_conn which
allows better tracking of its lifetime with usage of things like
kref_get_unless_zero in a similar way as used in l2cap_chan.
In addition to it remove call to iso_sock_set_timer on iso_sock_disconn
since at that point it is useless to set a timer as the sk will be freed
there is nothing to be done in iso_sock_timeout.
Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This make use of kref to keep track of reference of sco_conn which
allows better tracking of its lifetime with usage of things like
kref_get_unless_zero in a similar way as used in l2cap_chan.
In addition to it remove call to sco_sock_set_timer on __sco_sock_close
since at that point it is useless to set a timer as the sk will be freed
there is nothing to be done in sco_sock_timeout.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Currently, hci_conn_hash_lookup_big only checks for BIS master connections,
by filtering out connections with the destination address set. This commit
updates this function to also consider BIS slave connections, since it is
also used for a Broadcast Receiver to set an available BIG handle before
issuing the LE BIG Create Sync command.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The Bluetooth Core spec does not allow a LE BIG Create sync command to be
sent to Controller if another one is pending (Vol 4, Part E, page 2586).
In order to avoid this issue, the HCI_CONN_CREATE_BIG_SYNC was added
to mark that the LE BIG Create Sync command has been sent for a hcon.
Once the BIG Sync Established event is received, the hcon flag is
erased and the next pending hcon is handled.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Currently, when a BIS slave connection is notified to the
ISO layer, the parent socket is tried to be matched by the
HCI_EVT_LE_BIG_SYNC_ESTABILISHED event. However, a BIS slave
connection is notified to the ISO layer after the Command
Complete for the LE Setup ISO Data Path command is received.
This causes the parent to be incorrectly matched if multiple
listen sockets are present.
This commit adds a fix by matching the parent based on the
BIG handle set in the notified connection.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The Bluetooth Core spec does not allow a LE PA Create sync command to be
sent to Controller if another one is pending (Vol 4, Part E, page 2493).
In order to avoid this issue, the HCI_CONN_CREATE_PA_SYNC was added
to mark that the LE PA Create Sync command has been sent for a hcon.
Once the PA Sync Established event is received, the hcon flag is
erased and the next pending hcon is handled.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This adds support for quirks for broken extended create connection,
and write auth payload timeout.
Signed-off-by: Danil Pylaev <danstiv404@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Commit 9bf4e919ccad worked around an issue introduced after an innocuous
optimisation change in LLVM main:
> len is defined as an 'int' because it is assigned from
> '__user int *optlen'. However, it is clamped against the result of
> sizeof(), which has a type of 'size_t' ('unsigned long' for 64-bit
> platforms). This is done with min_t() because min() requires compatible
> types, which results in both len and the result of sizeof() being casted
> to 'unsigned int', meaning len changes signs and the result of sizeof()
> is truncated. From there, len is passed to copy_to_user(), which has a
> third parameter type of 'unsigned long', so it is widened and changes
> signs again. This excessive casting in combination with the KCSAN
> instrumentation causes LLVM to fail to eliminate the __bad_copy_from()
> call, failing the build.
The same issue occurs in rfcomm in functions rfcomm_sock_getsockopt and
rfcomm_sock_getsockopt_old.
Change the type of len to size_t in both rfcomm_sock_getsockopt and
rfcomm_sock_getsockopt_old and replace min_t() with min().
Cc: stable@vger.kernel.org
Co-authored-by: Aleksei Vetrov <vvvvvv@google.com>
Improves: 9bf4e919ccad ("Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()")
Link: https://github.com/ClangBuiltLinux/linux/issues/2007
Link: https://github.com/llvm/llvm-project/issues/85647
Signed-off-by: Andrej Shadura <andrew.shadura@collabora.co.uk>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This fixes not checking if skb really contains an SCO header otherwise
the code may attempt to access some uninitilized/invalid memory past the
valid skb->data.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
An hci_conn_drop() call was immediately used after a null pointer check
for an hci_conn_link() call in two function implementations.
Thus call such a function only once instead directly before the checks.
This issue was transformed by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When tb[IPSET_ATTR_IP_TO] is not present but tb[IPSET_ATTR_CIDR] exists,
the values of ip and ip_to are slightly swapped. Therefore, the range check
for ip should be done later, but this part is missing and it seems that the
vulnerability occurs.
So we should add missing range checks and remove unnecessary range checks.
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+58c872f7790a4d2ac951@syzkaller.appspotmail.com
Fixes: 72205fc68bd1 ("netfilter: ipset: bitmap:ip set type support")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move the timeout/expire/flag members from nft_trans_one_elem struct into
a dybamically allocated structure, only needed when timeout update was
requested.
This halves size of nft_trans_one_elem struct and allows to compact up to
124 elements in one transaction container rather than 62.
This halves memory requirements for a large flush or insert transaction,
where ->update remains NULL.
Care has to be taken to release the extra data in all spots, including
abort path.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When queueing a set element add or removal operation to the transaction
log, check if the previous operation already asks for a the identical
operation on the same set.
If so, store the element reference in the preceding operation.
This significantlty reduces memory consumption when many set add/delete
operations appear in a single transaction.
Example: 10k elements require 937kb of memory (10k allocations from
kmalloc-96 slab).
Assuming we can compact 4 elements in the same set, 468 kbytes
are needed (64 bytes for base struct, nft_trans_elemn, 32 bytes
for nft_trans_one_elem structure, so 2500 allocations from kmalloc-192
slab).
For large batch updates we can compact up to 62 elements
into one single nft_trans_elem structure (~65% mem reduction):
(64 bytes for base struct, nft_trans_elem, 32 byte for nft_trans_one_elem
struct).
We can halve size of nft_trans_one_elem struct by moving
timeout/expire/update_flags into a dynamically allocated structure,
this allows to store 124 elements in a 2k slab nft_trans_elem struct.
This is done in a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nftables audit log format emits the number of added/deleted rules, sets,
set elements and so on, to userspace:
table=t1 family=2 entries=4 op=nft_register_set
~~~~~~~~~
At this time, the 'entries' key is the number of transactions that will
be applied.
The upcoming set element compression will coalesce subsequent
adds/deletes to the same set requests in the same transaction
request to conseve memory.
Without this patch, we'd under-report the number of altered elements.
Increment the audit counter by the number of elements to keep the reported
entries value the same.
Without this, nft_audit.sh selftest fails because the recorded
(expected) entries key is smaller than the expected one.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add helpers to release the individual elements contained in the
trans_elem container structure.
No functional change intended.
Followup patch will add 'nelems' member and will turn 'priv' into
a flexible array.
These helpers can then loop over all elements.
Care needs to be taken to handle a mix of new elements and existing
elements that are being updated (e.g. timeout refresh).
Before this patch, NEWSETELEM transaction with update is released
early so nft_trans_set_elem_destroy() won't get called, so we need
to skip elements marked as update.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add and use a wrapper to append trans_elem structures to the
transaction log.
Unlike the existing helper, pass a gfp_t to indicate if sleeping
is allowed.
This will be used by a followup patch to realloc nft_trans_elem
structures after they gain a flexible array member to reduce
number of such container structures on the transaction list.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Both gcc-14 and clang-18 report that passing a non-string literal as the
format argument of request_module() is potentially insecure.
E.g. clang-18 says:
.../nf_bpf_link.c:46:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
46 | err = request_module(mod);
| ^~~
.../kmod.h:25:55: note: expanded from macro 'request_module'
25 | #define request_module(mod...) __request_module(true, mod)
| ^~~
.../nf_bpf_link.c:46:24: note: treat the string as an argument to avoid this
46 | err = request_module(mod);
| ^
| "%s",
.../kmod.h:25:55: note: expanded from macro 'request_module'
25 | #define request_module(mod...) __request_module(true, mod)
| ^
It is always the case where the contents of mod is safe to pass as the
format argument. That is, in my understanding, it never contains any
format escape sequences.
But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue as suggested by
clang-18.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The nftables batch processing does not currently populate extack with
policy errors. Fix this by passing extack when parsing batch messages.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The function strncpy is deprecated since it does not guarantee the
destination buffer is NULL terminated. Recommended replacement is
strscpy. The padded version was used to remain consistent with the other
strscpy_pad usage in the modified function.
Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Error handling is missing when call to nla_put_u32() fails.
Handle the error when the call to nla_put_u32() returns an error.
The error was reported by Coverity Scan.
Report:
CID 1601525: (#1 of 1): Unused value (UNUSED_VALUE)
returned_value: Assigning value from nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num)
to err here, but that stored value is overwritten before it can be used
Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.")
Signed-off-by: Everest K.C. <everestkc@everestkc.com.np>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
following code flow, the RCU read lock is not held, causing the
following error when `RCU_PROVE` is not held. The same problem might
show up in the IPv6 code path.
6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
-----------------------------
net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by RetransmitAggre/3519:
#0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
#1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90
stack backtrace:
lockdep_rcu_suspicious
mr_table_dump
ipmr_rtm_dumproute
rtnl_dump_all
rtnl_dumpit
netlink_dump
__netlink_dump_start
rtnetlink_rcv_msg
netlink_rcv_skb
netlink_unicast
netlink_sendmsg
This is not a problem per see, since the RTNL lock is held here, so, it
is safe to iterate in the list without the RCU read lock, as suggested
by Eric.
To alleviate the concern, modify the code to use
list_for_each_entry_rcu() with the RTNL-held argument.
The annotation will raise an error only if RTNL or RCU read lock are
missing during iteration, signaling a legitimate problem, otherwise it
will avoid this false positive.
This will solve the IPv6 case as well, since ip6mr_rtm_dumproute() calls
this function as well.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241108-ipmr_rcu-v2-1-c718998e209b@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In mptcp_pm_create_subflow_or_signal_addr(), rcu_read_(un)lock() are
used as expected to iterate over the list of local addresses, but
list_for_each_entry() was used instead of list_for_each_entry_rcu() in
__lookup_addr(). It is important to use this variant which adds the
required READ_ONCE() (and diagnostic checks if enabled).
Because __lookup_addr() is also used in mptcp_pm_nl_set_flags() where it
is called under the pernet->lock and not rcu_read_lock(), an extra
condition is then passed to help the diagnostic checks making sure
either the associated spin lock or the RCU lock is held.
Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-3-b835580cefa8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When traversing userspace_pm_local_addr_list and deleting an entry from
it in mptcp_pm_nl_remove_doit(), msk->pm.lock should be held.
This patch holds this lock before mptcp_userspace_pm_lookup_addr_by_id()
and releases it after list_move() in mptcp_pm_nl_remove_doit().
Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-2-b835580cefa8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Just like in-kernel pm, when userspace pm does set_flags, it needs to send
out MP_PRIO signal, and also modify the flags of the corresponding address
entry in the local address list. This patch implements the missing logic.
Traverse all address entries on userspace_pm_local_addr_list to find the
local address entry, if bkup is true, set the flags of this entry with
FLAG_BACKUP, otherwise, clear FLAG_BACKUP.
Fixes: 892f396c8e68 ("mptcp: netlink: issue MP_PRIO signals from userspace PMs")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-1-b835580cefa8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Most likely the last -next pull request for v6.13. Most changes are in
Realtek and Qualcomm drivers, otherwise not really anything
noteworthy.
Major changes:
mac80211
* EHT 1024 aggregation size for transmissions
ath12k
* switch to using wiphy_lock() and remove ar->conf_mutex
* firmware coredump collection support
* add debugfs support for a multitude of statistics
ath11k
* dt: document WCN6855 hardware inputs
ath9k
* remove include/linux/ath9k_platform.h
ath5k
* Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
rtw88:
* 8821au and 8812au USB adapters support
rtw89
* thermal protection
* firmware secure boot for WiFi 6 chip
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmc04UYRHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZuckgf/RV0zy8gMuzJ/cSk1GDKoOYmEwAZ4JvtW
teAKghsODDW/bng2iKnXphJyx3spZRCNuvOmfPcHsWoResX+vqrKJOaER/3159OF
68xAPZNXPRF4M693IpIUB/P3uTw/jieXPI7ftSPuUOhStca/ALwQd5Lp3kNKkVtq
HipXJwCenVS7Hd8DdHbpvYFUckRWr3tHPFlOgG3qOQOVvfRen2z9rhM14oK9rn+h
f309ATHKTbpTKNagOPYAYcyHs3zE59hlVRgRqHL7Ew0a0HI8uPJ4KK2n5W6tZJFN
swhoQolc1uXrRYlZ3Bdr7mKSIqt557kRz7NJ9ITe7KKCU0CxM/7nhQ==
=v8bS
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2024-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.13
Most likely the last -next pull request for v6.13. Most changes are in
Realtek and Qualcomm drivers, otherwise not really anything
noteworthy.
Major changes:
mac80211
* EHT 1024 aggregation size for transmissions
ath12k
* switch to using wiphy_lock() and remove ar->conf_mutex
* firmware coredump collection support
* add debugfs support for a multitude of statistics
ath11k
* dt: document WCN6855 hardware inputs
ath9k
* remove include/linux/ath9k_platform.h
ath5k
* Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
rtw88:
* 8821au and 8812au USB adapters support
rtw89
* thermal protection
* firmware secure boot for WiFi 6 chip
* tag 'wireless-next-2024-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (154 commits)
Revert "wifi: iwlegacy: do not skip frames with bad FCS"
wifi: mac80211: pass MBSSID config by reference
wifi: mac80211: Support EHT 1024 aggregation size in TX
net: rfkill: gpio: Add check for clk_enable()
wifi: brcmfmac: Fix oops due to NULL pointer dereference in brcmf_sdiod_sglist_rw()
wifi: Switch back to struct platform_driver::remove()
wifi: ipw2x00: libipw_rx_any(): fix bad alignment
wifi: brcmfmac: release 'root' node in all execution paths
wifi: iwlwifi: mvm: don't call power_update_mac in fast suspend
wifi: iwlwifi: s/IWL_MVM_INVALID_STA/IWL_INVALID_STA
wifi: iwlwifi: bump minimum API version in BZ/SC to 92
wifi: iwlwifi: move IWL_LMAC_*_INDEX to fw/api/context.h
wifi: iwlwifi: be less noisy if the NIC is dead in S3
wifi: iwlwifi: mvm: tell iwlmei when we finished suspending
wifi: iwlwifi: allow fast resume on ax200
wifi: iwlwifi: mvm: support new initiator and responder command version
wifi: iwlwifi: mvm: use wiphy locked debugfs for low-latency
wifi: iwlwifi: mvm: MLO scan upon channel condition degradation
wifi: iwlwifi: mvm: support new versions of the wowlan APIs
wifi: iwlwifi: mvm: allow always calling iwl_mvm_get_bss_vif()
...
====================
Link: https://patch.msgid.link/20241113172918.A8A11C4CEC3@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge bpf fixes after downstream PR.
In particular to bring the fix in
commit aa30eb3260b2 ("bpf: Force checkpoint when jmp history is too long").
The follow up verifier work depends on it.
And the fix in
commit 6801cf7890f2 ("selftests/bpf: Use -4095 as the bad address for bits iterator").
It's fixing instability of BPF CI on s390 arch.
No conflicts.
Adjacent changes in:
Auto-merging arch/Kconfig
Auto-merging kernel/bpf/helpers.c
Auto-merging kernel/bpf/memalloc.c
Auto-merging kernel/bpf/verifier.c
Auto-merging mm/slab_common.c
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6
(Jiawei Ye)
- Fix BPF sockmap with kTLS to reject vsock and unix sockets
upon kTLS context retrieval (Zijian Zhang)
- Fix BPF bits iterator selftest for s390x (Hou Tao)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZzQV0BUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISIPFywD9Fx9Qc7LdWGmRAmWTqGKSOVPTBC1L
eC/uXop6sLqapP0A/1KsLQmntvXhp+gmxzPEBdwAwb7/DvyPCQV19FZ/sIkA
=lDzI
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye)
- Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS
context retrieval (Zijian Zhang)
- Fix BPF bits iterator selftest for s390x (Hou Tao)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6
bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx
selftests/bpf: Use -4095 as the bad address for bits iterator
Commit 0f6deac3a079 ("net: page_pool: add page allocation stats for
two fast page allocate path") added increments for "fast path"
allocation to page frag alloc. It mentions performance degradation
analysis but the details are unclear. Could be that the author
was simply surprised by the alloc stats not matching packet count.
In my experience the key metric for page pool is the recycling rate.
Page return stats, however, count returned _pages_ not frags.
This makes it impossible to calculate recycling rate for drivers
using the frag API. Here is example output of the page-pool
YNL sample for a driver allocating 1200B frags (4k pages)
with nearly perfect recycling:
$ ./page-pool
eth0[2] page pools: 32 (zombies: 0)
refs: 291648 bytes: 1194590208 (refs: 0 bytes: 0)
recycling: 33.3% (alloc: 4557:2256365862 recycle: 200476245:551541893)
The recycling rate is reported as 33.3% because we give out
4096 // 1200 = 3 frags for every recycled page.
Effectively revert the aforementioned commit. This also aligns
with the stats we would see for drivers which do the fragmentation
themselves, although that's not a strong reason in itself.
On the (very unlikely) path where we can reuse the current page
let's bump the "cached" stat. The fact that we don't put the page
in the cache is just an optimization.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://patch.msgid.link/20241109023303.3366500-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To generate hnode handles (in gen_new_htid()), u32 uses IDR and
encodes the returned small integer into a structured 32-bit
word. Unfortunately, at disposal time, the needed decoding
is not done. As a result, idr_remove() fails, and the IDR
fills up. Since its size is 2048, the following script ends up
with "Filter already exists":
tc filter add dev myve $FILTER1
tc filter add dev myve $FILTER2
for i in {1..2048}
do
echo $i
tc filter del dev myve $FILTER2
tc filter add dev myve $FILTER2
done
This patch adds the missing decoding logic for handles that
deserve it.
Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20241110172836.331319-1-alexandre.ferrieux@orange.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In 'mptcp_reset_tout_timer', promote 'probe_timestamp' to unsigned long
to avoid possible integer overflow. Compile tested only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Dmitry Kandybka <d.kandybka@gmail.com>
Link: https://patch.msgid.link/20241107103657.1560536-1-d.kandybka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since 61a939c68ee0 ("Bluetooth: Queue incoming ACL data until
BT_CONNECTED state is reached") there is no long the need to call
mgmt_device_connected as ACL data will be queued until BT_CONNECTED
state.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219458
Link: https://github.com/bluez/bluez/issues/1014
Fixes: 333b4fd11e89 ("Bluetooth: L2CAP: Fix uaf in l2cap_connect")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>