1043600 Commits

Author SHA1 Message Date
David S. Miller
578f393227 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/
ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2021-10-07

1) Fix a sysbot reported shift-out-of-bounds in xfrm_get_default.
   From Pavel Skripkin.

2) Fix XFRM_MSG_MAPPING ABI breakage. The new XFRM_MSG_MAPPING
   messages were accidentally not paced at the end.
   Fix by Eugene Syromiatnikov.

3) Fix the uapi for the default policy, use explicit field and macros
   and make it accessible to userland.
   From Nicolas Dichtel.

4) Fix a missing rcu lock in xfrm_notify_userpolicy().
   From Nicolas Dichtel.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-07 12:44:41 +01:00
David S. Miller
65f280bb65 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-
queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-10-06

This series contains updates to i40e and iavf drivers.

Jiri Benc expands an error check to prevent infinite loop for i40e.

Sylwester prevents freeing of uninitialized IRQ vector to resolve a
kernel oops for i40e.

Stefan Assmann fixes a double mutex unlock for iavf.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-07 12:38:15 +01:00
Stefan Assmann
54ee39439a iavf: fix double unlock of crit_lock
The crit_lock mutex could be unlocked twice as reported here
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210823/025525.html

Remove the superfluous unlock. Technically the problem was already
present before 5ac49f3c2702 as that commit only replaced the locking
primitive, but no functional change.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
Fixes: bac8486116b0 ("iavf: Refactor the watchdog state machine")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-06 09:49:34 -07:00
Sylwester Dziedziuch
2e5a20573a i40e: Fix freeing of uninitialized misc IRQ vector
When VSI set up failed in i40e_probe() as part of PF switch set up
driver was trying to free misc IRQ vectors in
i40e_clear_interrupt_scheme and produced a kernel Oops:

   Trying to free already-free IRQ 266
   WARNING: CPU: 0 PID: 5 at kernel/irq/manage.c:1731 __free_irq+0x9a/0x300
   Workqueue: events work_for_cpu_fn
   RIP: 0010:__free_irq+0x9a/0x300
   Call Trace:
   ? synchronize_irq+0x3a/0xa0
   free_irq+0x2e/0x60
   i40e_clear_interrupt_scheme+0x53/0x190 [i40e]
   i40e_probe.part.108+0x134b/0x1a40 [i40e]
   ? kmem_cache_alloc+0x158/0x1c0
   ? acpi_ut_update_ref_count.part.1+0x8e/0x345
   ? acpi_ut_update_object_reference+0x15e/0x1e2
   ? strstr+0x21/0x70
   ? irq_get_irq_data+0xa/0x20
   ? mp_check_pin_attr+0x13/0xc0
   ? irq_get_irq_data+0xa/0x20
   ? mp_map_pin_to_irq+0xd3/0x2f0
   ? acpi_register_gsi_ioapic+0x93/0x170
   ? pci_conf1_read+0xa4/0x100
   ? pci_bus_read_config_word+0x49/0x70
   ? do_pci_enable_device+0xcc/0x100
   local_pci_probe+0x41/0x90
   work_for_cpu_fn+0x16/0x20
   process_one_work+0x1a7/0x360
   worker_thread+0x1cf/0x390
   ? create_worker+0x1a0/0x1a0
   kthread+0x112/0x130
   ? kthread_flush_work_fn+0x10/0x10
   ret_from_fork+0x1f/0x40

The problem is that at that point misc IRQ vectors
were not allocated yet and we get a call trace
that driver is trying to free already free IRQ vectors.

Add a check in i40e_clear_interrupt_scheme for __I40E_MISC_IRQ_REQUESTED
PF state before calling i40e_free_misc_vector. This state is set only if
misc IRQ vectors were properly initialized.

Fixes: c17401a1dd21 ("i40e: use separate state bit for miscellaneous IRQ setup")
Reported-by: PJ Waskiewicz <pwaskiewicz@jumptrading.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-06 09:49:34 -07:00
Jiri Benc
857b6c6f66 i40e: fix endless loop under rtnl
The loop in i40e_get_capabilities can never end. The problem is that
although i40e_aq_discover_capabilities returns with an error if there's
a firmware problem, the returned error is not checked. There is a check for
pf->hw.aq.asq_last_status but that value is set to I40E_AQ_RC_OK on most
firmware problems.

When i40e_aq_discover_capabilities encounters a firmware problem, it will
encounter the same problem on its next invocation. As the result, the loop
becomes endless. We hit this with I40E_ERR_ADMIN_QUEUE_TIMEOUT but looking
at the code, it can happen with a range of other firmware errors.

I don't know what the correct behavior should be: whether the firmware
should be retried a few times, or whether pf->hw.aq.asq_last_status should
be always set to the encountered firmware error (but then it would be
pointless and can be just replaced by the i40e_aq_discover_capabilities
return value). However, the current behavior with an endless loop under the
rtnl mutex(!) is unacceptable and Intel has not submitted a fix, although we
explained the bug to them 7 months ago.

This may not be the best possible fix but it's better than hanging the whole
system on a firmware bug.

Fixes: 56a62fc86895 ("i40e: init code and hardware support")
Tested-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-06 09:49:23 -07:00
Marcel Ziswiler
a50a059523 dt-bindings: net: dsa: marvell: fix compatible in example
While the MV88E6390 switch chip exists, one is supposed to use a
compatible of "marvell,mv88e6190" for it. Fix this in the given example.

Signed-off-by: Marcel Ziswiler <marcel@ziswiler.com>
Fixes: a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:20:40 +01:00
Shannon Nelson
3707428dda ionic: move filter sync_needed bit set
Move the setting of the filter-sync-needed bit to the error
case in the filter add routine to be sure we're checking the
live filter status rather than a copy of the pre-sync status.

Fixes: 969f84394604 ("ionic: sync the filters in the work task")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:15:05 +01:00
Eric Dumazet
17c37d748f gve: report 64bit tx_bytes counter from gve_handle_report_stats()
Each tx queue maintains a 64bit counter for bytes, there is
no reason to truncate this to 32bit (or this has not been
documented)

Fixes: 24aeb56f2d38 ("gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yangchun Fu <yangchun@google.com>
Cc: Kuo Zhao <kuozhao@google.com>
Cc: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:11:51 +01:00
Eric Dumazet
2f57d4975f gve: fix gve_get_stats()
gve_get_stats() can report wrong numbers if/when u64_stats_fetch_retry()
returns true.

What is needed here is to sample values in temporary variables,
and only use them after each loop is ended.

Fixes: f5cedc84a30d ("gve: Add transmit and receive support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Catherine Sullivan <csully@google.com>
Cc: Sagi Shahar <sagis@google.com>
Cc: Jon Olson <jonolson@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Luigi Rizzo <lrizzo@google.com>
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Tao Liu <xliutaox@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:11:51 +01:00
Eric Dumazet
d343679919 rtnetlink: fix if_nlmsg_stats_size() under estimation
rtnl_fill_statsinfo() is filling skb with one mandatory if_stats_msg structure.

nlmsg_put(skb, pid, seq, type, sizeof(struct if_stats_msg), flags);

But if_nlmsg_stats_size() never considered the needed storage.

This bug did not show up because alloc_skb(X) allocates skb with
extra tailroom, because of added alignments. This could very well
be changed in the future to have deterministic behavior.

Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:09:46 +01:00
Catherine Sullivan
d4b111fda6 gve: Properly handle errors in gve_assign_qpl
Ignored errors would result in crash.

Fixes: ede3fcf5ec67f ("gve: Add support for raw addressing to the rx path")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:04:26 +01:00
Tao Liu
922aa9bcac gve: Avoid freeing NULL pointer
Prevent possible crashes when cleaning up after unsuccessful
initializations.

Fixes: 893ce44df5658 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Catherine Sully <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:04:26 +01:00
Catherine Sullivan
d03477ee10 gve: Correct available tx qpl check
The qpl_map_size is rounded up to a multiple of sizeof(long), but the
number of qpls doesn't have to be.

Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:04:26 +01:00
David S. Miller
9997080df0 Merge branch 'stmmac-eee-fix'
Wong Vee Khee says:

====================
net: stmmac: Turn off EEE on MAC link down

This patch series ensure PCS EEE is turned off on the event of MAC
link down.

Tested on Intel AlderLake-S (STMMAC + MaxLinear GPY211 PHY).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 11:18:27 +01:00
Wong Vee Khee
d4aeaed80b net: stmmac: trigger PCS EEE to turn off on link down
The current implementation enable PCS EEE feature in the event of link
up, but PCS EEE feature is not disabled on link down.

This patch makes sure PCE EEE feature is disabled on link down.

Fixes: 656ed8b015f1 ("net: stmmac: fix EEE init issue when paired with EEE capable PHYs")
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 11:18:27 +01:00
Wong Vee Khee
590df78bc7 net: pcs: xpcs: fix incorrect steps on disable EEE
When Energy-Efficient Ethernet(EEE) is disable from the MAC side,
we need to clear the DW_VR_MII_EEE_TRN_LPI bit of DW_VR_MII_EEE_MCTRL1
register.

Fixes: 7617af3d1a5e ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet")
Cc: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 11:18:27 +01:00
Eric Dumazet
7707a4d01a netlink: annotate data races around nlk->bound
While existing code is correct, KCSAN is reporting
a data-race in netlink_insert / netlink_sendmsg [1]

It is correct to read nlk->bound without a lock, as netlink_autobind()
will acquire all needed locks.

[1]
BUG: KCSAN: data-race in netlink_insert / netlink_sendmsg

write to 0xffff8881031c8b30 of 1 bytes by task 18752 on cpu 0:
 netlink_insert+0x5cc/0x7f0 net/netlink/af_netlink.c:597
 netlink_autobind+0xa9/0x150 net/netlink/af_netlink.c:842
 netlink_sendmsg+0x479/0x7c0 net/netlink/af_netlink.c:1892
 sock_sendmsg_nosec net/socket.c:703 [inline]
 sock_sendmsg net/socket.c:723 [inline]
 ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
 ___sys_sendmsg net/socket.c:2446 [inline]
 __sys_sendmsg+0x1ed/0x270 net/socket.c:2475
 __do_sys_sendmsg net/socket.c:2484 [inline]
 __se_sys_sendmsg net/socket.c:2482 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff8881031c8b30 of 1 bytes by task 18751 on cpu 1:
 netlink_sendmsg+0x270/0x7c0 net/netlink/af_netlink.c:1891
 sock_sendmsg_nosec net/socket.c:703 [inline]
 sock_sendmsg net/socket.c:723 [inline]
 __sys_sendto+0x2a8/0x370 net/socket.c:2019
 __do_sys_sendto net/socket.c:2031 [inline]
 __se_sys_sendto net/socket.c:2027 [inline]
 __x64_sys_sendto+0x74/0x90 net/socket.c:2027
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00 -> 0x01

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 18751 Comm: syz-executor.0 Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: da314c9923fe ("netlink: Replace rhash_portid with bound")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:11:09 +01:00
Wong Vee Khee
e3cf002d5a net: pcs: xpcs: fix incorrect CL37 AN sequence
According to Synopsys DesignWare Cores Ethernet PCS databook, it is
required to disable Clause 37 auto-negotiation by programming bit-12
(AN_ENABLE) to 0 if it is already enabled, before programming various
fields of VR_MII_AN_CTRL registers.

After all these programming are done, it is then required to enable
Clause 37 auto-negotiation by programming bit-12 (AN_ENABLE) to 1.

Fixes: b97b5331b8ab ("net: pcs: add C37 SGMII AN support for intel mGbE controller")
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:10:24 +01:00
Sean Anderson
25a9da6641 net: sfp: Fix typo in state machine debug string
The string should be "tx_disable" to match the state enum.

Fixes: 4005a7cb4f55 ("net: phy: sftp: print debug message with text, not numbers")
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:08:13 +01:00
Eric Dumazet
a56d447f19 net/sched: sch_taprio: properly cancel timer from taprio_destroy()
There is a comment in qdisc_create() about us not calling ops->reset()
in some cases.

err_out4:
	/*
	 * Any broken qdiscs that would require a ops->reset() here?
	 * The qdisc was never in action so it shouldn't be necessary.
	 */

As taprio sets a timer before actually receiving a packet, we need
to cancel it from ops->destroy, just in case ops->reset has not
been called.

syzbot reported:

ODEBUG: free active (active state 0) object type: hrtimer hint: advance_sched+0x0/0x9a0 arch/x86/include/asm/atomic64_64.h:22
WARNING: CPU: 0 PID: 8441 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Modules linked in:
CPU: 0 PID: 8441 Comm: syz-executor813 Not tainted 5.14.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd e0 d3 e3 89 4c 89 ee 48 c7 c7 e0 c7 e3 89 e8 5b 86 11 05 <0f> 0b 83 05 85 03 92 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
RSP: 0018:ffffc9000130f330 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: ffff88802baeb880 RSI: ffffffff815d87b5 RDI: fffff52000261e58
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815d25ee R11: 0000000000000000 R12: ffffffff898dd020
R13: ffffffff89e3ce20 R14: ffffffff81653630 R15: dffffc0000000000
FS:  0000000000f0d300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffb64b3e000 CR3: 0000000036557000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __debug_check_no_obj_freed lib/debugobjects.c:987 [inline]
 debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1018
 slab_free_hook mm/slub.c:1603 [inline]
 slab_free_freelist_hook+0x171/0x240 mm/slub.c:1653
 slab_free mm/slub.c:3213 [inline]
 kfree+0xe4/0x540 mm/slub.c:4267
 qdisc_create+0xbcf/0x1320 net/sched/sch_api.c:1299
 tc_modify_qdisc+0x4c8/0x1a60 net/sched/sch_api.c:1663
 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2403
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2457
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80

Fixes: 44d4775ca518 ("net/sched: sch_taprio: reset child qdiscs before freeing them")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 12:45:27 +01:00
David S. Miller
64506cb928 Merge branch 'bridge-fixes'
Eric Dumazet says:

====================
net: bridge: br_get_linkxstats_size() fixes

This patch series attempts to fix the following syzbot report.

WARNING: CPU: 1 PID: 21425 at net/core/rtnetlink.c:5388 rtnl_stats_get+0x80f/0x8c0 net/core/rtnetlink.c:5388
Modules linked in:
CPU: 1 PID: 21425 Comm: syz-executor394 Not tainted 5.13.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rtnl_stats_get+0x80f/0x8c0 net/core/rtnetlink.c:5388
Code: e9 9c fc ff ff 4c 89 e7 89 0c 24 e8 ab 8b a8 fa 8b 0c 24 e9 bc fc ff ff 4c 89 e7 e8 9b 8b a8 fa e9 df fe ff ff e8 61 85 63 fa <0f> 0b e9 f7 fc ff ff 41 be ea ff ff ff e9 f9 fc ff ff 41 be 97 ff
RSP: 0018:ffffc9000cf77688 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 000000000000012c RCX: 0000000000000000
RDX: ffff8880211754c0 RSI: ffffffff8711571f RDI: 0000000000000003
RBP: ffff8880175aa780 R08: 00000000ffffffa6 R09: ffff88823bd5c04f
R10: ffffffff87115413 R11: 0000000000000001 R12: ffff8880175aab74
R13: ffff8880175aab40 R14: 00000000ffffffa6 R15: 0000000000000006
FS:  0000000001ff9300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000005cfd58 CR3: 000000002cd43000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5562
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4440d9
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 12:39:19 +01:00
Eric Dumazet
0854a05133 net: bridge: fix under estimation in br_get_linkxstats_size()
Commit de1799667b00 ("net: bridge: add STP xstats")
added an additional nla_reserve_64bit() in br_fill_linkxstats(),
but forgot to update br_get_linkxstats_size() accordingly.

This can trigger the following in rtnl_stats_get()

	WARN_ON(err == -EMSGSIZE);

Fixes: de1799667b00 ("net: bridge: add STP xstats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 12:39:07 +01:00
Eric Dumazet
dbe0b88064 net: bridge: use nla_total_size_64bit() in br_get_linkxstats_size()
bridge_fill_linkxstats() is using nla_reserve_64bit().

We must use nla_total_size_64bit() instead of nla_total_size()
for corresponding data structure.

Fixes: 1080ab95e3c7 ("net: bridge: add support for IGMP/MLD stats and export them via netlink")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 12:39:07 +01:00
Hayes Wang
baf33d7a75 r8152: avoid to resubmit rx immediately
For the situation that the disconnect event comes very late when the
device is unplugged, the driver would resubmit the RX bulk transfer
after getting the callback with -EPROTO immediately and continually.
Finally, soft lockup occurs.

This patch avoids to resubmit RX immediately. It uses a workqueue to
schedule the RX NAPI. And the NAPI would resubmit the RX. It let the
disconnect event have opportunity to stop the submission before soft
lockup.

Reported-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
Tested-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 12:38:39 +01:00
Jakub Kicinski
3f6cffb860 etherdevice: use __dev_addr_set()
Andrew points out that eth_hw_addr_set() replaces memcpy()
calls so we can't use ether_addr_copy() which assumes
both arguments are 2-bytes aligned.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 12:38:39 +01:00
Andrew Lunn
b44d52a50b dsa: tag_dsa: Fix mask for trunked packets
A packet received on a trunk will have bit 2 set in Forward DSA tagged
frame. Bit 1 can be either 0 or 1 and is otherwise undefined and bit 0
indicates the frame CFI. Masking with 7 thus results in frames as
being identified as being from a trunk when in fact they are not. Fix
the mask to just look at bit 2.

Fixes: 5b60dadb71db ("net: dsa: tag_dsa: Support reception of packets from LAG devices")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-04 13:43:55 +01:00
Andy Shevchenko
7cd8b1542a ptp_pch: Load module automatically if ID matches
The driver can't be loaded automatically because it misses
module alias to be provided. Add corresponding MODULE_DEVICE_TABLE()
call to the driver.

Fixes: 863d08ece9bf ("supports eg20t ptp clock")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02 16:13:00 +01:00
Pali Rohár
eed183abc0 powerpc/fsl/dts: Fix phy-connection-type for fm1mac3
Property phy-connection-type contains invalid value "sgmii-2500" per scheme
defined in file ethernet-controller.yaml.

Correct phy-connection-type value should be "2500base-x".

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 84e0f1c13806 ("powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)")
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02 14:11:46 +01:00
David S. Miller
dade7f9d81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net (v2)

The following patchset contains Netfilter fixes for net:

1) Move back the defrag users fields to the global netns_nf area.
   Kernel fails to boot if conntrack is builtin and kernel is booted
   with: nf_conntrack.enable_hooks=1. From Florian Westphal.

2) Rule event notification is missing relevant context such as
   the position handle and the NLM_F_APPEND flag.

3) Rule replacement is expanded to add + delete using the existing
   rule handle, reverse order of this operation so it makes sense
   from rule notification standpoint.

4) Propagate to userspace the NLM_F_CREATE and NLM_F_EXCL flags
   from the rule notification path.

Patches #2, #3 and #4 are used by 'nft monitor' and 'iptables-monitor'
userspace utilities which are not correctly representing the following
operations through netlink notifications:

- rule insertions
- rule addition/insertion from position handle
- create table/chain/set/map/flowtable/...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02 13:55:02 +01:00
Pablo Neira Ayuso
6fb721cf78 netfilter: nf_tables: honor NLM_F_CREATE and NLM_F_EXCL in event notification
Include the NLM_F_CREATE and NLM_F_EXCL flags in netlink event
notifications, otherwise userspace cannot distiguish between create and
add commands.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-02 12:00:17 +02:00
Leon Romanovsky
5cfe5109a1 MAINTAINERS: Remove Bin Luo as his email bounces
The emails sent to luobin9@huawei.com bounce with error:
 "Recipient address rejected: Failed recipient validation check."

So let's remove his entry and change the status of hinic driver till
someone in Huawei will step-in to maintain it again.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/045a32ccf394de66b7899c8b732f44dc5f4a1154.1632978665.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01 15:17:10 -07:00
Punit Agrawal
aec3f415f7 net: stmmac: dwmac-rk: Fix ethernet on rk3399 based devices
Commit 2d26f6e39afb ("net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings")
while getting rid of a runtime PM warning ended up breaking ethernet
on rk3399 based devices. By dropping an extra reference to the device,
the commit ends up enabling suspend / resume of the ethernet device -
which appears to be broken.

While the issue with runtime pm is being investigated, partially
revert commit 2d26f6e39afb to restore the network on rk3399.

Fixes: 2d26f6e39afb ("net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings")
Suggested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Cc: Michael Riesch <michael.riesch@wolfvision.net>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20210929135049.3426058-1-punitagrawal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01 15:15:29 -07:00
Vladimir Oltean
019d9329e7 net: mscc: ocelot: fix VCAP filters remaining active after being deleted
When ocelot_flower.c calls ocelot_vcap_filter_add(), the filter has a
given filter->id.cookie. This filter is added to the block->rules list.

However, when ocelot_flower.c calls ocelot_vcap_block_find_filter_by_id()
which passes the cookie as argument, the filter is never found by
filter->id.cookie when searching through the block->rules list.

This is unsurprising, since the filter->id.cookie is an unsigned long,
but the cookie argument provided to ocelot_vcap_block_find_filter_by_id()
is a signed int, and the comparison fails.

Fixes: 50c6cc5b9283 ("net: mscc: ocelot: store a namespaced VCAP filter ID")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210930125330.2078625-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01 15:13:20 -07:00
Eric Dumazet
560ee196fe net_sched: fix NULL deref in fifo_set_limit()
syzbot reported another NULL deref in fifo_set_limit() [1]

I could repro the issue with :

unshare -n
tc qd add dev lo root handle 1:0 tbf limit 200000 burst 70000 rate 100Mbit
tc qd replace dev lo parent 1:0 pfifo_fast
tc qd change dev lo root handle 1:0 tbf limit 300000 burst 70000 rate 100Mbit

pfifo_fast does not have a change() operation.
Make fifo_set_limit() more robust about this.

[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 1cf99067 P4D 1cf99067 PUD 7ca49067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 14443 Comm: syz-executor959 Not tainted 5.15.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:0x0
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
RSP: 0018:ffffc9000e2f7310 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff8d6ecc00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff888024c27910 RDI: ffff888071e34000
RBP: ffff888071e34000 R08: 0000000000000001 R09: ffffffff8fcfb947
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888024c27910
R13: ffff888071e34018 R14: 0000000000000000 R15: ffff88801ef74800
FS:  00007f321d897700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000000722c3000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 fifo_set_limit net/sched/sch_fifo.c:242 [inline]
 fifo_set_limit+0x198/0x210 net/sched/sch_fifo.c:227
 tbf_change+0x6ec/0x16d0 net/sched/sch_tbf.c:418
 qdisc_change net/sched/sch_api.c:1332 [inline]
 tc_modify_qdisc+0xd9a/0x1a60 net/sched/sch_api.c:1634
 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: fb0305ce1b03 ("net-sched: consolidate default fifo qdisc setup")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20210930212239.3430364-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01 14:59:10 -07:00
Eric Dumazet
5fb14d20f8 net: add kerneldoc comment for sk_peer_lock
Fixes following warning:

include/net/sock.h:533: warning: Function parameter or member 'sk_peer_lock' not described in 'sock'

Fixes: 35306eb23814 ("af_unix: fix races in sk_peer_pid and sk_peer_cred accesses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20211001164622.58520-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01 11:58:38 -07:00
David S. Miller
78764f450b mlx5-fixes-2021-09-30
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmFWJy0ACgkQSD+KveBX
 +j7wJQgAqlx3rBiJGYgtnEr/HHzBSgZXkIICGfoQHfiOORoNTjpq46Csxuiu2Fg1
 JvxB1Dahoht/8Vfcm+GooKKDRlkUXx8vYhQF3BSQ7oboVGkWZ1a6ZuEUSfW3rrag
 oBkSCuVcn6EwVBFF0NvAi9ARd2G5GAYsew9yQHnXVQzp9ZWqzZcMMsp3pdhQvKLu
 CL0SlY0uZhXHyHNl5Gz79xQDPwSyXbVlhhEK9lIaPi5tcqA3X1Y7ZTqP/ouDjZBj
 /VLCIYJZ40471c4a2YVLEXiVZu0E0BDgOLlEWRiHIkTrQf+gjcjOCOUT1CkMOuid
 F7Ny4xnVaCzCIx8V5i3FXAOfcOeOIA==
 =TCu2
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2021-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-fixes-2021-09-30
2021-10-01 14:16:52 +01:00
David S. Miller
5abab4982d wireless-drivers fixes for v5.15
Second set of fixes for v5.15, nothing major this time. Most important
 here are reverting a brcmfmac regression and a fix for an old rare
 ath5k build error.
 
 iwlwifi
 
 * fixes to NULL dereference, off by one and missing unlock
 
 * add support for Killer AX1650 on Dell XPS 15 (9510) laptop
 
 ath5k
 
 * build fix with LEDS=m
 
 brcmfmac
 
 * revert a regression causing BCM4359/9 devices stop working as access point
 
 mwifiex
 
 * fix clang warning about null pointer arithmetic
 -----BEGIN PGP SIGNATURE-----
 
 iQFJBAABCgAzFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmFWy3oVHGt2YWxvQGNv
 ZGVhdXJvcmEub3JnAAoJEG4XJFUm622bBRYIAKBa0NeCkgX8Jnfj492SQj8E9B2o
 +fim/W1OcBQqJnfFKaTgEHIYvudnaLuOAiCnTpzvQ5+Dc+dvM3RaPz9c405uzhwj
 TaHYs8zoB+//VWTGwmL+ACBYpUCdlVHRzwREv6hCiprd7M57rs1IYkgDgQ8EFCop
 prm3z4+Gcr48cNB0CYY39xOEz8K0+pjq82DMK5DnBrfBhO9xjf9PybXqKwjnEcxm
 qoRApBjNtDXrQVYHg+AcF8XQ+Vix9H5kSOW+60nGMCtkQszxvu/SQi6hJWALY1U2
 0iFNGj/uY92wibQBenqW/nc96vq99HSWPPJeAD52KPXIWZFIUt3V5vJ8aRU=
 =hYgS
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-2021-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

wireless-drivers fixes for v5.15

Second set of fixes for v5.15, nothing major this time. Most important
here are reverting a brcmfmac regression and a fix for an old rare
ath5k build error.

iwlwifi

* fixes to NULL dereference, off by one and missing unlock

* add support for Killer AX1650 on Dell XPS 15 (9510) laptop

ath5k

* build fix with LEDS=m

brcmfmac

* revert a regression causing BCM4359/9 devices stop working as access point

mwifiex

* fix clang warning about null pointer arithmetic
2021-10-01 14:16:29 +01:00
Pavel Skripkin
ca6e11c337 phy: mdio: fix memory leak
Syzbot reported memory leak in MDIO bus interface, the problem was in
wrong state logic.

MDIOBUS_ALLOCATED indicates 2 states:
	1. Bus is only allocated
	2. Bus allocated and __mdiobus_register() fails, but
	   device_register() was called

In case of device_register() has been called we should call put_device()
to correctly free the memory allocated for this device, but mdiobus_free()
calls just kfree(dev) in case of MDIOBUS_ALLOCATED state

To avoid this behaviour we need to set bus->state to MDIOBUS_UNREGISTERED
_before_ calling device_register(), because put_device() should be
called even in case of device_register() failure.

Link: https://lore.kernel.org/netdev/YVMRWNDZDUOvQjHL@shell.armlinux.org.uk/
Fixes: 46abc02175b3 ("phylib: give mdio buses a device tree presence")
Reported-and-tested-by: syzbot+398e7dc692ddbbb4cfec@syzkaller.appspotmail.com
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/eceae1429fbf8fa5c73dd2a0d39d525aa905074d.1633024062.git.paskripkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30 17:11:59 -07:00
Pavel Skripkin
10eff1f578 Revert "net: mdiobus: Fix memory leak in __mdiobus_register"
This reverts commit ab609f25d19858513919369ff3d9a63c02cd9e2e.

This patch is correct in the sense that we _should_ call device_put() in
case of device_register() failure, but the problem in this code is more
vast.

We need to set bus->state to UNMDIOBUS_REGISTERED before calling
device_register() to correctly release the device in mdiobus_free().
This patch prevents us from doing it, since in case of device_register()
failure put_device() will be called 2 times and it will cause UAF or
something else.

Also, Reported-by: tag in revered commit was wrong, since syzbot
reported different leak in same function.

Link: https://lore.kernel.org/netdev/20210928092657.GI2048@kadam/
Acked-by: Yanfei Xu <yanfei.xu@windriver.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/f12fb1faa4eccf0f355788225335eb4309ff2599.1633024062.git.paskripkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30 17:11:58 -07:00
Linus Torvalds
4de593fb96 Networking fixes for 5.15-rc4, including fixes from mac80211, netfilter
and bpf.
 
 Current release - regressions:
 
  - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from
    interrupt
 
  - mdio: revert mechanical patches which broke handling of optional
    resources
 
  - dev_addr_list: prevent address duplication
 
 Previous releases - regressions:
 
  - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
    (NULL deref)
 
  - Revert "mac80211: do not use low data rates for data frames with no
    ack flag", fixing broadcast transmissions
 
  - mac80211: fix use-after-free in CCMP/GCMP RX
 
  - netfilter: include zone id in tuple hash again, minimize collisions
 
  - netfilter: nf_tables: unlink table before deleting it (race -> UAF)
 
  - netfilter: log: work around missing softdep backend module
 
  - mptcp: don't return sockets in foreign netns
 
  - sched: flower: protect fl_walk() with rcu (race -> UAF)
 
  - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup
 
  - smsc95xx: fix stalled rx after link change
 
  - enetc: fix the incorrect clearing of IF_MODE bits
 
  - ipv4: fix rtnexthop len when RTA_FLOW is present
 
  - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this SKU
 
  - e100: fix length calculation & buffer overrun in ethtool::get_regs
 
 Previous releases - always broken:
 
  - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx
 
  - mac80211: drop frames from invalid MAC address in ad-hoc mode
 
  - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
    (race -> UAF)
 
  - bpf, x86: Fix bpf mapping of atomic fetch implementation
 
  - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog
 
  - netfilter: ip6_tables: zero-initialize fragment offset
 
  - mhi: fix error path in mhi_net_newlink
 
  - af_unix: return errno instead of NULL in unix_create1() when
    over the fs.file-max limit
 
 Misc:
 
  - bpf: exempt CAP_BPF from checks against bpf_jit_limit
 
  - netfilter: conntrack: make max chain length random, prevent guessing
    buckets by attackers
 
  - netfilter: nf_nat_masquerade: make async masq_inet6_event handling
    generic, defer conntrack walk to work queue (prevent hogging RTNL lock)
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmFV5KYACgkQMUZtbf5S
 Irs3cRAAqNsgaQXSVOXcPKsndeDDKHIv1Ktes8aOP9tgEXZw5rpsbct7g9Yxc0os
 Oolyt6HThjWr3u1/e3HHJO9I5Klr/J8eQReoRZnKW+6TYZflmmzfuf8u1nx6SLP/
 tliz5y8wKbp8BqqNuTMdRpm+R1QQcNkXTeruUoR1PgREcY4J7bC2BRrqeZhBGHSR
 Z5yPOIietFN3nITxNwbe4AYJXlesMc6QCWhBtXjMPGQ4Zc4/sjDNfqi7eHJi2H2y
 kW2dHeXG86gnlgFllOBBWP85ptxynyxoNQJuhrxgC9T+/FpSVST7cwKbtmkwDI3M
 5WGmeE6B3yfF8iOQuR8fbKQmsnLgQlYhjpbbhgN0GxzkyI7RpGYOFroX0Pht4IVZ
 mwprDOtvoLs4UeDjULRMB0JZfRN75PCtVlhfUkhhJxXGCCmnhGYaxG/pE+6OQWlr
 +n8RXYYMoOzPaHIYTS9NGSGqT0r32IUy/W5Yfv3rEzSeehy2/fxzGr2fOyBGs+q7
 xrnqpsOnM8cODDwGMy3TclCI4Dd72WoHNCHPhA/bk/ZMjHpBd4CSEZPm8IROY3Ja
 g1t68cncgL8fB7TSD9WLFgYu67Lg5j0gC/BHOzUQDQMX5IbhOq/fj1xRy5Lc6SYp
 mqW1f7LdnixBe4W61VjDAYq5jJRqrwEWedx+rvV/ONLvr77KULk=
 =rSti
 -----END PGP SIGNATURE-----

Merge tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from mac80211, netfilter and bpf.

  Current release - regressions:

   - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from
     interrupt

   - mdio: revert mechanical patches which broke handling of optional
     resources

   - dev_addr_list: prevent address duplication

  Previous releases - regressions:

   - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
     (NULL deref)

   - Revert "mac80211: do not use low data rates for data frames with no
     ack flag", fixing broadcast transmissions

   - mac80211: fix use-after-free in CCMP/GCMP RX

   - netfilter: include zone id in tuple hash again, minimize collisions

   - netfilter: nf_tables: unlink table before deleting it (race -> UAF)

   - netfilter: log: work around missing softdep backend module

   - mptcp: don't return sockets in foreign netns

   - sched: flower: protect fl_walk() with rcu (race -> UAF)

   - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup

   - smsc95xx: fix stalled rx after link change

   - enetc: fix the incorrect clearing of IF_MODE bits

   - ipv4: fix rtnexthop len when RTA_FLOW is present

   - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this
     SKU

   - e100: fix length calculation & buffer overrun in ethtool::get_regs

  Previous releases - always broken:

   - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx

   - mac80211: drop frames from invalid MAC address in ad-hoc mode

   - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race
     -> UAF)

   - bpf, x86: Fix bpf mapping of atomic fetch implementation

   - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog

   - netfilter: ip6_tables: zero-initialize fragment offset

   - mhi: fix error path in mhi_net_newlink

   - af_unix: return errno instead of NULL in unix_create1() when over
     the fs.file-max limit

  Misc:

   - bpf: exempt CAP_BPF from checks against bpf_jit_limit

   - netfilter: conntrack: make max chain length random, prevent
     guessing buckets by attackers

   - netfilter: nf_nat_masquerade: make async masq_inet6_event handling
     generic, defer conntrack walk to work queue (prevent hogging RTNL
     lock)"

* tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
  af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
  net: stmmac: fix EEE init issue when paired with EEE capable PHYs
  net: dev_addr_list: handle first address in __hw_addr_add_ex
  net: sched: flower: protect fl_walk() with rcu
  net: introduce and use lock_sock_fast_nested()
  net: phy: bcm7xxx: Fixed indirect MMD operations
  net: hns3: disable firmware compatible features when uninstall PF
  net: hns3: fix always enable rx vlan filter problem after selftest
  net: hns3: PF enable promisc for VF when mac table is overflow
  net: hns3: fix show wrong state when add existing uc mac address
  net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
  net: hns3: don't rollback when destroy mqprio fail
  net: hns3: remove tc enable checking
  net: hns3: do not allow call hns3_nic_net_open repeatedly
  ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
  net: bridge: mcast: Associate the seqcount with its protecting lock.
  net: mdio-ipq4019: Fix the error for an optional regs resource
  net: hns3: fix hclge_dbg_dump_tm_pg() stack usage
  net: mdio: mscc-miim: Fix the mdio controller
  af_unix: Return errno instead of NULL in unix_create1().
  ...
2021-09-30 14:28:05 -07:00
Aya Levin
3bf1742f3c net/mlx5e: Mutually exclude setting of TX-port-TS and MQPRIO in channel mode
TX-port-TS hijacks the PTP traffic to a specific HW TX-queue. This
conflicts with MQPRIO in channel mode, which specifies explicitly which
TC accepts the packet. This patch mutually excludes the above
configuration.

Fixes: ec60c4581bd9 ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:57 -07:00
Lama Kayal
dd1979cf3c net/mlx5e: Fix the presented RQ index in PTP stats
PTP-RQ counters title format contains PTP-RQ identifier, which is
mistakenly not passed to sprinft().
This leads to unexpected garbage values instead.
This patch fixes it.

Before applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq15_packets: 0
     ptp_rq8_bytes: 0
     ptp_rq6_csum_complete: 0
     ptp_rq14_csum_complete_tail: 0
     ptp_rq3_csum_complete_tail_slow : 0
     ptp_rq9_csum_unnecessary: 0
     ptp_rq1_csum_unnecessary_inner: 0
     ptp_rq7_csum_none: 0
     ptp_rq10_xdp_drop: 0
     ptp_rq9_xdp_redirect: 0
     ptp_rq13_lro_packets: 0
     ptp_rq12_lro_bytes: 0
     ptp_rq10_ecn_mark: 0
     ptp_rq9_removed_vlan_packets: 0
     ptp_rq5_wqe_err: 0
     ptp_rq8_mpwqe_filler_cqes: 0
     ptp_rq2_mpwqe_filler_strides: 0
     ptp_rq5_oversize_pkts_sw_drop: 0
     ptp_rq6_buff_alloc_err: 0
     ptp_rq15_cqe_compress_blks: 0
     ptp_rq2_cqe_compress_pkts: 0
     ptp_rq2_cache_reuse: 0
     ptp_rq12_cache_full: 0
     ptp_rq11_cache_empty: 256
     ptp_rq12_cache_busy: 0
     ptp_rq11_cache_waive: 0
     ptp_rq12_congst_umr: 0
     ptp_rq11_arfs_err: 0
     ptp_rq9_recover: 0

After applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq0_packets: 0
     ptp_rq0_bytes: 0
     ptp_rq0_csum_complete: 0
     ptp_rq0_csum_complete_tail: 0
     ptp_rq0_csum_complete_tail_slow : 0
     ptp_rq0_csum_unnecessary: 0
     ptp_rq0_csum_unnecessary_inner: 0
     ptp_rq0_csum_none: 0
     ptp_rq0_xdp_drop: 0
     ptp_rq0_xdp_redirect: 0
     ptp_rq0_lro_packets: 0
     ptp_rq0_lro_bytes: 0
     ptp_rq0_ecn_mark: 0
     ptp_rq0_removed_vlan_packets: 0
     ptp_rq0_wqe_err: 0
     ptp_rq0_mpwqe_filler_cqes: 0
     ptp_rq0_mpwqe_filler_strides: 0
     ptp_rq0_oversize_pkts_sw_drop: 0
     ptp_rq0_buff_alloc_err: 0
     ptp_rq0_cqe_compress_blks: 0
     ptp_rq0_cqe_compress_pkts: 0
     ptp_rq0_cache_reuse: 0
     ptp_rq0_cache_full: 0
     ptp_rq0_cache_empty: 256
     ptp_rq0_cache_busy: 0
     ptp_rq0_cache_waive: 0
     ptp_rq0_congst_umr: 0
     ptp_rq0_arfs_err: 0
     ptp_rq0_recover: 0

Fixes: a28359e922c6 ("net/mlx5e: Add PTP-RX statistics")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:57 -07:00
Shay Drory
f88c487634 net/mlx5: Fix setting number of EQs of SFs
When setting number of completion EQs of the SF, consider number of
online CPUs.
Without this consideration, when number of online cpus are less than 8,
unnecessary 8 completion EQs are allocated.

Fixes: c36326d38d93 ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:56 -07:00
Shay Drory
ac8b7d50ae net/mlx5: Fix length of irq_index in chars
The maximum irq_index can be 2047, This means irq_name should have 4
characters reserve for the irq_index. Hence, increase it to 4.

Fixes: 3af26495a247 ("net/mlx5: Enlarge interrupt field in CREATE_EQ")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:56 -07:00
Aya Levin
99b9a678b2 net/mlx5: Avoid generating event after PPS out in Real time mode
When in Real-time mode, HW clock is synced with the PTP daemon. Hence
driver should not re-calibrate the next pulse (via MTPPSE repetitive
events mechanism).

This patch arms repetitive events only in free-running mode.

Fixes: 432119de33d9 ("net/mlx5: Add cyc2time HW translation mode support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:56 -07:00
Aya Levin
6472829470 net/mlx5: Force round second at 1PPS out start time
Allow configuration of 1PPS start time only with time-stamp representing
a round second. Prior to this patch driver allowed setting of a
non-round-second which is not supported by the device. Avoid unexpected
behavior by restricting start-time configuration to a round-second.

Fixes: 4272f9b88db9 ("net/mlx5e: Change 1PPS out scheme")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:55 -07:00
Moshe Shemesh
a586775f83 net/mlx5: E-Switch, Fix double allocation of acl flow counter
Flow counter is allocated in eswitch legacy acl setting functions
without checking if already allocated by previous setting. Add a check
to avoid such double allocation.

Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: ea651a86d468 ("net/mlx5: E-Switch, Refactor eswitch egress acl codes")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:55 -07:00
Tariq Toukan
7dbc849b2a net/mlx5e: Improve MQPRIO resiliency
* Add netdev->tc_to_txq rollback in case of failure in
  mlx5e_update_netdev_queues().
* Fix broken transition between the two modes:
  MQPRIO DCB mode with tc==8, and MQPRIO channel mode.
* Disable MQPRIO channel mode if re-attaching with a different number
  of channels.
* Improve code sharing.

Fixes: ec60c4581bd9 ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:55 -07:00
Tariq Toukan
9d758d4a3a net/mlx5e: Keep the value for maximum number of channels in-sync
The value for maximum number of channels is first calculated based
on the netdev's profile and current function resources (specifically,
number of MSIX vectors, which depends among other things on the number
of online cores in the system).
This value is then used to calculate the netdev's number of rxqs/txqs.
Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs
is constant and we must not exceed it.

To achieve this, keep the maximum number of channels in sync upon any
netdevice re-attach.

Use mlx5e_get_max_num_channels() for calculating the number of netdev's
rxqs/txqs. After netdev is created, use mlx5e_calc_max_nch() (which
coinsiders core device resources, profile, and netdev) to init or
update priv->max_nch.

Before this patch, the value of priv->max_nch might get out of sync,
mistakenly allowing accesses to out-of-bounds objects, which would
crash the system.

Track the number of channels stats structures used in a separate
field, as they are persistent to suspend/resume operations. All the
collected stats of every channel index that ever existed should be
preserved. They are reset only when struct mlx5e_priv is,
in mlx5e_priv_cleanup(), which is part of the profile changing flow.

There is no point anymore in blocking a profile change due to max_nch
mismatch in mlx5e_netdev_change_profile(). Remove the limitation.

Fixes: a1f240f18017 ("net/mlx5e: Adjust to max number of channles when re-attaching")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:54 -07:00
Raed Salem
f9a10440f0 net/mlx5e: IPSEC RX, enable checksum complete
Currently in Rx data path IPsec crypto offloaded packets uses
csum_none flag, so checksum is handled by the stack, this naturally
have some performance/cpu utilization impact on such flows. As Nvidia
NIC starting from ConnectX6DX provides checksum complete value out of
the box also for such flows there is no sense in taking csum_none path,
furthermore the stack (xfrm) have the method to handle checksum complete
corrections for such flows i.e. IPsec trailer removal and consequently
checksum value adjustment.

Because of the above and in addition the ConnectX6DX is the first HW
which supports IPsec crypto offload then it is safe to report csum
complete for IPsec offloaded traffic.

Fixes: b2ac7541e377 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:54 -07:00