linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-09 06:43:09 +00:00

Author	SHA1	Message	Date
Vincent Mailhol	d9447f768b	can: etas_es58x: es58x_rx_err_msg(): fix memory leak in error path In es58x_rx_err_msg(), if can->do_set_mode() fails, the function directly returns without calling netif_rx(skb). This means that the skb previously allocated by alloc_can_err_skb() is not freed. In other terms, this is a memory leak. This patch simply removes the return statement in the error branch and let the function continue. Issue was found with GCC -fanalyzer, please follow the link below for details. Fixes: `8537257874` ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces") Link: https://lore.kernel.org/all/20211026180740.1953265-1-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-11-06 17:29:40 +01:00
Zhang Changzhong	164051a6ab	can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM The TP.CM_BAM message must be sent to the global address [1], so add a check to drop TP.CM_BAM sent to a non-global address. Without this patch, the receiver will treat the following packets as normal RTS/CTS transport: 18EC0102#20090002FF002301 18EB0102#0100000000000000 18EB0102#020000FFFFFFFFFF [1] SAE-J1939-82 2015 A.3.3 Row 1. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/1635431907-15617-4-git-send-email-zhangchangzhong@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-11-06 17:29:32 +01:00
Zhang Changzhong	a79305e156	can: j1939: j1939_can_recv(): ignore messages with invalid source address According to SAE-J1939-82 2015 (A.3.6 Row 2), a receiver should never send TP.CM_CTS to the global address, so we can add a check in j1939_can_recv() to drop messages with invalid source address. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/1635431907-15617-3-git-send-email-zhangchangzhong@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-11-06 17:29:24 +01:00
Zhang Changzhong	c0f49d9800	can: j1939: j1939_tp_cmd_recv(): ignore abort message in the BAM transport This patch prevents BAM transport from being closed by receiving abort message, as specified in SAE-J1939-82 2015 (A.3.3 Row 4). Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/1635431907-15617-2-git-send-email-zhangchangzhong@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-11-06 17:29:14 +01:00
Nghia Le	70bf363d7a	ipv6: remove useless assignment to newinet in tcp_v6_syn_recv_sock() The newinet value is initialized with inet_sk() in a block code to handle sockets for the ETH_P_IP protocol. Along this code path, newinet is never read. Thus, assignment to newinet is needless and can be removed. Signed-off-by: Nghia Le <nghialm78@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211104143740.32446-1-nghialm78@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-05 19:49:40 -07:00
Jakub Kicinski	9bea6aa498	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2021-11-05 We've added 15 non-merge commits during the last 3 day(s) which contain a total of 14 files changed, 199 insertions(+), 90 deletions(-). The main changes are: 1) Fix regression from stack spill/fill of <8 byte scalars, from Martin KaFai Lau. 2) Fix perf's build of bpftool's bootstrap version due to missing libbpf headers, from Quentin Monnet. 3) Fix riscv{32,64} BPF exception tables build errors and warnings, from Björn Töpel. 4) Fix bpf fs to allow RENAME_EXCHANGE support for atomic upgrades on sk_lookup control planes, from Lorenz Bauer. 5) Fix libbpf's error reporting in bpf_map_lookup_and_delete_elem_flags() due to missing libbpf_err_errno(), from Mehrdad Arshad Rad. 6) Various fixes to make xdp_redirect_multi selftest more reliable, from Hangbin Liu. 7) Fix netcnt selftest to make it run serial and thus avoid conflicts with other cgroup/skb selftests run in parallel that could cause flakes, from Andrii Nakryiko. 8) Fix reuseport_bpf_numa networking selftest to skip unavailable NUMA nodes, from Kleber Sacilotto de Souza. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: riscv, bpf: Fix RV32 broken build, and silence RV64 warning selftests/bpf/xdp_redirect_multi: Limit the tests in netns selftests/bpf/xdp_redirect_multi: Give tcpdump a chance to terminate cleanly selftests/bpf/xdp_redirect_multi: Use arping to accurate the arp number selftests/bpf/xdp_redirect_multi: Put the logs to tmp folder libbpf: Fix lookup_and_delete_elem_flags error reporting bpftool: Install libbpf headers for the bootstrap version, too selftests/net: Fix reuseport_bpf_numa by skipping unavailable nodes selftests/bpf: Verifier test on refill from a smaller spill bpf: Do not reject when the stack read size is different from the tracked scalar size selftests/bpf: Make netcnt selftests serial to avoid spurious failures selftests/bpf: Test RENAME_EXCHANGE and RENAME_NOREPLACE on bpffs selftests/bpf: Convert test_bpffs to ASSERT macros libfs: Support RENAME_EXCHANGE in simple_rename() libfs: Move shmem_exchange to simple_rename_exchange ==================== Link: https://lore.kernel.org/r/20211105165803.29372-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-05 14:13:40 -07:00
Björn Töpel	f47d4ffe3a	riscv, bpf: Fix RV32 broken build, and silence RV64 warning Commit `252c765bd7` ("riscv, bpf: Add BPF exception tables") only addressed RV64, and broke the RV32 build [1]. Fix by gating the exception tables code with CONFIG_ARCH_RV64I. Further, silence a "-Wmissing-prototypes" warning [2] in the RV64 BPF JIT. [1] https://lore.kernel.org/llvm/202111020610.9oy9Rr0G-lkp@intel.com/ [2] https://lore.kernel.org/llvm/202110290334.2zdMyRq4-lkp@intel.com/ Fixes: `252c765bd7` ("riscv, bpf: Add BPF exception tables") Signed-off-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Tong Tiangen <tongtiangen@huawei.com> Link: https://lore.kernel.org/bpf/20211103115453.397209-1-bjorn@kernel.org	2021-11-05 16:52:34 +01:00
Hangbin Liu	8955c1a329	selftests/bpf/xdp_redirect_multi: Limit the tests in netns As I want to test both DEVMAP and DEVMAP_HASH in XDP multicast redirect, I limited DEVMAP max entries to a small value for performace. When the test runs after amount of interface creating/deleting tests. The interface index will exceed the map max entries and xdp_redirect_multi will error out with "Get interfacesInterface index to large". Fix this issue by limit the tests in netns and specify the ifindex when creating interfaces. Fixes: `d232924762` ("selftests/bpf: Add xdp_redirect_multi test") Reported-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211027033553.962413-5-liuhangbin@gmail.com	2021-11-05 16:42:56 +01:00
Hangbin Liu	648c367706	selftests/bpf/xdp_redirect_multi: Give tcpdump a chance to terminate cleanly No need to kill tcpdump with -9. Fixes: `d232924762` ("selftests/bpf: Add xdp_redirect_multi test") Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211027033553.962413-4-liuhangbin@gmail.com	2021-11-05 16:38:26 +01:00
Hangbin Liu	f53ea9dbf7	selftests/bpf/xdp_redirect_multi: Use arping to accurate the arp number The arp request number triggered by ping none exist address is not accurate, which may lead the test false negative/positive. Change to use arping to accurate the arp number. Also do not use grep pattern match for dot. Fixes: `d232924762` ("selftests/bpf: Add xdp_redirect_multi test") Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211027033553.962413-3-liuhangbin@gmail.com	2021-11-05 16:38:02 +01:00
Hangbin Liu	8b4ac13abe	selftests/bpf/xdp_redirect_multi: Put the logs to tmp folder The xdp_redirect_multi test logs are created in selftest folder and not cleaned after test. Let's creat a tmp dir and remove the logs after testing. Fixes: `d232924762` ("selftests/bpf: Add xdp_redirect_multi test") Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211027033553.962413-2-liuhangbin@gmail.com	2021-11-05 16:37:51 +01:00
Mehrdad Arshad Rad	64165ddf8e	libbpf: Fix lookup_and_delete_elem_flags error reporting Fix bpf_map_lookup_and_delete_elem_flags() to pass the return code through libbpf_err_errno() as we do similarly in bpf_map_lookup_and_delete_elem(). Fixes: `f12b654327` ("libbpf: Streamline error reporting for low-level APIs") Signed-off-by: Mehrdad Arshad Rad <arshad.rad@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211104171354.11072-1-arshad.rad@gmail.com	2021-11-05 16:26:08 +01:00
Quentin Monnet	e41ac2020b	bpftool: Install libbpf headers for the bootstrap version, too We recently changed bpftool's Makefile to make it install libbpf's headers locally instead of pulling them from the source directory of the library. Although bpftool needs two versions of libbpf, a "regular" one and a "bootstrap" version, we would only install headers for the regular libbpf build. Given that this build always occurs before the bootstrap build when building bpftool, this is enough to ensure that the bootstrap bpftool will have access to the headers exported through the regular libbpf build. However, this did not account for the case when we only want the bootstrap version of bpftool, through the "bootstrap" target. For example, perf needs the bootstrap version only, to generate BPF skeletons. In that case, when are the headers installed? For some time, the issue has been masked, because we had a step (the installation of headers internal to libbpf) which would depend on the regular build of libbpf and hence trigger the export of the headers, just for the sake of creating a directory. But this changed with commit `8b6c46241c` ("bpftool: Remove Makefile dep. on $(LIBBPF) for $(LIBBPF_INTERNAL_HDRS)"), where we cleaned up that stage and removed the dependency on the regular libbpf build. As a result, when we only want the bootstrap bpftool version, the regular libbpf is no longer built. The bootstrap libbpf version is built, but headers are not exported, and the bootstrap bpftool build fails because of the missing headers. To fix this, we also install the library headers for the bootstrap version of libbpf, to use them for the bootstrap bpftool and for generating the skeletons. Fixes: `f012ade10b` ("bpftool: Install libbpf headers instead of including the dir") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/bpf/20211105015813.6171-1-quentin@isovalent.com	2021-11-05 16:19:48 +01:00
Volodymyr Mytnyk	a46a5036e7	net: marvell: prestera: fix patchwork build problems fix the remaining build issues reported by patchwork in firmware v4.0 support commit which has been already merged. Fix patchwork issues: - source inline - checkpatch Fixes: `bb5dbf2cc6` ("net: marvell: prestera: add firmware v4.0 support") Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 14:26:22 +00:00
Zhang Mingyu	dce981c421	amt: remove duplicate include in amt.c 'net/protocol.h' included in 'drivers/net/amt.c' is duplicated. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 14:24:37 +00:00
Arnd Bergmann	a6785bd7d8	octeontx2-nicvf: fix ioctl callback The mii ioctls are now handled by the ndo_eth_ioctl() callback, not the old ndo_do_ioctl(), but octeontx2-nicvf introduced the function for the old way. Move it over to ndo_eth_ioctl() to actually allow calling it from user space. Fixes: `43510ef4dd` ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF") Fixes: `a76053707d` ("dev_ioctl: split out ndo_eth_ioctl") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 14:23:21 +00:00
Arnd Bergmann	9dcc00715a	ax88796c: fix ioctl callback The timestamp ioctls are now handled by the ndo_eth_ioctl() callback, not the old ndo_do_ioctl(), but oax88796 introduced the function for the old way. Move it over to ndo_eth_ioctl() to actually allow calling it from user space. Fixes: `a97c69ba4f` ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver") Fixes: `a76053707d` ("dev_ioctl: split out ndo_eth_ioctl") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Lukasz Stelmach <l.stelmach@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 14:23:21 +00:00
Yang Li	3f81c57991	amt: Fix NULL but dereferenced coccicheck error Eliminate the following coccicheck warning: ./drivers/net/amt.c:2795:6-9: ERROR: amt is NULL but dereferenced. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:45:03 +00:00
Jakub Kicinski	6789a4c051	net: ax88796c: hide ax88796c_dt_ids if !CONFIG_OF Build bot says: >> drivers/net/ethernet/asix/ax88796c_main.c:1116:34: warning: unused variable 'ax88796c_dt_ids' [-Wunused-const-variable] static const struct of_device_id ax88796c_dt_ids[] = { ^ The only reference to this array is wrapped in of_match_ptr(). Reported-by: kernel test robot <lkp@intel.com> Fixes: `a97c69ba4f` ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:44:04 +00:00
Menglong Dong	69dfccbc11	net: udp: correct the document for udp_mem udp_mem is a vector of 3 INTEGERs, which is used to limit the number of pages allowed for queueing by all UDP sockets. However, sk_has_memory_pressure() in __sk_mem_raise_allocated() always return false for udp, as memory pressure is not supported by udp, which means that __sk_mem_raise_allocated() will fail once pages allocated for udp socket exceeds udp_mem[0]. Therefor, udp_mem[0] is the only one that limit the number of pages. However, the document of udp_mem just express that udp_mem[2] is the limitation. So, just fix it. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:42:46 +00:00
Xu Wang	827beb7781	net: ethernet: litex: Remove unnecessary print function dev_err() The print function dev_err() is redundant because platform_get_irq() already prints an error. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Reviewed-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:16:12 +00:00
Arnd Bergmann	9cbc336796	octeontx2-pf: select CONFIG_NET_DEVLINK The octeontx2 pf nic driver failsz to link when the devlink support is not reachable: aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_dl_mcam_count_get': otx2_devlink.c:(.text+0x10): undefined reference to `devlink_priv' aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_dl_mcam_count_validate': otx2_devlink.c:(.text+0x50): undefined reference to `devlink_priv' aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_dl_mcam_count_set': otx2_devlink.c:(.text+0xd0): undefined reference to `devlink_priv' aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_devlink_info_get': otx2_devlink.c:(.text+0x150): undefined reference to `devlink_priv' This is already selected by the admin function driver, but not the actual nic, which might be built-in when the af driver is not. Fixes: `2da4894327` ("octeontx2-pf: devlink params support to set mcam entry count") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:14:38 +00:00
Yang Guang	f6a510102c	sfc: use swap() to make code cleaner Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid opencoding it. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:14:38 +00:00
Yang Guang	d7be1d1cfb	octeontx2-af: use swap() to make code cleaner Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid opencoding it. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:14:38 +00:00
luo penghao	0c500ef5d3	tg3: Remove redundant assignments The assignment of err will be overwritten next, so this statement should be deleted. The clang_analyzer complains as follows: drivers/net/ethernet/broadcom/tg3.c:5506:2: warning: Value stored to 'expected_sg_dig_ctrl' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:14:38 +00:00
Tony Lu	af1877b6ca	net/smc: Print function name in smcr_link_down tracepoint This makes the output of smcr_link_down tracepoint easier to use and understand without additional translating function's pointer address. It prints the function name with offset: <idle>-0 [000] ..s. 69.087164: smcr_link_down: lnk=00000000dab41cdc lgr=000000007d5d8e24 state=0 rc=1 dev=mlx5_0 location=smc_wr_tx_tasklet_fn+0x5ef/0x6f0 [smc] Link: https://lore.kernel.org/netdev/11f17a34-fd35-f2ec-3f20-dd0c34e55fde@linux.ibm.com/ Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:14:38 +00:00
Huang Guobin	b93c6a911a	bonding: Fix a use-after-free problem when bond_sysfs_slave_add() failed When I do fuzz test for bonding device interface, I got the following use-after-free Calltrace: ================================================================== BUG: KASAN: use-after-free in bond_enslave+0x1521/0x24f0 Read of size 8 at addr ffff88825bc11c00 by task ifenslave/7365 CPU: 5 PID: 7365 Comm: ifenslave Tainted: G E 5.15.0-rc1+ #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 Call Trace: dump_stack_lvl+0x6c/0x8b print_address_description.constprop.0+0x48/0x70 kasan_report.cold+0x82/0xdb __asan_load8+0x69/0x90 bond_enslave+0x1521/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f19159cf577 Code: b3 66 90 48 8b 05 11 89 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 78 RSP: 002b:00007ffeb3083c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffeb3084bca RCX: 00007f19159cf577 RDX: 00007ffeb3083ce0 RSI: 0000000000008990 RDI: 0000000000000003 RBP: 00007ffeb3084bc4 R08: 0000000000000040 R09: 0000000000000000 R10: 00007ffeb3084bc0 R11: 0000000000000246 R12: 00007ffeb3083ce0 R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffeb3083cb0 Allocated by task 7365: kasan_save_stack+0x23/0x50 __kasan_kmalloc+0x83/0xa0 kmem_cache_alloc_trace+0x22e/0x470 bond_enslave+0x2e1/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 7365: kasan_save_stack+0x23/0x50 kasan_set_track+0x20/0x30 kasan_set_free_info+0x24/0x40 __kasan_slab_free+0xf2/0x130 kfree+0xd1/0x5c0 slave_kobj_release+0x61/0x90 kobject_put+0x102/0x180 bond_sysfs_slave_add+0x7a/0xa0 bond_enslave+0x11b6/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Last potentially related work creation: kasan_save_stack+0x23/0x50 kasan_record_aux_stack+0xb7/0xd0 insert_work+0x43/0x190 __queue_work+0x2e3/0x970 delayed_work_timer_fn+0x3e/0x50 call_timer_fn+0x148/0x470 run_timer_softirq+0x8a8/0xc50 __do_softirq+0x107/0x55f Second to last potentially related work creation: kasan_save_stack+0x23/0x50 kasan_record_aux_stack+0xb7/0xd0 insert_work+0x43/0x190 __queue_work+0x2e3/0x970 __queue_delayed_work+0x130/0x180 queue_delayed_work_on+0xa7/0xb0 bond_enslave+0xe25/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88825bc11c00 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 0 bytes inside of 1024-byte region [ffff88825bc11c00, ffff88825bc12000) The buggy address belongs to the page: page:ffffea00096f0400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x25bc10 head:ffffea00096f0400 order:3 compound_mapcount:0 compound_pincount:0 flags: 0x57ff00000010200(slab\|head\|node=1\|zone=2\|lastcpupid=0x7ff) raw: 057ff00000010200 ffffea0009a71c08 ffff888240001968 ffff88810004dbc0 raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88825bc11b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88825bc11b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88825bc11c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88825bc11c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88825bc11d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Put new_slave in bond_sysfs_slave_add() will cause use-after-free problems when new_slave is accessed in the subsequent error handling process. Since new_slave will be put in the subsequent error handling process, remove the unnecessary put to fix it. In addition, when sysfs_create_file() fails, if some files have been crea- ted successfully, we need to call sysfs_remove_file() to remove them. Since there are sysfs_create_files() & sysfs_remove_files() can be used, use these two functions instead. Fixes: `7afcaec496` (bonding: use kobject_put instead of _del after kobject_add) Signed-off-by: Huang Guobin <huangguobin4@huawei.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-05 10:14:38 +00:00
Jakub Kicinski	436014e860	Merge branch 'mctp-sockaddr-padding-check-initialisation-fixup' Eugene Syromiatnikov says: ==================== MCTP sockaddr padding check/initialisation fixup This pair of patches introduces checks for padding fields of struct sockaddr_mctp/sockaddr_mctp_ext to ease their re-use for possible extensions in the future; as well as zeroing of these fields in the respective sockaddr filling routines. While the first commit is definitely an ABI breakage, it is proposed in hopes that the change is made soon enough (the interface appeared only in Linux 5.15) to avoid affecting any existing user space. ==================== Link: https://lore.kernel.org/r/cover.1635965993.git.esyr@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-04 19:17:49 -07:00
Eugene Syromiatnikov	e9ea574ec1	mctp: handle the struct sockaddr_mctp_ext padding field struct sockaddr_mctp_ext.__smctp_paddin0 has to be checked for being set to zero, otherwise it cannot be utilised in the future. Fixes: `99ce45d5e7` ("mctp: Implement extended addressing") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-04 19:17:48 -07:00
Eugene Syromiatnikov	1e4b50f06d	mctp: handle the struct sockaddr_mctp padding fields In order to have the padding fields actually usable in the future, there have to be checks that user space doesn't supply non-zero garbage there. It is also worth setting these padding fields to zero, unless it is known that they have been already zeroed. Cc: stable@vger.kernel.org # v5.15 Fixes: `5a20dd46b8` ("mctp: Be explicit about struct sockaddr_mctp padding") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-04 19:17:34 -07:00
Jakub Kicinski	a5bda90884	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-11-03 Brett fixes issues with promiscuous mode settings not being properly enabled and removes setting of VF antispoof along with promiscuous mode. He also ensures that VF Tx queues are always disabled and resolves a race between virtchnl handling and VF related ndo ops. Sylwester fixes an issue where a VF MAC could not be set to its primary MAC if the address is already present. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix race conditions between virtchnl handling and VF ndo ops ice: Fix not stopping Tx queues for VFs ice: Fix replacing VF hardware MAC to existing MAC filter ice: Remove toggling of antispoof for VF trusted promiscuous mode ice: Fix VF true promiscuous mode ==================== Link: https://lore.kernel.org/r/20211103161935.2997369-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-04 16:51:50 -07:00
Heiner Kallweit	a4db9055fd	net: phy: fix duplex out of sync problem while changing settings As reported by Zhang there's a small issue if in forced mode the duplex mode changes with the link staying up [0]. In this case the MAC isn't notified about the change. The proposed patch relies on the phylib state machine and ignores the fact that there are drivers that uses phylib but not the phylib state machine. So let's don't change the behavior for such drivers and fix it w/o re-adding state PHY_FORCING for the case that phylib state machine is used. [0] https://lore.kernel.org/netdev/a5c26ffd-4ee4-a5e6-4103-873208ce0dc5@huawei.com/T/ Fixes: `2bd229df5e` ("net: phy: remove state PHY_FORCING") Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com> Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/7b8b9456-a93f-abbc-1dc5-a2c2542f932c@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-04 16:46:29 -07:00
Guo Zhengkui	96d0c9be43	devlink: fix flexible_array.cocci warning Fix following coccicheck warning: ./net/core/devlink.c:69:6-10: WARNING use flexible-array member instead Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Link: https://lore.kernel.org/r/20211103121607.27490-1-guozhengkui@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-04 16:43:55 -07:00
Kleber Sacilotto de Souza	a38bc45a08	selftests/net: Fix reuseport_bpf_numa by skipping unavailable nodes In some platforms the numa node numbers are not necessarily consecutive, meaning that not all nodes from 0 to the value returned by numa_max_node() are available on the system. Using node numbers which are not available results on errors from libnuma such as: ---- IPv4 UDP ---- send node 0, receive socket 0 libnuma: Warning: Cannot read node cpumask from sysfs ./reuseport_bpf_numa: failed to pin to node: No such file or directory Fix it by checking if the node number bit is set on numa_nodes_ptr, which is defined on libnuma as "Set with all nodes the kernel has exposed to userspace". Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211101145317.286118-1-kleber.souza@canonical.com	2021-11-04 16:21:13 +01:00
Eric Dumazet	d00c8ee317	net: fix possible NULL deref in sock_reserve_memory Sanity check in sock_reserve_memory() was not enough to prevent malicious user to trigger a NULL deref. In this case, the isse is that sk_prot->memory_allocated is NULL. Use standard sk_has_account() helper to deal with this. BUG: KASAN: null-ptr-deref in instrument_atomic_read_write include/linux/instrumented.h:101 [inline] BUG: KASAN: null-ptr-deref in atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline] BUG: KASAN: null-ptr-deref in sk_memory_allocated_add include/net/sock.h:1371 [inline] BUG: KASAN: null-ptr-deref in sock_reserve_memory net/core/sock.c:994 [inline] BUG: KASAN: null-ptr-deref in sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443 Write of size 8 at addr 0000000000000000 by task syz-executor.0/11270 CPU: 1 PID: 11270 Comm: syz-executor.0 Not tainted 5.15.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 __kasan_report mm/kasan/report.c:446 [inline] kasan_report.cold+0x66/0xdf mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:101 [inline] atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline] sk_memory_allocated_add include/net/sock.h:1371 [inline] sock_reserve_memory net/core/sock.c:994 [inline] sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443 __sys_setsockopt+0x4f8/0x610 net/socket.c:2172 __do_sys_setsockopt net/socket.c:2187 [inline] __se_sys_setsockopt net/socket.c:2184 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2184 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f56076d5ae9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f5604c4b188 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007f56077e8f60 RCX: 00007f56076d5ae9 RDX: 0000000000000049 RSI: 0000000000000001 RDI: 0000000000000003 RBP: 00007f560772ff25 R08: 000000000000fec7 R09: 0000000000000000 R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffb61a100f R14: 00007f5604c4b300 R15: 0000000000022000 </TASK> Fixes: `2bb2f5fb21` ("net: add new socket option SO_RESERVE_MEM") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-04 11:28:04 +00:00
Leonard Crestez	3b65abb8d8	tcp: Use BIT() for OPTION_* constants Extending these flags using the existing (1 << x) pattern triggers complaints from checkpatch. Instead of ignoring checkpatch modify the existing values to use BIT(x) style in a separate commit. Signed-off-by: Leonard Crestez <cdleonard@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-04 11:26:15 +00:00
Andrea Righi	a985442fde	selftests: net: properly support IPv6 in GSO GRE test Explicitly pass -6 to netcat when the test is using IPv6 to prevent failures. Also make sure to pass "-N" to netcat to close the socket after EOF on the client side, otherwise we would always hit the timeout and the test would fail. Without this fix applied: TEST: GREv6/v4 - copy file w/ TSO [FAIL] TEST: GREv6/v4 - copy file w/ GSO [FAIL] TEST: GREv6/v6 - copy file w/ TSO [FAIL] TEST: GREv6/v6 - copy file w/ GSO [FAIL] With this fix applied: TEST: GREv6/v4 - copy file w/ TSO [ OK ] TEST: GREv6/v4 - copy file w/ GSO [ OK ] TEST: GREv6/v6 - copy file w/ TSO [ OK ] TEST: GREv6/v6 - copy file w/ GSO [ OK ] Fixes: `025efa0a82` ("selftests: add simple GSO GRE test") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-04 11:23:57 +00:00
Brett Creeley	e6ba5273d4	ice: Fix race conditions between virtchnl handling and VF ndo ops The VF can be configured via the PF's ndo ops at the same time the PF is receiving/handling virtchnl messages. This has many issues, with one of them being the ndo op could be actively resetting a VF (i.e. resetting it to the default state and deleting/re-adding the VF's VSI) while a virtchnl message is being handled. The following error was seen because a VF ndo op was used to change a VF's trust setting while the VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing: [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5 [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6 Fix this by making sure the virtchnl handling and VF ndo ops that trigger VF resets cannot run concurrently. This is done by adding a struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex will be locked around the critical operations and VFR. Since the ndo ops will trigger a VFR, the virtchnl thread will use mutex_trylock(). This is done because if any other thread (i.e. VF ndo op) has the mutex, then that means the current VF message being handled is no longer valid, so just ignore it. This issue can be seen using the following commands: for i in {0..50}; do rmmod ice modprobe ice sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on sleep 2 echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on done Fixes: `7c710869d6` ("ice: Add handlers for VF netdevice operations") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-11-03 08:16:32 -07:00
Brett Creeley	b385cca473	ice: Fix not stopping Tx queues for VFs When a VF is removed and/or reset its Tx queues need to be stopped from the PF. This is done by calling the ice_dis_vf_qs() function, which calls ice_vsi_stop_lan_tx_rings(). Currently ice_dis_vf_qs() is protected by the VF state bit ICE_VF_STATE_QS_ENA. Unfortunately, this is causing the Tx queues to not be disabled in some cases and when the VF tries to re-enable/reconfigure its Tx queues over virtchnl the op is failing. This is because a VF can be reset and/or removed before the ICE_VF_STATE_QS_ENA bit is set, but the Tx queues were already configured via ice_vsi_cfg_single_txq() in the VIRTCHNL_OP_CONFIG_VSI_QUEUES op. However, the ICE_VF_STATE_QS_ENA bit is set on a successful VIRTCHNL_OP_ENABLE_QUEUES, which will always happen after the VIRTCHNL_OP_CONFIG_VSI_QUEUES op. This was causing the following error message when loading the ice driver, creating VFs, and modifying VF trust in an endless loop: [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5 [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6 Fix this by always calling ice_dis_vf_qs() and silencing the error message in ice_vsi_stop_tx_ring() since the calling code ignores the return anyway. Also, all other places that call ice_vsi_stop_tx_ring() catch the error, so this doesn't affect those flows since there was no change to the values the function returns. Other solutions were considered (i.e. tracking which VF queues had been "started/configured" in VIRTCHNL_OP_CONFIG_VSI_QUEUES, but it seemed more complicated than it was worth. This solution also brings in the chance for other unexpected conditions due to invalid state bit checks. So, the proposed solution seemed like the best option since there is no harm in failing to stop Tx queues that were never started. This issue can be seen using the following commands: for i in {0..50}; do rmmod ice modprobe ice sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on sleep 2 echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on done Fixes: `77ca27c417` ("ice: add support for virtchnl_queue_select.[tx\|rx]_queues bitmap") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-11-03 08:16:08 -07:00
Sylwester Dziedziuch	ce572a5b88	ice: Fix replacing VF hardware MAC to existing MAC filter VF was not able to change its hardware MAC address in case the new address was already present in the MAC filter list. Change the handling of VF add mac request to not return if requested MAC address is already present on the list and check if its hardware MAC needs to be updated in this case. Fixes: `ed4c068d46` ("ice: Enable ip link show on the PF to display VF unicast MAC(s)") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-11-03 08:16:08 -07:00
Brett Creeley	0299faeaf8	ice: Remove toggling of antispoof for VF trusted promiscuous mode Currently when a trusted VF enables promiscuous mode spoofchk will be disabled. This is wrong and should only be modified from the ndo_set_vf_spoofchk callback. Fix this by removing the call to toggle spoofchk for trusted VFs. Fixes: `01b5e89aab` ("ice: Add VF promiscuous support") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-11-03 08:16:06 -07:00
Brett Creeley	1a8c7778bc	ice: Fix VF true promiscuous mode When a VF requests promiscuous mode and it's trusted and true promiscuous mode is enabled the PF driver attempts to enable unicast and/or multicast promiscuous mode filters based on the request. This is fine, but there are a couple issues with the current code. [1] The define to configure the unicast promiscuous mode mask also includes bits to configure the multicast promiscuous mode mask, which causes multicast to be set/cleared unintentionally. [2] All 4 cases for enable/disable unicast/multicast mode are not handled in the promiscuous mode message handler, which causes unexpected results regarding the current promiscuous mode settings. To fix [1] make sure any promiscuous mask defines include the correct bits for each of the promiscuous modes. To fix [2] make sure that all 4 cases are handled since there are 2 bits (FLAG_VF_UNICAST_PROMISC and FLAG_VF_MULTICAST_PROMISC) that can be either set or cleared. Also, since either unicast and/or multicast promiscuous configuration can fail, introduce two separate error values to handle each of these cases. Fixes: `01b5e89aab` ("ice: Add VF promiscuous support") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-11-03 08:15:27 -07:00
Martin KaFai Lau	c08455dec5	selftests/bpf: Verifier test on refill from a smaller spill This patch adds a verifier test to ensure the verifier can read 8 bytes from the stack after two 32bit write at fp-4 and fp-8. The test is similar to the reported case from bcc [0]. [0] https://github.com/iovisor/bcc/pull/3683 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211102064541.316414-1-kafai@fb.com	2021-11-03 15:55:43 +01:00
Martin KaFai Lau	f30d4968e9	bpf: Do not reject when the stack read size is different from the tracked scalar size Below is a simplified case from a report in bcc [0]: r4 = 20 (u32 )(r10 -4) = r4 (u32 )(r10 -8) = r4 /* r4 state is tracked / r4 = (u64 )(r10 -8) / Read more than the tracked 32bit scalar. * verifier rejects as 'corrupted spill memory'. */ After commit `354e8f1970` ("bpf: Support <8-byte scalar spill and refill"), the 8-byte aligned 32bit spill is also tracked by the verifier and the register state is stored. However, if 8 bytes are read from the stack instead of the tracked 4 byte scalar, then verifier currently rejects the program as "corrupted spill memory". This patch fixes this case by allowing it to read but marks the register as unknown. Also note that, if the prog is trying to corrupt/leak an earlier spilled pointer by spilling another <8 bytes register on top, this has already been rejected in the check_stack_write_fixed_off(). [0] https://github.com/iovisor/bcc/pull/3683 Fixes: `354e8f1970` ("bpf: Support <8-byte scalar spill and refill") Reported-by: Hengqi Chen <hengqi.chen@gmail.com> Reported-by: Yonghong Song <yhs@gmail.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Hengqi Chen <hengqi.chen@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211102064535.316018-1-kafai@fb.com	2021-11-03 15:46:46 +01:00
Andrii Nakryiko	401a33da3a	selftests/bpf: Make netcnt selftests serial to avoid spurious failures When running `./test_progs -j` test_netcnt fails with a very high probability, undercounting number of packets received (9999 vs expected 10000). It seems to be conflicting with other cgroup/skb selftests. So make it serial for now to make parallel mode more robust. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211103054113.2130582-1-andrii@kernel.org	2021-11-03 15:43:09 +01:00
Lorenz Bauer	7e5ad817ec	selftests/bpf: Test RENAME_EXCHANGE and RENAME_NOREPLACE on bpffs Add tests to exercise the behaviour of RENAME_EXCHANGE and RENAME_NOREPLACE on bpffs. The former checks that after an exchange the inode of two directories has changed. The latter checks that the source still exists after a failed rename. Generally, having support for renameat2(RENAME_EXCHANGE) in bpffs fixes atomic upgrades of our sk_lookup control plane. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211028094724.59043-5-lmb@cloudflare.com	2021-11-03 15:43:08 +01:00
Lorenz Bauer	9fc23c22e5	selftests/bpf: Convert test_bpffs to ASSERT macros Remove usage of deprecated CHECK macros. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211028094724.59043-4-lmb@cloudflare.com	2021-11-03 15:43:08 +01:00
Lorenz Bauer	3871cb8cf7	libfs: Support RENAME_EXCHANGE in simple_rename() Allow atomic exchange via RENAME_EXCHANGE when using simple_rename. This affects binderfs, ramfs, hubetlbfs and bpffs. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/bpf/20211028094724.59043-3-lmb@cloudflare.com	2021-11-03 15:43:08 +01:00
Lorenz Bauer	6429e46304	libfs: Move shmem_exchange to simple_rename_exchange Move shmem_exchange and make it available to other callers. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/bpf/20211028094724.59043-2-lmb@cloudflare.com	2021-11-03 15:43:00 +01:00
Vladimir Oltean	92f62485b3	net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge Normally it is expected that the dsa_device_ops :: rcv() method finishes parsing the DSA tag and consumes it, then never looks at it again. But commit `c0bcf53766` ("net: dsa: ocelot: add hardware timestamping support for Felix") added support for RX timestamping in a very unconventional way. On this switch, a partial timestamp is available in the DSA header, but the driver got away with not parsing that timestamp right away, but instead delayed that parsing for a little longer: dsa_switch_rcv(): nskb = cpu_dp->rcv(skb, dev); <------------- not here -> ocelot_rcv() ... skb = nskb; skb_push(skb, ETH_HLEN); skb->pkt_type = PACKET_HOST; skb->protocol = eth_type_trans(skb, skb->dev); ... if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here -> felix_rxtstamp() return 0; When in felix_rxtstamp(), this driver accounted for the fact that eth_type_trans() happened in the meanwhile, so it got a hold of the extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes from the current skb->data. This worked for quite some time but was quite fragile from the very beginning. Not to mention that having DSA tag parsing split in two different files, under different folders (net/dsa/tag_ocelot.c vs drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to come that they might break this. Finally, the blamed commit does the following: at the end of ocelot_rcv(), it checks whether the skb payload contains a VLAN header. If it does, and this port is under a VLAN-aware bridge, that VLAN ID might not be correct in the sense that the packet might have suffered VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID from the skb payload using __skb_vlan_pop(), and take the classified VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the classified VLAN, and the skb payload is VLAN-untagged. The big problem is that __skb_vlan_pop() does: memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); __skb_pull(skb, VLAN_HLEN); aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes from the skb headroom (effectively also moving skb->data, by definition). So for felix_rxtstamp()'s fragile logic, all bets are off now. Instead of having the "extraction" pointer point to the DSA header, it actually points to 4 bytes _inside_ the extraction header. Corollary, the last 4 bytes of the "extraction" header are in fact 4 stale bytes of the destination MAC address from the Ethernet header, from prior to the __skb_vlan_pop() movement. So of course, RX timestamps are completely bogus when the system is configured in this way. The fix is actually very simple: just don't structure the code like that. For better or worse, the DSA PTP timestamping API does not offer a straightforward way for drivers to present their RX timestamps, but other drivers (sja1105) have established a simple mechanism to carry their RX timestamp from dsa_device_ops :: rcv() all the way to dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to simply save the partial timestamp to the skb->cb, and complete it later. Question: why don't we simply populate the skb's struct skb_shared_hwtstamps from ocelot_rcv(), and bother with this complication of propagating the timestamp to felix_rxtstamp()? Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether PTP packets need sleepable context to retrieve the full RX timestamp. Currently felix_rxtstamp() answers "no, thanks" to that question, and calls ocelot_ptp_gettime64() from softirq atomic context. This is understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so hardware access does not require sleeping. But the felix driver is preparing for the introduction of other switches where hardware access is over a slow bus like SPI or MDIO: https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/ So I would like to keep this code structure, so the rework needed when that driver will need PTP support will be minimal (answer "yes, I need deferred context for this skb's RX timestamp", then the partial timestamp will still be found in the skb->cb. Fixes: `ea440cd2d9` ("net: dsa: tag_ocelot: use VLAN information from tagging header when available") Reported-by: Po Liu <po.liu@nxp.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-03 14:22:00 +00:00

1 2 3 4 5 ...

1049620 Commits