41802 Commits

Author SHA1 Message Date
David Vernet
fb2211a57c bpf: Remove now-unnecessary NULL checks for KF_RELEASE kfuncs
Now that we're not invoking kfunc destructors when the kptr in a map was
NULL, we no longer require NULL checks in many of our KF_RELEASE kfuncs.
This patch removes those NULL checks.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230325213144.486885-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 16:56:22 -07:00
David Vernet
1431d0b584 bpf: Only invoke kptr dtor following non-NULL xchg
When a map value is being freed, we loop over all of the fields of the
corresponding BPF object and issue the appropriate cleanup calls
corresponding to the field's type. If the field is a referenced kptr, we
atomically xchg the value out of the map, and invoke the kptr's
destructor on whatever was there before (or bpf_obj_drop() it if it was
a local kptr).

Currently, we always invoke the destructor (either bpf_obj_drop() or the
kptr's registered destructor) on any KPTR_REF-type field in a map, even
if there wasn't a value in the map. This means that any function serving
as the kptr's KF_RELEASE destructor must always treat the argument as
possibly NULL, as the following can and regularly does happen:

void *xchgd_field;

/* No value was in the map, so xchgd_field is NULL */
xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
field->kptr.dtor(xchgd_field);

These are odd semantics to impose on KF_RELEASE kfuncs -- BPF programs
are prohibited by the verifier from passing NULL pointers to KF_RELEASE
kfuncs, so it doesn't make sense to require this of BPF programs, but
not the main kernel destructor path. It's also unnecessary to invoke any
cleanup logic for local kptrs. If there is no object there, there's
nothing to drop.

So as to allow KF_RELEASE kfuncs to fully assume that an argument is
non-NULL, this patch updates a KPTR_REF's destructor to only be invoked
when a non-NULL value is xchg'd out of the kptr map field.
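The new free path can be illustrated with a small userspace sketch. C11 atomic_exchange() stands in for the kernel's xchg(), and the destructor runs only when a non-NULL object was swapped out, so the release function never has to handle NULL. struct obj, obj_release() and free_kptr_field() are hypothetical names for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object held in a referenced-kptr map field. */
struct obj {
    int refcnt;
};

typedef void (*dtor_fn)(void *);

static int dtor_calls; /* counts destructor invocations */

/* Like a KF_RELEASE kfunc after this change: assumes p is never NULL. */
static void obj_release(void *p)
{
    struct obj *o = p;

    o->refcnt--; /* would dereference NULL if called on an empty field */
    dtor_calls++;
}

/* Free one kptr field: atomically exchange the value out, and only call
 * the destructor if a non-NULL object was actually stored there. */
static void free_kptr_field(_Atomic(void *) *field, dtor_fn dtor)
{
    void *old = atomic_exchange(field, NULL);

    if (old)
        dtor(old);
}
```

Freeing an empty field becomes a no-op, which is what lets the release functions drop their NULL checks.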

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230325213144.486885-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 16:56:22 -07:00
Linus Torvalds
f768b35a23 Fixes for 6.3-rc3:
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Merge tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs percpu counter fixes from Darrick Wong:
 "We discovered a filesystem summary counter corruption problem that was
  traced to cpu hot-remove racing with the call to percpu_counter_sum
  that sets the free block count in the superblock when writing it to
  disk. The root cause is that percpu_counter_sum doesn't cull from
  dying cpus and hence misses those counter values if the cpu shutdown
  hooks have not yet run to merge the values.

  I'm hoping this is a fairly painless fix to the problem, since the
  dying cpu mask should generally be empty. It's been in for-next for a
  week without any complaints from the bots.

   - Fix a race in the percpu counters summation code where the
     summation failed to add in the values for any CPUs that were dying
     but not yet dead. This fixes some minor discrepancies and incorrect
     assertions when running generic/650"
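The dying-CPU race can be illustrated with a small userspace sketch. It models a per-CPU counter as a base value plus per-CPU deltas and sums over the union of an "online" and a "dying" bitmap, in the spirit of the for_each_cpu_or() iterator named in the shortlog. The struct, the 32-bit masks and NR_CPUS = 8 are assumptions for illustration. The real code works on struct cpumask and the kernel's percpu_counter.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical per-CPU counter: a folded-in total plus per-CPU deltas
 * that a CPU's shutdown hook would normally merge into 'count'. */
struct pcpu_counter {
    long long count;
    int deltas[NR_CPUS];
};

/* Visit every CPU whose bit is set in (a | b); plain bitmaps here. */
#define for_each_cpu_or(cpu, a, b)                 \
    for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++)      \
        if ((((a) | (b)) >> (cpu)) & 1u)

/* Sum the counter over online CPUs and CPUs that are dying but whose
 * deltas have not been folded in yet. */
static long long counter_sum(const struct pcpu_counter *c,
                             uint32_t online, uint32_t dying)
{
    long long sum = c->count;
    int cpu;

    for_each_cpu_or(cpu, online, dying)
        sum += c->deltas[cpu];
    return sum;
}
```

With CPU 2 dying and its delta not yet merged, summing over the online mask alone misses that delta. Including the dying mask picks it up.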

* tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  pcpcntr: remove percpu_counter_sum_all()
  fork: remove use of percpu_counter_sum_all
  pcpcntrs: fix dying cpu summation race
  cpumask: introduce for_each_cpu_or
2023-03-25 12:57:34 -07:00
Kees Cook
b69edab47f kheaders: Use array declaration instead of char
Under CONFIG_FORTIFY_SOURCE, memcpy() will check the size of destination
and source buffers. Defining kernel_headers_data as "char" would trip
this check. Since these addresses are treated as byte arrays, define
them as arrays (as done everywhere else).

This was seen with:

  $ cat /sys/kernel/kheaders.tar.xz >> /dev/null

  detected buffer overflow in memcpy
  kernel BUG at lib/string_helpers.c:1027!
  ...
  RIP: 0010:fortify_panic+0xf/0x20
  [...]
  Call Trace:
   <TASK>
   ikheaders_read+0x45/0x50 [kheaders]
   kernfs_fop_read_iter+0x1a4/0x2f0
  ...

Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20230302112130.6e402a98@kernel.org/
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 43d8ce9d65a5 ("Provide in-kernel headers to make extending kernel easier")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230302224946.never.243-kees@kernel.org
2023-03-24 20:10:59 -07:00
Linus Torvalds
65aca32efd 21 hotfixes, 8 of which are cc:stable. 11 are for MM, the remainder are
for other subsystems.

Merge tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "21 hotfixes, 8 of which are cc:stable. 11 are for MM, the remainder
  are for other subsystems"

* tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  mm: mmap: remove newline at the end of the trace
  mailmap: add entries for Richard Leitner
  kcsan: avoid passing -g for test
  kfence: avoid passing -g for test
  mm: kfence: fix using kfence_metadata without initialization in show_object()
  lib: dhry: fix unstable smp_processor_id(_) usage
  mailmap: add entry for Enric Balletbo i Serra
  mailmap: map Sai Prakash Ranjan's old address to his current one
  mailmap: map Rajendra Nayak's old address to his current one
  Revert "kasan: drop skip_kasan_poison variable in free_pages_prepare"
  mailmap: add entry for Tobias Klauser
  kasan, powerpc: don't rename memintrinsics if compiler adds prefixes
  mm/ksm: fix race with VMA iteration and mm_struct teardown
  kselftest: vm: fix unused variable warning
  mm: fix error handling for map_deny_write_exec
  mm: deduplicate error handling for map_deny_write_exec
  checksyscalls: ignore fstat to silence build warning on LoongArch
  nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
  test_maple_tree: add more testing for mas_empty_area()
  maple_tree: fix mas_skip_node() end slot detection
  ...
2023-03-24 18:06:11 -07:00
Martin KaFai Lau
55fbae0547 bpf: Check IS_ERR for the bpf_map_get() return value
This patch fixes a mistake in checking NULL instead of
checking IS_ERR for the bpf_map_get() return value.

It also fixes the return value in link_update_map() from -EINVAL
to PTR_ERR(*_map).
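A userspace re-creation of the kernel's error-pointer convention shows why a NULL check is the wrong test. Under that convention the top MAX_ERRNO (4095) addresses encode negative errno values, so an error pointer is non-NULL. map_get() and link_update() below are hypothetical stand-ins for bpf_map_get() and link_update_map(), and -EBADF is only an illustrative error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace copy of the error-pointer idea: the last MAX_ERRNO
 * addresses represent negative errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct map {
    int id;
};

/* Hypothetical lookup that reports failure with ERR_PTR(), not NULL. */
static struct map *map_get(struct map *maps, int n, int id)
{
    for (int i = 0; i < n; i++)
        if (maps[i].id == id)
            return &maps[i];
    return ERR_PTR(-EBADF);
}

/* Propagate the encoded error instead of a fixed -EINVAL. */
static long link_update(struct map *maps, int n, int id)
{
    struct map *m = map_get(maps, n, id);

    if (IS_ERR(m)) /* a NULL check would let this pointer through */
        return PTR_ERR(m);
    return 0;
}
```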

Reported-by: syzbot+71ccc0fe37abb458406b@syzkaller.appspotmail.com
Fixes: 68b04864ca42 ("bpf: Create links for BPF struct_ops maps.")
Fixes: aef56f2e918b ("bpf: Update the struct_ops of a bpf_link.")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230324184241.1387437-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-24 12:40:47 -07:00
Jim Cromie
33c951f629 module: already_uses() - reduce pr_debug output volume
already_uses() is unnecessarily chatty.

`modprobe i915` yields 491 messages like:

  [   64.108744] i915 uses drm!

This is a normal situation, and isn't worth all the log entries.

NOTE: I've preserved the "does not use %s" messages, which happen
less often, but do happen.  It's not clear to me what they tell a
reader, or what info might improve the pr_debug's utility.

[ 6847.584999] main:already_uses:569: amdgpu does not use ttm!
[ 6847.585001] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6847.585014] main:already_uses:569: amdgpu does not use drm!
[ 6847.585016] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6847.585024] main:already_uses:569: amdgpu does not use drm_display_helper!
[ 6847.585025] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6847.585084] main:already_uses:569: amdgpu does not use drm_kms_helper!
[ 6847.585086] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6847.585175] main:already_uses:569: amdgpu does not use drm_buddy!
[ 6847.585176] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6847.585202] main:already_uses:569: amdgpu does not use i2c_algo_bit!
[ 6847.585204] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6847.585249] main:already_uses:569: amdgpu does not use gpu_sched!
[ 6847.585250] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6847.585314] main:already_uses:569: amdgpu does not use video!
[ 6847.585315] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6847.585409] main:already_uses:569: amdgpu does not use iommu_v2!
[ 6847.585410] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6847.585816] main:already_uses:569: amdgpu does not use drm_ttm_helper!
[ 6847.585818] main:add_module_usage:584: Allocating new usage for amdgpu.
[ 6848.762268] dyndbg: add-module: amdgpu.2533 sites

no functional changes.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:09 -07:00
Jim Cromie
66a2301edf module: add section-size to move_module pr_debug
move_module() emits the pr_debug "Final section addresses for $modname".
Add the section sizes to the message, for anyone looking at these.

no functional changes.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:09 -07:00
Jim Cromie
b10addf37b module: add symbol-name to pr_debug Absolute symbol
The pr_debug("Absolute symbol" ...) reports the symbol's value (which is
usually 0), but not its name, which is more informative.  So add it.

no functional changes

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:09 -07:00
Jim Cromie
6ed81802d4 module: in layout_sections, move_module: add the modname
layout_sections() and move_module() each issue ~50 messages for each
module loaded.  Add the module name to their two header lines, to help
readers find their module.

no functional changes.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:09 -07:00
Luis Chamberlain
25be451aa4 module: fold usermode helper kmod into modules directory
kernel/kmod.c is already only built if modules are enabled, so
just move it to kernel/module/kmod.c and unify its MAINTAINERS
entry.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
3d40bb903e module: merge remnants of setup_load_info() to elf validation
setup_load_info() actually had ELF validation checks of its
own. Caching useful variables later, as a secondary step, just means
looping again over the ELF sections we just validated. We can simply
keep tabs on the key sections of interest as we validate the module's
ELF sections in one swoop, so do that and merge the two routines
together.
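The single-pass idea can be sketched against the <elf.h> section header type. Each Elf64_Shdr is validated once, and the symbol table index (plus the string table it links to via sh_link) is cached in the same loop instead of a second one. struct load_info_sketch and validate_sections() are hypothetical. The checks shown (bounds, exactly one SHT_SYMTAB with a valid sh_link) are a simplified assumption, not the kernel's full list.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical cache of section indices found during validation. */
struct load_info_sketch {
    unsigned int sym_idx;
    unsigned int str_idx;
};

/* Validate every section header once, remembering the symbol table and
 * its linked string table as we go. Returns 0, or -1 on a bad image. */
static int validate_sections(const Elf64_Shdr *shdrs, unsigned int shnum,
                             size_t image_len, struct load_info_sketch *info)
{
    info->sym_idx = info->str_idx = 0;
    for (unsigned int i = 1; i < shnum; i++) { /* index 0 is SHN_UNDEF */
        const Elf64_Shdr *s = &shdrs[i];

        /* Contents of non-NOBITS sections must lie inside the image. */
        if (s->sh_type != SHT_NOBITS &&
            (s->sh_offset > image_len ||
             s->sh_size > image_len - s->sh_offset))
            return -1;

        if (s->sh_type == SHT_SYMTAB) {
            if (info->sym_idx) /* assumption: only one symtab allowed */
                return -1;
            if (s->sh_link == 0 || s->sh_link >= shnum)
                return -1;
            info->sym_idx = i;
            info->str_idx = s->sh_link; /* cached, no second loop */
        }
    }
    return info->sym_idx ? 0 : -1; /* a module needs a symbol table */
}
```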

Expand a bit on the documentation / intent / goals.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
1bb49db991 module: move more elf validity checks to elf_validity_check()
The symbol and string section validation currently happens in
setup_load_info(), but since it is also a validity check, move it
to elf_validity_check().

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
c7ee8aebf6 module: add stop-grap sanity check on module memcpy()
The integrity of the struct module we load is important, and although
our ELF validator already checks that the module section must match
struct module, add a stop-gap check before we memcpy() the final minted
module. This also makes the goal clear to those inspecting the code.

While at it, clarify the goal behind updating the sh_addr address.
The current comment is pretty misleading.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
46752820f9 module: add sanity check for ELF module section
The ELF ".gnu.linkonce.this_module" section is special: it is what we
use to construct the struct module __this_module, which THIS_MODULE
points to. When userspace loads a module, we always deal first with a
copy of the userspace buffer, and twiddle with the userspace copy's
version of the struct module. Eventually we allocate memory to do a
memcpy() of that struct module, under the assumption that the module
size is right. But we have no validity checks against the size or
the requirements for the section.

Add some validity checks for the special module section early and while
at it, cache the module section index early, so we don't have to do that
later.

While at it, just move over the assignment of info->mod to make the
code clearer. The validity checker also adds an explicit size check to
ensure the module section size matches the kernel's run time size for
sizeof(struct module). This should prevent sloppy loads of modules
built against a struct module whose size differs from the running
kernel's. A developer today can, for example, expand the size of
struct module, rebuild a single directory (say, 'make fs/xfs/') and
then try to insmod the driver there. That module would in effect have
an incorrect size. This new size check puts a stop-gap in front of such
mistakes.
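A size check of this kind might look like the following sketch. It finds ".gnu.linkonce.this_module" through the section-name string table and rejects the section unless it has real contents whose size equals that of the in-kernel structure. struct module_sketch stands in for struct module, find_this_module() is hypothetical, and the name string table is assumed to be NUL-terminated.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the running kernel's struct module. */
struct module_sketch {
    char name[56];
    unsigned long flags;
};

/* Locate ".gnu.linkonce.this_module" and check it can be memcpy()'d
 * into a struct of the running kernel's size. Returns the section
 * index, or -1 if the section is absent or malformed. */
static int find_this_module(const Elf64_Shdr *shdrs, unsigned int shnum,
                            const char *shstrtab, size_t shstrtab_len)
{
    for (unsigned int i = 1; i < shnum; i++) {
        if (shdrs[i].sh_name >= shstrtab_len)
            return -1; /* name offset outside the string table */
        if (strcmp(shstrtab + shdrs[i].sh_name,
                   ".gnu.linkonce.this_module") != 0)
            continue;
        if (shdrs[i].sh_type == SHT_NOBITS)
            return -1; /* needs real contents to copy from */
        if (shdrs[i].sh_size != sizeof(struct module_sketch))
            return -1; /* built against a different struct layout */
        return (int)i;
    }
    return -1;
}
```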

This also makes the entire goal of ".gnu.linkonce.this_module" pretty
clear. Before this patch, verifying the goal / intent required some
Indiana Jones whips, torches and cleaning up big old spider webs.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
419e1a20f7 module: rename check_module_license_and_versions() to check_export_symbol_versions()
This makes it easier to understand what the routine is checking for.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
72f08b3cc6 module: converge taint work together
Converge on a compromise: as long as a module makes it onto our linked
list of modules, we taint. That is, the module was about to become live.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
c3bbf62ebf module: move signature taint to module_augment_kernel_taints()
Just move the signature taint into the helper:

  module_augment_kernel_taints()

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
a12b94511c module: move tainting until after a module hits our linked list
It is silly to have taints spread out all over; we can just compromise
and add them if the module ever hits our linked list. Our sanity checkers
should just prevent crappy drivers / bogus ELF modules / etc., and kconfig
options should be enough to let you *not* load things you don't want.
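The resulting ordering can be sketched as fatal checks first, then list insertion, then taints. The taint bits, struct mod_sketch and the helper names below are hypothetical. The sketch only shows that a module rejected by an early check leaves the kernel untainted, while one that reaches the list has its taints applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical taint bits, for illustration only. */
enum {
    SKETCH_TAINT_PROPRIETARY = 1u << 0,
    SKETCH_TAINT_UNSIGNED    = 1u << 1,
};

struct mod_sketch {
    bool vermagic_ok;     /* input to a fatal check */
    bool gpl_compatible;  /* input to a taint */
    bool sig_ok;          /* input to a taint */
    struct mod_sketch *next;
};

static unsigned int kernel_taint;
static struct mod_sketch *mod_list;

/* Pure checks: may reject the module, never taint. */
static int early_checks(const struct mod_sketch *m)
{
    return m->vermagic_ok ? 0 : -1;
}

/* All taint work in one place, run only after list insertion. */
static void augment_taints(const struct mod_sketch *m)
{
    if (!m->gpl_compatible)
        kernel_taint |= SKETCH_TAINT_PROPRIETARY;
    if (!m->sig_ok)
        kernel_taint |= SKETCH_TAINT_UNSIGNED;
}

static int load_module_sketch(struct mod_sketch *m)
{
    if (early_checks(m))
        return -1;       /* rejected: kernel stays untainted */
    m->next = mod_list;  /* the module "hits our linked list" */
    mod_list = m;
    augment_taints(m);
    return 0;
}
```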

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
437c1f9cc6 module: split taint adding with info checking
check_modinfo() actually does two things:

 a) sanity checks, some of which are fatal, and so we
    prevent the user from completing the module load
 b) taints the kernel

The taints are pretty heavy-handed because we're tainting the kernel
*before* we ever even get to load the module into the modules linked
list. That is, the load can still fail for other reasons later as we
review the module's structure.

But this commit makes no functional changes, it just makes the intent
clearer and splits the code up where needed to make that happen.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
ed52cabecb module: split taint work out of check_modinfo_livepatch()
The work to taint the kernel due to a module should be split
up eventually. To aid with this, split the tainting out of
check_modinfo_livepatch().

This lets us bring together more of the early checks that do return
a value, and makes later changes, where we stuff all the taint work
into one single routine, easier to read.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
ad8d3a36e9 module: rename set_license() to module_license_taint_check()
The set_license() routine would seem to a reader to do some sort of
setting, but it does not. It just adds a taint if the license is
not set or proprietary.

This makes what the code is doing clearer, so much so that we can
remove the comment about it.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:08 -07:00
Luis Chamberlain
02da2cbab4 module: move check_modinfo() early to early_mod_check()
This moves check_modinfo() to early_mod_check(). This
doesn't make any functional changes either, as check_modinfo()
was the first call in layout_and_allocate(), so we're just
moving it back one routine, to the end of early_mod_check().

This lets us keep the checkers separate from the allocator.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:33:06 -07:00
Luis Chamberlain
85e6f61c13 module: move early sanity checks into a helper
Move early sanity checkers for the module into a helper.
This lets us make it clear when we are working with the
local copy of the module prior to allocation.

This produces no functional changes, it just makes subsequent
changes easier to read.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:31:35 -07:00
Luis Chamberlain
1e68417235 module: add a for_each_modinfo_entry()
Add a for_each_modinfo_entry() to make it easier to read and use.
This produces no functional changes but makes this code easier
to read, in the loop style we are used to in the kernel, and trims
more lines of code.
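A .modinfo section holds NUL-separated "tag=value" strings, possibly with zero padding between them. A loop macro over the entries for one tag might be sketched like this. first_entry(), next_entry(), find_tag() and for_each_tagged_entry() are hypothetical helpers, not the kernel's module_next_tag_pair() or for_each_modinfo_entry(), and the buffer is assumed to end in a NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First entry in buf[0..len), skipping leading zero padding. */
static const char *first_entry(const char *buf, size_t len)
{
    const char *p = buf, *end = buf + len;

    while (p < end && *p == '\0')
        p++;
    return p < end ? p : NULL;
}

/* Entry after 'cur': skip the rest of 'cur', then any zero padding. */
static const char *next_entry(const char *buf, size_t len, const char *cur)
{
    const char *end = buf + len;
    const char *p = memchr(cur, '\0', (size_t)(end - cur));

    if (!p)
        return NULL;
    while (p < end && *p == '\0')
        p++;
    return p < end ? p : NULL;
}

/* If 'entry' is "tag=value", return "value"; otherwise NULL. */
static const char *tag_value(const char *entry, const char *tag)
{
    size_t n = strlen(tag);

    return (strncmp(entry, tag, n) == 0 && entry[n] == '=') ? entry + n + 1 : NULL;
}

/* First entry at or after 'p' carrying 'tag', or NULL. */
static const char *find_tag(const char *buf, size_t len, const char *p,
                            const char *tag)
{
    for (; p; p = next_entry(buf, len, p))
        if (tag_value(p, tag))
            return p;
    return NULL;
}

#define for_each_tagged_entry(e, buf, len, tag)                            \
    for ((e) = find_tag((buf), (len), first_entry((buf), (len)), (tag));  \
         (e);                                                             \
         (e) = find_tag((buf), (len), next_entry((buf), (len), (e)), (tag)))
```

Used as for_each_tagged_entry(e, info, sizeof(info), "depends"), the loop visits each "depends=" entry in order, with padding between entries skipped transparently.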

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:05:15 -07:00
Luis Chamberlain
feb5b784a2 module: rename next_string() to module_next_tag_pair()
This makes it clearer what it is doing. While at it,
make it available to code other than main.c.
This will be used in the subsequent patch and makes
the changes easier to read.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:05:15 -07:00
Luis Chamberlain
b66973b82d module: move get_modinfo() helpers all above
Instead of forward declaring routines for get_modinfo() just move
everything up. This makes no functional changes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-24 11:05:15 -07:00
Jakub Kicinski
dc0a7b5200 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
  6e9d51b1a5cb ("net/mlx5e: Initialize link speed to zero")
  1bffcea42926 ("net/mlx5e: Add devlink hairpin queues parameters")
https://lore.kernel.org/all/20230324120623.4ebbc66f@canb.auug.org.au/
https://lore.kernel.org/all/20230321211135.47711-1-saeed@kernel.org/

Adjacent changes:

drivers/net/phy/phy.c
  323fe43cf9ae ("net: phy: Improved PHY error reporting in state machine")
  4203d84032e2 ("net: phy: Ensure state transitions are processed from phy_stop()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-24 10:10:20 -07:00
Thomas Weißschuh
9f33a88c0a kernel/ksysfs.c: use sysfs_emit for sysfs show handlers
sysfs_emit() is the recommended way to format strings for sysfs as per
Documentation/filesystems/sysfs.rst.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20230324-ksysfs-sysfs_emit-v1-1-67c03cddc8a6@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-24 17:09:14 +01:00
Linus Torvalds
608f1b1366 Including fixes from bpf, wifi and bluetooth.
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, wifi and bluetooth.

  Current release - regressions:

   - wifi: mt76: mt7915: add back 160MHz channel width support for
     MT7915

   - libbpf: revert poisoning of strlcpy, it broke uClibc-ng

  Current release - new code bugs:

   - bpf: improve the coverage of the "allow reads from uninit stack"
     feature to fix verification complexity problems

   - eth: am65-cpts: reset PPS genf adj settings on enable

  Previous releases - regressions:

   - wifi: mac80211: serialize ieee80211_handle_wake_tx_queue()

   - wifi: mt76: do not run mt76_unregister_device() on unregistered hw,
     fix null-deref

   - Bluetooth: btqcomsmd: fix command timeout after setting BD address

   - eth: igb: revert rtnl_lock() that causes a deadlock

   - dsa: mscc: ocelot: fix device specific statistics

  Previous releases - always broken:

   - xsk: add missing overflow check in xdp_umem_reg()

   - wifi: mac80211:
      - fix QoS on mesh interfaces
      - fix mesh path discovery based on unicast packets

   - Bluetooth:
      - ISO: fix timestamped HCI ISO data packet parsing
      - remove "Power-on" check from Mesh feature

   - usbnet: more fixes to drivers trusting packet length

   - wifi: iwlwifi: mvm: fix mvmtxq->stopped handling

   - Bluetooth: btintel: iterate only bluetooth device ACPI entries

   - eth: iavf: fix inverted Rx hash condition leading to disabled hash

   - eth: igc: fix the validation logic for taprio's gate list

   - dsa: tag_brcm: legacy: fix daisy-chained switches

  Misc:

   - bpf: adjust insufficient default bpf_jit_limit to account for
     growth of BPF use over the last 5 years

   - xdp: bpf_xdp_metadata() use EOPNOTSUPP as unique errno indicating
     no driver support"

* tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
  Bluetooth: HCI: Fix global-out-of-bounds
  Bluetooth: mgmt: Fix MGMT add advmon with RSSI command
  Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
  Bluetooth: L2CAP: Fix responding with wrong PDU type
  Bluetooth: btqcomsmd: Fix command timeout after setting BD address
  Bluetooth: btinel: Check ACPI handle for NULL before accessing
  net: mdio: thunder: Add missing fwnode_handle_put()
  net: dsa: mt7530: move setting ssc_delta to PHY_INTERFACE_MODE_TRGMII case
  net: dsa: mt7530: move lowering TRGMII driving to mt7530_setup()
  net: dsa: mt7530: move enabling disabling core clock to mt7530_pll_setup()
  net: asix: fix modprobe "sysfs: cannot create duplicate filename"
  gve: Cache link_speed value from device
  tools: ynl: Fix genlmsg header encoding formats
  net: enetc: fix aggregate RMON counters not showing the ranges
  Bluetooth: Remove "Power-on" check from Mesh feature
  Bluetooth: Fix race condition in hci_cmd_sync_clear
  Bluetooth: btintel: Iterate only bluetooth device ACPI entries
  Bluetooth: ISO: fix timestamped HCI ISO data packet parsing
  Bluetooth: btusb: Remove detection of ISO packets over bulk
  Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packet
  ...
2023-03-24 08:48:12 -07:00
Peter Zijlstra
5c3124975e trace,smp: Trace all smp_function_call*() invocations
(Ab)use the trace_ipi_send_cpu*() family to trace all
smp_function_call*() invocations, not only those that result in an
actual IPI.

The queued entries log their callback function while the actual IPIs
are traced on generic_smp_call_function_single_interrupt().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2023-03-24 11:01:30 +01:00
Peter Zijlstra
68e2d17c9e trace: Add trace_ipi_send_cpu()
Because copying cpumasks around when targeting a single CPU is a bit
daft...

Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230322103004.GA571242%40hirez.programming.kicks-ass.net
2023-03-24 11:01:29 +01:00
Valentin Schneider
68f4ff04db sched, smp: Trace smp callback causing an IPI
Context
=======

The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter
which so far has only been fed with NULL.

While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing
struct layout (meaning their callback func can be accessed without caring
about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function
attached to its struct. This means we need to check the type of a CSD
before eventually dereferencing its associated callback.

This isn't as trivial as it sounds: the CSD type is stored in
__call_single_node.u_flags, which gets cleared right before the callback is
executed via csd_unlock(). This implies checking the CSD type before it is
enqueued on the call_single_queue, as the target CPU's queue can be flushed
before we get to sending an IPI.

Furthermore, send_call_function_single_ipi() only has a CPU parameter, and
would need to have an additional argument to trickle down the invoked
function. This is somewhat silly, as the extra argument will always be
pushed down to the function even when nothing is being traced, which is
unnecessary overhead.

Changes
=======

send_call_function_single_ipi() is only used by smp.c, and is defined in
sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of
a CPU's idle task).

Split it into two parts: the scheduler bits remain in sched/core.c, and the
actual IPI emission is moved into smp.c. This lets us define an
__always_inline helper function that can take the related callback as
parameter without creating useless register pressure in the non-traced path
which only gains a (disabled) static branch.

Do the same thing for the multi IPI case.
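The split boils down to an always-inline sender that takes the callback only so that a normally disabled tracing branch can use it. In this userspace sketch a plain bool stands in for the tracepoint's static key, and every *_sketch name is hypothetical. The scheduler-specific idle-polling shortcut that stays in sched/core.c is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*smp_call_func_t)(void *info);

/* Stand-in for the tracepoint's static key (assumption: a plain flag). */
static bool trace_ipi_enabled;
static smp_call_func_t last_traced_cb;
static int last_traced_cpu = -1;
static int ipis_sent;

static void trace_ipi_send_cpu_sketch(int cpu, smp_call_func_t cb)
{
    last_traced_cpu = cpu;
    last_traced_cb = cb;
}

/* Hypothetical arch hook: the part that actually raises the IPI. */
static void arch_send_ipi_sketch(int cpu)
{
    (void)cpu;
    ipis_sent++;
}

/* Inlined into every caller, so 'func' is only consumed inside the
 * tracing branch; the untraced path just sends the IPI. */
static inline __attribute__((always_inline)) void
send_call_function_single_ipi_sketch(int cpu, smp_call_func_t func)
{
    if (trace_ipi_enabled)
        trace_ipi_send_cpu_sketch(cpu, func);
    arch_send_ipi_sketch(cpu);
}

static void dummy_cb(void *info) { (void)info; }
```

Because the helper is inlined, the compiler can keep the callback argument out of the untraced path instead of passing it down through an out-of-line call.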

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230307143558.294354-8-vschneid@redhat.com
2023-03-24 11:01:29 +01:00
Valentin Schneider
253a0fb4c6 smp: reword smp call IPI comment
Accessing the call_single_queue hasn't involved a spinlock since 2014:

  6897fc22ea01 ("kernel: use lockless list for smp_call_function_single")

The llist operations (namely cmpxchg() and xchg()) provide similar ordering
guarantees; update the comment to lessen confusion.
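The lockless queue behaves roughly like this C11-atomics sketch. A compare-and-exchange loop pushes a node, and a single exchange detaches the whole list. The push also reports whether the list was empty beforehand, which is the condition the smp code checks before sending an IPI. lpush() and lpop_all() are hypothetical, and they use C11's default sequentially consistent ordering rather than the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct lnode {
    struct lnode *next;
};

struct lhead {
    _Atomic(struct lnode *) first;
};

/* Push 'n' with a cmpxchg loop; returns true if the list was empty. */
static bool lpush(struct lnode *n, struct lhead *h)
{
    struct lnode *first = atomic_load(&h->first);

    do {
        n->next = first; /* on failure, 'first' holds the new head */
    } while (!atomic_compare_exchange_weak(&h->first, &first, n));
    return first == NULL;
}

/* Detach every queued node with a single xchg (newest first). */
static struct lnode *lpop_all(struct lhead *h)
{
    return atomic_exchange(&h->first, NULL);
}
```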

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230307143558.294354-7-vschneid@redhat.com
2023-03-24 11:01:28 +01:00
Valentin Schneider
4468161a5c irq_work: Trace self-IPIs sent via arch_irq_work_raise()
IPIs sent to remote CPUs via irq_work_queue_on() are now covered by
trace_ipi_send_cpumask(); add another instance of the tracepoint to cover
self-IPIs.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230307143558.294354-5-vschneid@redhat.com
2023-03-24 11:01:27 +01:00
Valentin Schneider
08407b5f61 smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
This simply wraps around the arch function and prepends it with a
tracepoint, similar to send_call_function_single_ipi().

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230307143558.294354-4-vschneid@redhat.com
2023-03-24 11:01:27 +01:00
Valentin Schneider
cc9cb0a717 sched, smp: Trace IPIs sent via send_call_function_single_ipi()
send_call_function_single_ipi() is the thing that sends IPIs at the bottom
of smp_call_function*() via either generic_exec_single() or
smp_call_function_many_cond(). Give it an IPI-related tracepoint.

Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230307143558.294354-3-vschneid@redhat.com
2023-03-24 11:01:27 +01:00
Paul E. McKenney
203e435844 kernel/smp: Make csdlock_debug= resettable
It is currently possible to set the csdlock_debug_enabled static
branch, but not to reset it.  This is an issue when several different
entities supply kernel boot parameters and also for kernels built with
CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT=y.

Therefore, make the csdlock_debug=0 kernel boot parameter turn off
debugging.  Last one wins!
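"Last one wins" falls out naturally when each occurrence of the parameter can both set and clear the flag. A minimal sketch, assuming a plain bool in place of the static branch, an initial value of true to mimic a Kconfig default of 1, and a small set of accepted spellings (the real parser may accept more or different ones). csdlock_debug_sketch() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the static branch; 'true' mimics a Kconfig default of 1. */
static bool csdlock_debug_enabled_sketch = true;

/* Handle one "csdlock_debug=<val>" occurrence. Each occurrence can both
 * enable and disable, so the last one on the command line wins. */
static int csdlock_debug_sketch(const char *val)
{
    if (!strcmp(val, "1") || !strcmp(val, "y") || !strcmp(val, "on"))
        csdlock_debug_enabled_sketch = true;
    else if (!strcmp(val, "0") || !strcmp(val, "n") || !strcmp(val, "off"))
        csdlock_debug_enabled_sketch = false;
    else
        return -1; /* unrecognized value: leave the setting alone */
    return 0;
}
```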

Reported-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-4-paulmck@kernel.org
2023-03-24 11:01:26 +01:00
Paul E. McKenney
6366d062e7 locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
The diagnostics added by this commit were extremely useful in one instance:

a5aabace5fb8 ("locking/csd_lock: Add more data to CSD lock debugging")

However, they have not seen much action since, and there have been some
concerns expressed that the complexity is not worth the benefit.

Therefore, manually revert the following preparatory commit:

de7b09ef658d ("locking/csd_lock: Prepare more CSD lock debugging")

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-3-paulmck@kernel.org
2023-03-24 11:01:26 +01:00
Paul E. McKenney
1771257cb4 locking/csd_lock: Remove added data from CSD lock debugging
The diagnostics added by this commit were extremely useful in one instance:

a5aabace5fb8 ("locking/csd_lock: Add more data to CSD lock debugging")

However, they have not seen much action since, and there have been some
concerns expressed that the complexity is not worth the benefit.

Therefore, manually revert this commit, but leave a comment telling
people where to find these diagnostics.

[ paulmck: Apply Juergen Gross feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-2-paulmck@kernel.org
2023-03-24 11:01:25 +01:00
Paul E. McKenney
c521986016 locking/csd_lock: Add Kconfig option for csd_debug default
The csd_debug kernel parameter works well, but is inconvenient in cases
where it is more closely associated with boot loaders or automation than
with a particular kernel version or release.  Therefore, provide a new
CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option that defaults csd_debug to
1 when selected and 0 otherwise, with this latter being the default.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-1-paulmck@kernel.org
2023-03-24 11:01:25 +01:00
Haifeng Xu
8e4645226b cpuset: Clean up cpuset_node_allowed
Commit 002f290627c2 ("cpuset: use static key better and convert to new API")
started using __cpuset_node_allowed() instead of cpuset_node_allowed() to
check whether we can allocate on a memory node. Now cpuset_node_allowed()
isn't used by anyone, so we can clean it up as follows:

1. remove unused code
2. rename __cpuset_node_allowed() to cpuset_node_allowed()
3. update comments in mm/page_alloc.c

Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-23 16:02:27 -10:00
Jungseung Lee
704bc669e1 workqueue: Introduce show_freezable_workqueues
Currently, show_all_workqueues is called if freezing the workqueues
fails, which shows the status of all workqueues and of all worker
pools. In this case we may only need to dump the state of workqueues
that are freezable and busy.

This patch defines show_freezable_workqueues, which uses
show_one_workqueue, a granular function that shows the state of individual
workqueues, so that only the state of freezable workqueues is dumped
at that time.
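A userspace sketch of the filtering idea, assuming a heavily simplified workqueue model. struct wq_model, show_one_wq() and show_freezable_wqs() are hypothetical illustrations, not the kernel's structures or functions; for clarity the "busy" check is done in the loop here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified model of a workqueue. */
struct wq_model {
	const char *name;
	bool freezable;		/* models the WQ_FREEZABLE flag */
	int nr_active;		/* in-flight work items; > 0 means busy */
};

/* Hypothetical per-workqueue dumper, standing in for show_one_workqueue(). */
static void show_one_wq(const struct wq_model *wq)
{
	printf("workqueue %s: freezable=%d active=%d\n",
	       wq->name, wq->freezable, wq->nr_active);
}

/*
 * Dump only workqueues that are freezable and busy, instead of every
 * workqueue and worker pool. Returns how many were shown.
 */
static size_t show_freezable_wqs(const struct wq_model *wqs, size_t n)
{
	size_t shown = 0;

	for (size_t i = 0; i < n; i++) {
		if (!wqs[i].freezable || wqs[i].nr_active == 0)
			continue;
		show_one_wq(&wqs[i]);
		shown++;
	}
	return shown;
}
```

Reusing a per-workqueue dumper keeps the filtered dump consistent with the full dump while producing far less output when freezing fails.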

tj: Minor message adjustment.

Signed-off-by: Jungseung Lee <js07.lee@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-23 15:55:38 -10:00
Marco Elver
5eb39cde1e kcsan: avoid passing -g for test
Nathan reported that when building with GNU as and a version of clang that
defaults to DWARF5, the assembler will complain with:

  Error: non-constant .uleb128 is not supported

This is because `-g` defaults to the compiler debug info default. If the
assembler does not support some of the directives used, the above errors
occur. To fix, remove the explicit passing of `-g`.

All the test wants is that stack traces print valid function names, and
debug info is not required for that. (I currently cannot recall why I
added the explicit `-g`.)

Link: https://lkml.kernel.org/r/20230316224705.709984-2-elver@google.com
Fixes: 1fe84fd4a402 ("kcsan: Add test suite")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-23 17:18:35 -07:00
Jakub Kicinski
1b4ae19e43 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZBzSGQAKCRDbK58LschI
 g+dhAP95enbrlwaQ+9aoqrU+GqCq+uo4SkaqnUtq6GSvRNiVBQD8C6iZxrAjyXnm
 1wRr3JN/HszPBzgjl3HvDc9y69I/PAI=
 =8JwR
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-03-23

We've added 8 non-merge commits during the last 13 day(s) which contain
a total of 21 files changed, 238 insertions(+), 161 deletions(-).

The main changes are:

1) Fix verification issues in some BPF programs due to their stack usage
   patterns, from Eduard Zingerman.

2) Fix to add missing overflow checks in xdp_umem_reg and return an error
   in such case, from Kal Conley.

3) Fix and undo poisoning of strlcpy in libbpf, given it broke builds for
   libcs which provide it, like uClibc-ng, from Jesus Sanchez-Palencia.

4) Fix insufficient bpf_jit_limit default to avoid users running into
   hard-to-debug seccomp BPF errors, from Daniel Borkmann.

5) Fix driver return code when drivers don't support a bpf_xdp_metadata
   kfunc, to make it distinguishable from other errors, from Jesper Dangaard Brouer.

6) Two BPF selftest fixes to address compilation errors from recent changes
   in kernel structures, from Alexei Starovoitov.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
  bpf: Adjust insufficient default bpf_jit_limit
  xsk: Add missing overflow check in xdp_umem_reg
  selftests/bpf: Fix progs/test_deny_namespace.c issues.
  selftests/bpf: Fix progs/find_vma_fail1.c build error.
  libbpf: Revert poisoning of strlcpy
  selftests/bpf: Tests for uninitialized stack reads
  bpf: Allow reads from uninit stack
====================

Link: https://lore.kernel.org/r/20230323225221.6082-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-23 16:03:33 -07:00
Mike Christie
e297cd54b3 vhost_task: Allow vhost layer to use copy_process
Qemu creates vhost devices in the kernel which perform network, SCSI,
etc. IO and management operations from worker threads created by the
kthread API. Because the kthread API does a copy_process on the kthreadd
thread, the vhost layer has to use kthread_use_mm to access the Qemu
thread's memory and cgroup_attach_task_all to add itself to the Qemu
thread's cgroups. It also bypasses the RLIMIT_NPROC limit, which can
result in VMs creating more threads than the admin expected.

This patch adds a new struct vhost_task which can be used instead of
kthreads. It allows the vhost layer to use copy_process and inherit
the userspace process's mm and cgroups. The task is accounted for
under the userspace process's nproc count and can be seen in its
process tree, and other features like namespaces work and are
inherited by default.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-03-23 12:45:36 +01:00
Kui-Feng Lee
aef56f2e91 bpf: Update the struct_ops of a bpf_link.
Improve the BPF_LINK_UPDATE command of bpf() so that it can
conveniently switch between different struct_ops on a single
bpf_link. This enables smoother transitions from one struct_ops
to another.

The struct_ops maps passed along with BPF_LINK_UPDATE must have the
BPF_F_LINK flag.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230323032405.3735486-6-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-22 22:53:02 -07:00
Kui-Feng Lee
68b04864ca bpf: Create links for BPF struct_ops maps.
Make bpf_link support struct_ops.  Previously, struct_ops were always
used alone without any associated links. Upon updating its value, a
struct_ops would be activated automatically. Other BPF program types,
by contrast, required a bpf_link to be created for their instances
before they could become active. Now, however, you can create an
inactive struct_ops, and create a link to activate it later.

With bpf_links, struct_ops behaves similarly to other BPF program
types. You can pin/unpin them from their links, and the struct_ops will
be deactivated when its link is removed, whereas previously someone
had to delete the value for it to be deactivated.

bpf_links are responsible for registering their associated
struct_ops. You can only use a struct_ops that has the BPF_F_LINK flag
set to create a bpf_link, while a struct_ops without this flag behaves in
the same manner as before and is registered upon updating its value.

The BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it
used to craft the links for BPF struct_ops programs, but also to
create links for BPF struct_ops themselves.  Since the links of BPF
struct_ops programs are only used to create trampolines internally,
they are never seen in other contexts. Thus, they can be reused for
struct_ops themselves.

To maintain a reference to the map supporting this link, we add
bpf_struct_ops_link as an additional type. The map pointer is
RCU-protected; this won't be necessary until later in the patchset.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-4-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-22 22:53:02 -07:00
Kui-Feng Lee
b671c2067a bpf: Retire the struct_ops map kvalue->refcnt.
We have replaced kvalue->refcnt with synchronize_rcu() to wait for an
RCU grace period.

Maintenance of kvalue->refcnt was a complicated task, as we had to
simultaneously keep track of two reference counts: kvalue->refcnt and
the reference count of bpf_map. When kvalue->refcnt reaches zero, we
also have to reduce the reference count on bpf_map, yet these steps
are not performed atomically and require us to be vigilant when
managing them. By eliminating kvalue->refcnt, maintenance becomes more
straightforward, as the refcount of bpf_map is now the only one we
manage!

To prevent the trampoline image of a struct_ops from being released
while it is still in use, we wait for an RCU grace period. The
setsockopt(TCP_CONGESTION, "...") command allows you to change your
socket's congestion control algorithm and can result in releasing the
old struct_ops implementation. That alone is fine. However, since this
function is also exposed through bpf_setsockopt(), it may be called by
BPF programs as well. To ensure that the trampoline image belonging to a
struct_ops can be safely called while its method is in use, the
trampoline safeguards the BPF program with rcu_read_lock(). Doing so
prevents any destruction of the associated images before returning
from a trampoline, and requires us to wait for an RCU grace period.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-2-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-22 22:51:47 -07:00
Andrii Nakryiko
b63cbc490e bpf: remember meta->iter info only for initialized iters
For iter_new() functions, the iterator state's slot might not be
initialized yet, in which case iter_get_spi() will return -ERANGE. This
is expected and is handled properly. But in the iter_next() and
iter_destroy() cases, the iter slot is supposed to be initialized and
correct, so -ERANGE is not possible.

Move meta->iter.{spi,frameno} initialization into the iter_next/iter_destroy
handling branch to make it more explicit that valid information will be
remembered in the meta->iter block for subsequent use in process_iter_next_call(),
avoiding the confusing-looking -ERANGE assignment for meta->iter.spi.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230322232502.836171-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-22 17:04:47 -07:00