that helps the debugging of IRQ affinity logic bugs, and fix a memory leak.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8nEn8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1ipYA/+KOWjDuRp1YBZeZ4/55RjGzimsW5jkLIY
0Na3WGjN/QBKCzmRJNnMyW1UjRgpHpBhOsphTcHVdhJo9jg5+DX+XdVTwKGTqAI+
7DqzP4dzifSgUwdcxIbKwtZquBRzKk1K0Z25b6Jc0WJwkGRx3LWhhRDERPUEHtXg
Sl07XxiuqFLcQZz9o3hisKzEfA2llB4bfXOjLCJlLK3HUZKccoBjWKbTrI3ymCiz
f0iV9a7kNzo4fJNddKOBTtDWFEhpj6NgEVtLNdAaDti7MSSjPbB1BsiK64UInGMQ
4881ItYAOHGuCHe8yYnjlWA5kmwX14KjN6c3RAXK3n4+wvf+17RJC+FLH1PbkFIx
hZ8k9x2Y5Dpt8vD8fGkoqi2nr2JYbIiOm79AjrD+Li+wWKG3iw4AGEoBHBJzKHUb
naEGiUDJpn7pdpPWMACoctAIhy7/gDA1pPyb5F7Bf/RwoskIyu4i/d/xz05zBg3H
HZMC2Lqcgh7LTS91NnmCx8XELdgL14mN19LK5enH3QTIPtdxmZ5x4quKw6ajMAAQ
jwRpExqy6E1TQkIG5T5hjT0EMuj4uA6OzaoeOroFzKuzo+jiEDl49WAx+9Im9oBb
i7hT4PM/wR7BcfmTMVhmmns4Dp0LkW7dRxHIjo7Fzft5iF8UkO7o7A4VUoAIxrSm
xDFlBO/mo3w=
=oKU9
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2020-08-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Fix a recent IRQ affinities regression, add in a missing debugfs
printout that helps the debugging of IRQ affinity logic bugs, and fix
a memory leak"
* tag 'irq-urgent-2020-08-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/debugfs: Add missing irqchip flags
genirq/affinity: Make affinity setting if activated opt-in
irqdomain/treewide: Free firmware node after domain removal
- Removal of the tremendously unpopular read_barrier_depends() barrier,
which is a NOP on all architectures apart from Alpha, in favour of
allowing architectures to override READ_ONCE() and do whatever dance
they need to do to ensure address dependencies provide LOAD ->
LOAD/STORE ordering. This work also offers a potential solution if
compilers are shown to convert LOAD -> LOAD address dependencies into
control dependencies (e.g. under LTO), as weakly ordered architectures
will effectively be able to upgrade READ_ONCE() to smp_load_acquire().
The latter case is not used yet, but will be discussed further at LPC.
- Make the MSI/IOMMU input/output ID translation PCI agnostic, augment
the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
bus-specific parameter and apply the resulting changes to the device
ID space provided by the Freescale FSL bus.
- arm64 support for TLBI range operations and translation table level
hints (part of the ARMv8.4 architecture version).
- Time namespace support for arm64.
- Export the virtual and physical address sizes in vmcoreinfo for
makedumpfile and crash utilities.
- CPU feature handling cleanups and checks for programmer errors
(overlapping bit-fields).
- ACPI updates for arm64: disallow AML accesses to EFI code regions and
kernel memory.
- perf updates for arm64.
- Miscellaneous fixes and cleanups, most notably PLT counting
optimisation for module loading, recordmcount fix to ignore
relocations other than R_AARCH64_CALL26, CMA areas reserved for
gigantic pages on 16K and 64K configurations.
- Trivial typos, duplicate words.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl8oTcsACgkQa9axLQDI
XvEj6hAAkn39mO5xrR/Vhpg3DyFPk63ZlMSX9SsOeVyaLbovT6stTs1XAZXPpnkt
rV3gwACyGSrqH6+uey9pHgHJuPF2TdrGEVK08yVKo9KGW/6yXSIncdKFE4jUJ/WJ
wF5j7eMET2aGzcpm5AlzMmq6HOrKB8nZac9H8/x6H+Ox2WdgJkEjOkDvyqACUyum
N3FsTZkWj2pIkTXHNgDZ8KjxVLO8HlFaB2hkxFDl9NPlX2UTCQJ8Tg1KiPLafKaK
gUvH4usQDFdb5RU/UWogre37J4emO0ZTApZOyju+U+PMMWlWVHjZ4isUIS9zz/AE
JNZ23dnKZX2HrYa5p8HZx175zwj/vXUqUHCZPLvQXaAudCEhF8BVljPiG0e80FV5
GHFUgUbylKspp01I/9L+2JvsG96Mr0e+P3Sx7L2HTI42cmtoSa14+MpoSRj7zlft
Qcl8hfrVOjCjUnFRHa/1y1cGvnD9GbgnKJR7zgVxl9bD/Jd48r1HUtwRORZCzWFr
mRPVbPS72fWxMzMV9DZYJm02jJY9kLX2BMl49njbB8MhAhzOvrMVzoVVtMMeRFLR
XHeJpmg36W09FiRGe7LRXlkXIhCQzQG2bJfiphuupCfhjRAitPoq8I925G6Pig60
c8RWaXGU7PrEsdMNrL83vekvGKgqrkoFkRVtsCoQ2X6Hvu/XdYI=
=mh79
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 and cross-arch updates from Catalin Marinas:
"Here's a slightly wider-spread set of updates for 5.9.
Going outside the usual arch/arm64/ area is the removal of
read_barrier_depends() series from Will and the MSI/IOMMU ID
translation series from Lorenzo.
The notable arm64 updates include ARMv8.4 TLBI range operations and
translation level hint, time namespace support, and perf.
Summary:
- Removal of the tremendously unpopular read_barrier_depends()
barrier, which is a NOP on all architectures apart from Alpha, in
favour of allowing architectures to override READ_ONCE() and do
whatever dance they need to do to ensure address dependencies
provide LOAD -> LOAD/STORE ordering.
This work also offers a potential solution if compilers are shown
to convert LOAD -> LOAD address dependencies into control
dependencies (e.g. under LTO), as weakly ordered architectures will
effectively be able to upgrade READ_ONCE() to smp_load_acquire().
The latter case is not used yet, but will be discussed further at
LPC.
- Make the MSI/IOMMU input/output ID translation PCI agnostic,
augment the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
bus-specific parameter and apply the resulting changes to the
device ID space provided by the Freescale FSL bus.
- arm64 support for TLBI range operations and translation table level
hints (part of the ARMv8.4 architecture version).
- Time namespace support for arm64.
- Export the virtual and physical address sizes in vmcoreinfo for
makedumpfile and crash utilities.
- CPU feature handling cleanups and checks for programmer errors
(overlapping bit-fields).
- ACPI updates for arm64: disallow AML accesses to EFI code regions
and kernel memory.
- perf updates for arm64.
- Miscellaneous fixes and cleanups, most notably PLT counting
optimisation for module loading, recordmcount fix to ignore
relocations other than R_AARCH64_CALL26, CMA areas reserved for
gigantic pages on 16K and 64K configurations.
- Trivial typos, duplicate words"
Link: http://lkml.kernel.org/r/20200710165203.31284-1-will@kernel.org
Link: http://lkml.kernel.org/r/20200619082013.13661-1-lorenzo.pieralisi@arm.com
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (82 commits)
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
arm64/mm: save memory access in check_and_switch_context() fast switch path
arm64: sigcontext.h: delete duplicated word
arm64: ptrace.h: delete duplicated word
arm64: pgtable-hwdef.h: delete duplicated words
bus: fsl-mc: Add ACPI support for fsl-mc
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
of/irq: Make of_msi_map_rid() PCI bus agnostic
of/irq: make of_msi_map_get_device_domain() bus agnostic
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
of/device: Add input id to of_dma_configure()
of/iommu: Make of_map_rid() PCI agnostic
ACPI/IORT: Add an input ID to acpi_dma_configure()
ACPI/IORT: Remove useless PCI bus walk
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
arm64: enable time namespace support
arm64/vdso: Restrict splitting VVAR VMA
arm64/vdso: Handle faults on timens page
...
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAl8eiUATHHJwcHRAbGlu
dXguaWJtLmNvbQAKCRA5A4Ymyw79kWs8B/4wEWVJGTkjyrMX57/Ew8yRYAJE6JjA
kSONPjElVrPR1pRLYyjyde+zqumkJFhk+41De09J2byL29p7tK8ISNrTwJrIN7n/
dzT73CmuNEjI0rZJxPX+USKFph75FQVvAVOOWs+6fiBFxdUaIsBheVntH7/NsCTk
HFrIjIn5wXFVs5Nh+2cHydvEpOVoUWzjvs+uJIEpHCVCBz6gaYq2dxEmeTquKuz1
k7PZqCqVsyB9iWLqN65/Q+30N8znJwcUl8HAzs5nvPrXLjGxwuEjOxtYYhbdLAfP
OBiIF9J77sZxBlms0WNomDW3Rr5Vlt5nF9oUWpi3AmHNWIuX0GkM4i0C
=V+Kl
-----END PGP SIGNATURE-----
Merge tag 'rm-unicore32' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux
Pull unicore32 removal from Mike Rapoport:
"Remove unicore32 support.
The unicore32 port do not seem maintained for a long time now, there
is no upstream toolchain that can create unicore32 binaries and all
the links to prebuilt toolchains for unicore32 are dead. Even
compilers that were available are not supported by the kernel anymore.
Guenter Roeck says:
"I have stopped building unicore32 images since v4.19 since there is
no available compiler that is still supported by the kernel. I am
surprised that support for it has not been removed from the kernel"
However, it's worth pointing out two things:
- Guan Xuetao is still listed as maintainer and asked for the port to
be kept around the last time Arnd suggested removing it two years
ago. He promised that there would be compiler sources (presumably
llvm), but has not made those available since.
- https://github.com/gxt has patches to linux-4.9 and qemu-2.7, both
released in 2016, with patches dated early 2019. These patches
mainly restore a syscall ABI that was never part of mainline Linux
but apparently used in production. qemu-2.8 removed support for
that ABI and newer kernels (4.19+) can no longer be built with the
old toolchain, so apparently there will not be any future updates
to that git tree"
* tag 'rm-unicore32' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux:
MAINTAINERS: remove "PKUNITY SOC DRIVERS" entry
rtc: remove fb-puv3 driver
video: fbdev: remove fb-puv3 driver
pwm: remove pwm-puv3 driver
input: i8042: remove support for 8042-unicore32io
i2c/buses: remove i2c-puv3 driver
cpufreq: remove unicore32 driver
arch: remove unicore32 port
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl8m7YwQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpt+dEAC7a0HYuX2OrkyawBnsgd1QQR/soC7surec
yDDa7SMM8cOq3935bfzcYHV9FWJszEGIknchiGb9R3/T+vmSohbvDsM5zgwya9u/
FHUIuTq324I6JWXKl30k4rwjiX9wQeMt+WZ5gC8KJYCWA296i2IpJwd0A45aaKuS
x4bTjxqknE+fD4gQiMUSt+bmuOUAp81fEku3EPapCRYDPAj8f5uoY7R2arT/POwB
b+s+AtXqzBymIqx1z0sZ/XcdZKmDuhdurGCWu7BfJFIzw5kQ2Qe3W8rUmrQ3pGut
8a21YfilhUFiBv+B4wptfrzJuzU6Ps0BXHCnBsQjzvXwq5uFcZH495mM/4E4OJvh
SbjL2K4iFj+O1ngFkukG/F8tdEM1zKBYy2ZEkGoWKUpyQanbAaGI6QKKJA+DCdBi
yPEb7yRAa5KfLqMiocm1qCEO1I56HRiNHaJVMqCPOZxLmpXj19Fs71yIRplP1Trv
GGXdWZsccjuY6OljoXWdEfnxAr5zBsO3Yf2yFT95AD+egtGsU1oOzlqAaU1mtflw
ABo452pvh6FFpxGXqz6oK4VqY4Et7WgXOiljA4yIGoPpG/08L1Yle4eVc2EE01Jb
+BL49xNJVeUhGFrvUjPGl9kVMeLmubPFbmgrtipW+VRg9W8+Yirw7DPP6K+gbPAR
RzAUdZFbWw==
=abJG
-----END PGP SIGNATURE-----
Merge tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
"Good amount of cleanups and tech debt removals in here, and as a
result, the diffstat shows a nice net reduction in code.
- Softirq completion cleanups (Christoph)
- Stop using ->queuedata (Christoph)
- Cleanup bd claiming (Christoph)
- Use check_events, moving away from the legacy media change
(Christoph)
- Use inode i_blkbits consistently (Christoph)
- Remove old unused writeback congestion bits (Christoph)
- Cleanup/unify submission path (Christoph)
- Use bio_uninit consistently, instead of bio_disassociate_blkg
(Christoph)
- sbitmap cleared bits handling (John)
- Request merging blktrace event addition (Jan)
- sysfs add/remove race fixes (Luis)
- blk-mq tag fixes/optimizations (Ming)
- Duplicate words in comments (Randy)
- Flush deferral cleanup (Yufen)
- IO context locking/retry fixes (John)
- struct_size() usage (Gustavo)
- blk-iocost fixes (Chengming)
- blk-cgroup IO stats fixes (Boris)
- Various little fixes"
* tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
block: blk-timeout: delete duplicated word
block: blk-mq-sched: delete duplicated word
block: blk-mq: delete duplicated word
block: genhd: delete duplicated words
block: elevator: delete duplicated word and fix typos
block: bio: delete duplicated words
block: bfq-iosched: fix duplicated word
iocost_monitor: start from the oldest usage index
iocost: Fix check condition of iocg abs_vdebt
block: Remove callback typedefs for blk_mq_ops
block: Use non _rcu version of list functions for tag_set_list
blk-cgroup: show global disk stats in root cgroup io.stat
blk-cgroup: make iostat functions visible to stat printing
block: improve discard bio alignment in __blkdev_issue_discard()
block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
block: defer flush request no matter whether we have elevator
block: make blk_timeout_init() static
block: remove retry loop in ioc_release_fn()
block: remove unnecessary ioc nested locking
block: integrate bd_start_claiming into __blkdev_get
...
Pull crypto updates from Herbert Xu:
"API:
- Add support for allocating transforms on a specific NUMA Node
- Introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY for storage users
Algorithms:
- Drop PMULL based ghash on arm64
- Fixes for building with clang on x86
- Add sha256 helper that does the digest in one go
- Add SP800-56A rev 3 validation checks to dh
Drivers:
- Permit users to specify NUMA node in hisilicon/zip
- Add support for i.MX6 in imx-rngc
- Add sa2ul crypto driver
- Add BA431 hwrng driver
- Add Ingenic JZ4780 and X1000 hwrng driver
- Spread IRQ affinity in inside-secure and marvell/cesa"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (157 commits)
crypto: sa2ul - Fix inconsistent IS_ERR and PTR_ERR
hwrng: core - remove redundant initialization of variable ret
crypto: x86/curve25519 - Remove unused carry variables
crypto: ingenic - Add hardware RNG for Ingenic JZ4780 and X1000
dt-bindings: RNG: Add Ingenic RNG bindings.
crypto: caam/qi2 - add module alias
crypto: caam - add more RNG hw error codes
crypto: caam/jr - remove incorrect reference to caam_jr_register()
crypto: caam - silence .setkey in case of bad key length
crypto: caam/qi2 - create ahash shared descriptors only once
crypto: caam/qi2 - fix error reporting for caam_hash_alloc
crypto: caam - remove deadcode on 32-bit platforms
crypto: ccp - use generic power management
crypto: xts - Replace memcpy() invocation with simple assignment
crypto: marvell/cesa - irq balance
crypto: inside-secure - irq balance
crypto: ecc - SP800-56A rev 3 local public key validation
crypto: dh - SP800-56A rev 3 local public key validation
crypto: dh - check validity of Z before export
lib/mpi: Add mpi_sub_ui()
...
Pull networking fixes from David Miller:
1) Encap offset calculation is incorrect in esp6, from Sabrina Dubroca.
2) Better parameter validation in pfkey_dump(), from Mark Salyzyn.
3) Fix several clang issues on powerpc in selftests, from Tanner Love.
4) cmsghdr_from_user_compat_to_kern() uses the wrong length, from Al
Viro.
5) Out of bounds access in mlx5e driver, from Raed Salem.
6) Fix transfer buffer memleak in lan78xx, from Johan Havold.
7) RCU fixups in rhashtable, from Herbert Xu.
8) Fix ipv6 nexthop refcnt leak, from Xiyu Yang.
9) vxlan FDB dump must be done under RCU, from Ido Schimmel.
10) Fix use after free in mlxsw, from Ido Schimmel.
11) Fix map leak in HASH_OF_MAPS bpf code, from Andrii Nakryiko.
12) Fix bug in mac80211 Tx ack status reporting, from Vasanthakumar
Thiagarajan.
13) Fix memory leaks in IPV6_ADDRFORM code, from Cong Wang.
14) Fix bpf program reference count leaks in mlx5 during
mlx5e_alloc_rq(), from Xin Xiong.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
vxlan: fix memleak of fdb
rds: Prevent kernel-infoleak in rds_notify_queue_get()
net/sched: The error lable position is corrected in ct_init_module
net/mlx5e: fix bpf_prog reference count leaks in mlx5e_alloc_rq
net/mlx5e: E-Switch, Specify flow_source for rule with no in_port
net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring
net/mlx5e: CT: Support restore ipv6 tunnel
net: gemini: Fix missing clk_disable_unprepare() in error path of gemini_ethernet_port_probe()
ionic: unlock queue mutex in error path
atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent
net: ethernet: mtk_eth_soc: fix MTU warnings
net: nixge: fix potential memory leak in nixge_probe()
devlink: ignore -EOPNOTSUPP errors on dumpit
rxrpc: Fix race between recvmsg and sendmsg on immediate call failure
MAINTAINERS: Replace Thor Thayer as Altera Triple Speed Ethernet maintainer
selftests/bpf: fix netdevsim trap_flow_action_cookie read
ipv6: fix memory leaks on IPV6_ADDRFORM path
net/bpfilter: Initialize pos in __bpfilter_process_sockopt
igb: reinit_locked() should be called with rtnl_lock
e1000e: continue to init PHY even when failed to disable ULP
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXyXDTQAKCRCRxhvAZXjc
olxlAQDCiyWstd8pmtyX4vuaoyDZ6re6P/TCr3mzr6tQyux/zgD/chlfAvJdyzk8
2Tw44odp3gF5EfzF+5wx2whZZPfVrQY=
=Hv2c
-----END PGP SIGNATURE-----
Merge tag 'for-linus-2020-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread fix from Christian Brauner:
"A simple spelling fix for dequeue_synchronous_signal()"
* tag 'for-linus-2020-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
signal: fix typo in dequeue_synchronous_signal()
Daniel Borkmann says:
====================
pull-request: bpf 2020-07-31
The following pull-request contains BPF updates for your *net* tree.
We've added 5 non-merge commits during the last 21 day(s) which contain
a total of 5 files changed, 126 insertions(+), 18 deletions(-).
The main changes are:
1) Fix a map element leak in HASH_OF_MAPS map type, from Andrii Nakryiko.
2) Fix a NULL pointer dereference in __btf_resolve_helper_id() when no
btf_vmlinux is available, from Peilin Ye.
3) Init pos variable in __bpfilter_process_sockopt(), from Christoph Hellwig.
4) Fix a cgroup sockopt verifier test by specifying expected attach type,
from Jean-Philippe Brucker.
Note that when net gets merged into net-next later on, there is a small
merge conflict in kernel/bpf/btf.c between commit 5b801dfb7feb ("bpf: Fix
NULL pointer dereference in __btf_resolve_helper_id()") from the bpf tree
and commit 138b9a0511c7 ("bpf: Remove btf_id helpers resolving") from the
net-next tree.
Resolve as follows: remove the old hunk with the __btf_resolve_helper_id()
function. Change the btf_resolve_helper_id() so it actually tests for a
NULL btf_vmlinux and bails out:
int btf_resolve_helper_id(struct bpf_verifier_log *log,
const struct bpf_func_proto *fn, int arg)
{
int id;
if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID || !btf_vmlinux)
return -EINVAL;
id = fn->btf_id[arg];
if (!id || id > btf_vmlinux->nr_types)
return -EINVAL;
return id;
}
Let me know if you run into any others issues (CC'ing Jiri Olsa so he's in
the loop with regards to merge conflict resolution).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* for-next/misc:
: Miscellaneous fixes and cleanups
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
arm64/mm: save memory access in check_and_switch_context() fast switch path
recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.
arm64: Reserve HWCAP2_MTE as (1 << 18)
arm64/entry: deduplicate SW PAN entry/exit routines
arm64: s/AMEVTYPE/AMEVTYPER
arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs
arm64: stacktrace: Move export for save_stack_trace_tsk()
smccc: Make constants available to assembly
arm64/mm: Redefine CONT_{PTE, PMD}_SHIFT
arm64/defconfig: Enable CONFIG_KEXEC_FILE
arm64: Document sysctls for emulated deprecated instructions
arm64/panic: Unify all three existing notifier blocks
arm64/module: Optimize module load time by optimizing PLT counting
* for-next/vmcoreinfo:
: Export the virtual and physical address sizes in vmcoreinfo
arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
* for-next/cpufeature:
: CPU feature handling cleanups
arm64/cpufeature: Validate feature bits spacing in arm64_ftr_regs[]
arm64/cpufeature: Replace all open bits shift encodings with macros
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR2 register
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR1 register
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR0 register
* for-next/acpi:
: ACPI updates for arm64
arm64/acpi: disallow writeable AML opregion mapping for EFI code regions
arm64/acpi: disallow AML memory opregions to access kernel memory
* for-next/perf:
: perf updates for arm64
arm64: perf: Expose some new events via sysfs
tools headers UAPI: Update tools's copy of linux/perf_event.h
arm64: perf: Add cap_user_time_short
perf: Add perf_event_mmap_page::cap_user_time_short ABI
arm64: perf: Only advertise cap_user_time for arch_timer
arm64: perf: Implement correct cap_user_time
time/sched_clock: Use raw_read_seqcount_latch()
sched_clock: Expose struct clock_read_data
arm64: perf: Correct the event index in sysfs
perf/smmuv3: To simplify code for ioremap page in pmcg
* for-next/timens:
: Time namespace support for arm64
arm64: enable time namespace support
arm64/vdso: Restrict splitting VVAR VMA
arm64/vdso: Handle faults on timens page
arm64/vdso: Add time namespace page
arm64/vdso: Zap vvar pages when switching to a time namespace
arm64/vdso: use the fault callback to map vvar pages
* for-next/msi-iommu:
: Make the MSI/IOMMU input/output ID translation PCI agnostic, augment the
: MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID bus-specific parameter
: and apply the resulting changes to the device ID space provided by the
: Freescale FSL bus
bus: fsl-mc: Add ACPI support for fsl-mc
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
of/irq: Make of_msi_map_rid() PCI bus agnostic
of/irq: make of_msi_map_get_device_domain() bus agnostic
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
of/device: Add input id to of_dma_configure()
of/iommu: Make of_map_rid() PCI agnostic
ACPI/IORT: Add an input ID to acpi_dma_configure()
ACPI/IORT: Remove useless PCI bus walk
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
* for-next/trivial:
: Trivial fixes
arm64: sigcontext.h: delete duplicated word
arm64: ptrace.h: delete duplicated word
arm64: pgtable-hwdef.h: delete duplicated words
Fix HASH_OF_MAPS bug of not putting inner map pointer on bpf_map_elem_update()
operation. This is due to per-cpu extra_elems optimization, which bypassed
free_htab_elem() logic doing proper clean ups. Make sure that inner map is put
properly in optimized case as well.
Fixes: 8c290e60fa2a ("bpf: fix hashmap extra_elems logic")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729040913.2815687-1-andriin@fb.com
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl8hgm0UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPc4xAAxWSkLThFbdC+dWA8cFQvyJhXdcl6
C3ALyBnx2hyr/MxJ9OcfYDl8TMafKFkXzq4+2vLiZPl/UBSpnr47ralUHl+aAh+I
cZdV9bF3aSlsb4mIEg3H03xkPBCWfTR+UMzdrYAgqxyeYoZ/VteR1O3yWi80caQK
vh2UlbuPyiEsz1A21ems88dDw28RkzETNFmBARSh7cPrvGorQNJKYGkMNqsVpUbb
elx+DCSh4J+QYqByeQUY64L1n7jHGQkTpdZaVA7FhBeAilelL6PIa4qpyHU28VGg
ZzOWJBkZwYz1lVEhHu1h3Jzv9dwTzzyopJ/YpPZUsvZ+GPuIfYmY+C1InkMvGd4S
Ytj9WO+rNpvJR8EWUhl1O7J/0HN+dy3MGst9MkJOMea0gsgf9cTgnIEohFawYZRt
t1pKB2VximglOx2IRVK/2//8u/s8d7c5/5uVY4akS++tbrk5j8uPcO+4wIf/njMM
WqfUT58M6oY9mQkErewNrZEi2CHBg71GT4hJQ+1qnyrTSe9WfrmA01m/pIUNHzu3
j1hhZH2KCT5IKF4b5dA2DmssorfVgC1VnAoa0UM9jC+awqSYI83S20d8EF48msIW
XqEUSURh/bfn3T9Y75YVsNJ6EOvrhsf9TSCb43oNhAXBv0+XgO3bKOpBB6W+UIZ7
86vGfemi82Rt+Sk=
=zLU9
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20200729' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fixes from Paul Moore:
"One small audit fix that you can hopefully merge before v5.8 is
released. Unfortunately it is a revert of a patch that went in during
the v5.7 window and we just recently started to see some bug reports
relating to that commit.
We are working on a proper fix, but I'm not yet clear on when that
will be ready and we need to fix the v5.7 kernels anyway, so in the
interest of time a revert seemed like the best solution right now"
* tag 'audit-pr-20200729' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
revert: 1320a4052ea1 ("audit: trigger accompanying records when no rules present")
This modifies the first 32 bits out of the 128 bits of a random CPU's
net_rand_state on interrupt or CPU activity to complicate remote
observations that could lead to guessing the network RNG's internal
state.
Note that depending on some network devices' interrupt rate moderation
or binding, this re-seeding might happen on every packet or even almost
never.
In addition, with NOHZ some CPUs might not even get timer interrupts,
leaving their local state rarely updated, while they are running
networked processes making use of the random state. For this reason, we
also perform this update in update_process_times() in order to at least
update the state when there is user or system activity, since it's the
only case we care about.
Reported-by: Amit Klein <aksecurity@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Unfortunately the commit listed in the subject line above failed
to ensure that the task's audit_context was properly initialized/set
before enabling the "accompanying records". Depending on the
situation, the resulting audit_context could have invalid values in
some of it's fields which could cause a kernel panic/oops when the
task/syscall exists and the audit records are generated.
We will revisit the original patch, with the necessary fixes, in a
future kernel but right now we just want to fix the kernel panic
with the least amount of added risk.
Cc: stable@vger.kernel.org
Fixes: 1320a4052ea1 ("audit: trigger accompanying records when no rules present")
Reported-by: j2468h@googlemail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
John reported that on a RK3288 system the perf per CPU interrupts are all
affine to CPU0 and provided the analysis:
"It looks like what happens is that because the interrupts are not per-CPU
in the hardware, armpmu_request_irq() calls irq_force_affinity() while
the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
IRQF_NOBALANCING.
Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
irq_setup_affinity() which returns early because IRQF_PERCPU and
IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."
This was broken by the recent commit which blocked interrupt affinity
setting in hardware before activation of the interrupt. While this works in
general, it does not work for this particular case. As contrary to the
initial analysis not all interrupt chip drivers implement an activate
callback, the safe cure is to make the deferred interrupt affinity setting
at activation time opt-in.
Implement the necessary core logic and make the two irqchip implementations
for which this is required opt-in. In hindsight this would have been the
right thing to do, but ...
Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
If a tracee is uprobed and it hits int3 inserted by debugger, handle_swbp()
does send_sig(SIGTRAP, current, 0) which means si_code == SI_USER. This used
to work when this code was written, but then GDB started to validate si_code
and now it simply can't use breakpoints if the tracee has an active uprobe:
# cat test.c
void unused_func(void)
{
}
int main(void)
{
return 0;
}
# gcc -g test.c -o test
# perf probe -x ./test -a unused_func
# perf record -e probe_test:unused_func gdb ./test -ex run
GNU gdb (GDB) 10.0.50.20200714-git
...
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7ddf909 in dl_main () from /lib64/ld-linux-x86-64.so.2
(gdb)
The tracee hits the internal breakpoint inserted by GDB to monitor shared
library events but GDB misinterprets this SIGTRAP and reports a signal.
Change handle_swbp() to use force_sig(SIGTRAP), this matches do_int3_user()
and fixes the problem.
This is the minimal fix for -stable, arch/x86/kernel/uprobes.c is equally
wrong; it should use send_sigtrap(TRAP_TRACE) instead of send_sig(SIGTRAP),
but this doesn't confuse GDB and needs another x86-specific patch.
Reported-by: Aaron Merey <amerey@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200723154420.GA32043@redhat.com
Since the default_wake_function() passes its flags onto
try_to_wake_up(), warn if those flags collide with internal values.
Given that the supplied flags are garbage, no repair can be done but at
least alert the user to the damage they are causing.
In the belief that these errors should be picked up during testing, the
warning is only compiled in under CONFIG_SCHED_DEBUG.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20200723201042.18861-1-chris@chris-wilson.co.uk
Only its reorder field is actually used now, so remove the struct and
embed @reorder directly in parallel_data.
No functional change, just a cleanup.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There's no reason to have two interfaces when there's only one caller.
Removing _possible saves text and simplifies future changes.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A padata instance has effective cpumasks that store the user-supplied
masks ANDed with the online mask, but this middleman is unnecessary.
parallel_data keeps the same information around. Removing this saves
text and code churn in future changes.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pd_setup_cpumasks() has only one caller. Move its contents inline to
prepare for the next cleanup.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
padata_stop() has two callers and is unnecessary in both cases. When
pcrypt calls it before padata_free(), it's being unloaded so there are
no outstanding padata jobs[0]. When __padata_free() calls it, it's
either along the same path or else pcrypt initialization failed, which
of course means there are also no outstanding jobs.
Removing it simplifies padata and saves text.
[0] https://lore.kernel.org/linux-crypto/20191119225017.mjrak2fwa5vccazl@gondor.apana.org.au/
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
padata_start() is only used right after pcrypt allocates an instance
with all possible CPUs, when PADATA_INVALID can't happen, so there's no
need for a separate "start" step. It can be done during allocation to
save text, make using padata easier, and avoid unneeded calls in the
future.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There is apparently one site that violates the rule that only current
and ttwu() will modify task->state, namely ptrace_{,un}freeze_traced()
will change task->state for a remote task.
Oleg explains:
"TASK_TRACED/TASK_STOPPED was always protected by siglock. In
particular, ttwu(__TASK_TRACED) must be always called with siglock
held. That is why ptrace_freeze_traced() assumes it can safely do
s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock)."
This breaks the ordering scheme introduced by commit:
dbfb089d360b ("sched: Fix loadavg accounting race")
Specifically, the reload not matching no longer implies we don't have
to block.
Simply things by noting that what we need is a LOAD->STORE ordering
and this can be provided by a control dependency.
So replace:
prev_state = prev->state;
raw_spin_lock(&rq->lock);
smp_mb__after_spinlock(); /* SMP-MB */
if (... && prev_state && prev_state == prev->state)
deactivate_task();
with:
prev_state = prev->state;
if (... && prev_state) /* CTRL-DEP */
deactivate_task();
Since that already implies the 'prev->state' load must be complete
before allowing the 'prev->on_rq = 0' store to become visible.
Fixes: dbfb089d360b ("sched: Fix loadavg accounting race")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
sched_clock uses seqcount_t latching to switch between two storage
places protected by the sequence counter. This allows it to have
interruptible, NMI-safe, seqcount_t write side critical sections.
Since 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"),
raw_read_seqcount_latch() became the standardized way for seqcount_t
latch read paths. Due to the dependent load, it also has one read
memory barrier less than the currently used raw_read_seqcount() API.
Use raw_read_seqcount_latch() for the seqcount_t latch read path.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net
Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
Link: https://lore.kernel.org/r/20200716051130.4359-3-leo.yan@linaro.org
References: 1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI")
Signed-off-by: Will Deacon <will@kernel.org>
In order to support perf_event_mmap_page::cap_time features, an
architecture needs, aside from a userspace readable counter register,
to expose the exact clock data so that userspace can convert the
counter register into a correct timestamp.
Provide struct clock_read_data and two (seqcount) helpers so that
architectures (arm64 in specific) can expose the numbers to userspace.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-2-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
- A timer which is already expired at enqueue time can set the
base->next_expiry value backwards. As a consequence base->clk can be set
back as well. This can lead to timers expiring early. Add a sanity check
to prevent this.
- When a timer is queued with an expiry time beyond the wheel capacity
then it should be queued in the bucket of the last wheel level which is
expiring last. The code adjusts expiry time to the maximum wheel
capacity, which is only correct when the wheel clock is 0. Aside of that
the check whether the delta is larger than wheel capacity does not check
the delta, it checks the expiry value itself. As a result timers can
expire at random.
Fix this by checking the right variable and adjust expiry time so it
becomes base->clock plus capacity which places it into the outmost
bucket in the last wheel level.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8URRQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQVMD/0VMkT36A8SKbPudMLZ5REp63E629wQ
yuGJz9IJPE1NYB25PXL0TmVAQpseXKDKh3eSP2ac6Ao1FTUk/He/CwF2tsGvu+tm
kxpuPQgeUF8BeF7WzE21k4NeAmTv8eaIxirQPRQBRJldHuNG9u0l1u8dr0rT2mQG
N0djinQvM4bRUVa10l4dz6gE2F0Egjv5sIZohv3E6ORwisJxJoZUUFMlqfuS+2Xt
lOebR8juJahIDRM3ihhZfXJI2tCPD/FnrcMWbk1z3NbsE6C2MiG4ncrjxR2MY81Q
zRr3CrN6TgjTUkvSMOP1SuFePEKLc/2rl5dg9EcGEFNOyggPEezSB/sL1HavRsV9
2s/hmLB6VR5GQwhMnhbLTq3jAI9M9P1S4VEoKHlDs8LoCxtQ+g+2IKmSVqKWXFuO
6AscBbNQkEbrkTx+OkbHWYc7+RLQE87ryCNODeETzSwE0H3NLk/GRQirq6LO9ESq
AjVg5085YZXEIzistsSON0aTdY0eIIVsmaYmFOI0qNPnSUCOPlHIXwD+ju1WEW4h
QtM6BW6xggydgSLgOWQQzKpgBfLW3j7F4r7cFsNCjaQ7UtDQMPMMm+ATBpoT8vdA
EHR/FC4U8ABiXpnleh87B1WCpQr6p6qo95eIbe5UxY3yPdPb32s1/+ycFngW9XPj
B4353TQp7aNRUw==
=aCiv
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull timer fixes from Thomas Gleixner:
"Two fixes for the timer wheel:
- A timer which is already expired at enqueue time can set the
base->next_expiry value backwards. As a consequence base->clk can
be set back as well. This can lead to timers expiring early. Add a
sanity check to prevent this.
- When a timer is queued with an expiry time beyond the wheel
capacity then it should be queued in the bucket of the last wheel
level which is expiring last.
The code adjusted the expiry time to the maximum wheel capacity,
which is only correct when the wheel clock is 0. Aside of that the
check whether the delta is larger than wheel capacity does not
check the delta, it checks the expiry value itself. As a result
timers can expire at random.
Fix this by checking the right variable and adjust expiry time so
it becomes base->clock plus capacity which places it into the
outmost bucket in the last wheel level"
* tag 'timers-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timer: Fix wheel index calculation on last level
timer: Prevent base->clk from moving backward
- Plug a load average accounting race which was introduced with a recent
optimization casing load average to show bogus numbers.
- Fix the rseq CPU id initialization for new tasks. sched_fork() does not
update the rseq CPU id so the id is the stale id of the parent task,
which can cause user space data corruption.
- Handle a 0 return value of task_h_load() correctly in the load balancer,
which does not decrease imbalance and therefore pulls until the maximum
number of loops is reached, which might be all tasks just created by a
fork bomb.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8UQrITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTNgD/4+uP0wmIuYAJd1WmpifX+G9h+3NIiU
zfLTxGeo+D/I+rdeS7ClyjeSTcZHl1fQfZBIopMevsEMymMu2BbQd+OeAlkESbS6
dp6G3dv0ZGbm9Sn4G3CEEPltoCJi7pOgRrixGi/o4APkNfy3U2r+w/kM1N6AwHE0
PYztzvq5Q++m+MEHOALsB1J8mc7vygU26EO4s/rRrV6/RnNZXL269PeZRFxxEvYn
rtmyUw53Lc72Y+23FuityE/jb2xkr80yuXQWxTOxbhzBtHO1omWQQVhBTMam5RDg
NUYzeZvK/nZW3i6WOuHyaaLj7+2ML7RmNpaYRueymJinda409GDXRcDOYXNFtxcI
lcVmsxzNF5rb7b9mXqdgdSJKuZotKLnTjXAIGhHzkSkl2uYfYW6PUGxq6BmSCKvR
GpewHQ8Ynf4JcsjioOTQjRNjJYmlrTsHcUUKXsyTIfYaEEw+i/7s/7G5G7bXxJ6G
Sma52oTyrsFQEG+AjT2CxhOzxQumtT5vQ9/l8EvnQXQdG7fZzIimgWnTBc6IE83J
OPYI8WomKhj+EkJSltxUQm+ZwqhDv4rBHQ+SqPr+jhvPobUN6jS0HkOoW+SIGuo4
oMRvMiNhCyUWLFYMVL2pflJANyiFczfKWyAqyjwgiSjfNaTqSmYCcPOc1NWz/Ic4
fGMLMqFQ2fW/rg==
=bCPw
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull scheduler fixes from Thomas Gleixner:
"A set of scheduler fixes:
- Plug a load average accounting race which was introduced with a
recent optimization casing load average to show bogus numbers.
- Fix the rseq CPU id initialization for new tasks. sched_fork() does
not update the rseq CPU id so the id is the stale id of the parent
task, which can cause user space data corruption.
- Handle a 0 return value of task_h_load() correctly in the load
balancer, which does not decrease imbalance and therefore pulls
until the maximum number of loops is reached, which might be all
tasks just created by a fork bomb"
* tag 'sched-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: handle case of task_h_load() returning 0
sched: Fix unreliable rseq cpu_id for new tasks
sched: Fix loadavg accounting race
- Make the handling of the firmware node consistent and do not free the
node after the domain has been created successfully. The core code
stores a pointer to it which can lead to a use after free or double
free.
This used to "work" because the pointer was not stored when the initial
code was written, but at some point later it was required to store
it. Of course nobody noticed that the existing users break that way.
- Handle affinity setting on inactive interrupts correctly when
hierarchical irq domains are enabled. When interrupts are inactive with
the modern hierarchical irqdomain design, the interrupt chips are not
necessarily in a state where affinity changes can be handled. The legacy
irq chip design allowed this because interrupts are immediately fully
initialized at allocation time. X86 has a hacky workaround for this, but
other implementations do not. This cased malfunction on GIC-V3. Instead
of playing whack a mole to find all affected drivers, change the core
code to store the requested affinity setting and then establish it when
the interrupt is allocated, which makes the X86 hack go away.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8UP+4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoSuZD/9tNPR4fIDt4mC9ciSvwSqGTV+q1y1D
zhXSDro4cJNjzy/9D475IJqOlvchaF9Nfun55b60Q6vnA4VN8G+kABEaG8uwr8mV
ijTB4f0qKfW/9kUDTJRScq3nNmC3miqg8ZFgFEn6Ecxj3NHmwidATIi5sF6f/XVG
DdhL0Jys7ycNeGBf7yIKbT5/NOULMHYy9rK1NDAeBo9u3klvmrwrHgdNsiDDhEaU
KlHtwuQLCdjFY3Lf67YpSah+Hx/gXPI1VHUxDDFRoFmC4RlB0VjyXGydjsisOrSQ
Cl2gnkQ6VOlLaLbN38nmia9nyb6npzE5iK1h9EDcaRhBACG9O23Bdo+YZYxl6BOP
mXuyIVKJYczJEp7j1fGHW/aNCoEqC8dGVyN7toxMVfGZmF12JzMSt4SYItPeSjFC
bPNPRCscpiMOQdgwgO0woK1764V46g1BlmxXtJRdWB4iwWgXcryaz65xzSfNeZF4
0+TvdYs2FYjxwwIyWj8xJ3Npe1lKhH+06DA6gziwJt1u4it8rl82UcqMFyf/ty1w
o5LHyMBWYm7SJXSeaZZj+nv7moJKJnmRYKnpry21cUzsK/vQEPX0vqhwh4dSFN3O
BaBocDsOk+9wkmUwi6haP+6+vpadAFQrsqVhURtwc6OVSWn2/vsf2ZH5P36xwFWD
tlFanb8hX9y2NQ==
=elM3
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull irq fixes from Thomas Gleixner:
"Two fixes for the interrupt subsystem:
- Make the handling of the firmware node consistent and do not free
the node after the domain has been created successfully. The core
code stores a pointer to it which can lead to a use after free or
double free.
This used to "work" because the pointer was not stored when the
initial code was written, but at some point later it was required
to store it. Of course nobody noticed that the existing users break
that way.
- Handle affinity setting on inactive interrupts correctly when
hierarchical irq domains are enabled.
When interrupts are inactive with the modern hierarchical irqdomain
design, the interrupt chips are not necessarily in a state where
affinity changes can be handled. The legacy irq chip design allowed
this because interrupts are immediately fully initialized at
allocation time. X86 has a hacky workaround for this, but other
implementations do not.
This cased malfunction on GIC-V3. Instead of playing whack a mole
to find all affected drivers, change the core code to store the
requested affinity setting and then establish it when the interrupt
is allocated, which makes the X86 hack go away"
* tag 'irq-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/affinity: Handle affinity setting on inactive interrupts correctly
irqdomain/treewide: Keep firmware node unconditionally allocated
Setting interrupt affinity on inactive interrupts is inconsistent when
hierarchical irq domains are enabled. The core code should just store the
affinity and not call into the irq chip driver for inactive interrupts
because the chip drivers may not be in a state to handle such requests.
X86 has a hacky workaround for that but all other irq chips have not which
causes problems e.g. on GIC V3 ITS.
Instead of adding more ugly hacks all over the place, solve the problem in
the core code. If the affinity is set on an inactive interrupt then:
- Store it in the irq descriptors affinity mask
- Update the effective affinity to reflect that so user space has
a consistent view
- Don't call into the irq chip driver
This is the core equivalent of the X86 workaround and works correctly
because the affinity setting is established in the irq chip when the
interrupt is activated later on.
Note, that this is only effective when hierarchical irq domains are enabled
by the architecture. Doing it unconditionally would break legacy irq chip
implementations.
For hierarchial irq domains this works correctly as none of the drivers can
have a dependency on affinity setting in inactive state by design.
Remove the X86 workaround as it is not longer required.
Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
Reported-by: Ali Saidi <alisaidi@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ali Saidi <alisaidi@amazon.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com
Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
When an expiration delta falls into the last level of the wheel, that delta
has be compared against the maximum possible delay and reduced to fit in if
necessary.
However instead of comparing the delta against the maximum, the code
compares the actual expiry against the maximum. Then instead of fixing the
delta to fit in, it sets the maximum delta as the expiry value.
This can result in various undesired outcomes, the worst possible one
being a timer expiring 15 days ahead to fire immediately.
Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200717140551.29076-2-frederic@kernel.org
task_h_load() can return 0 in some situations like running stress-ng
mmapfork, which forks thousands of threads, in a sched group on a 224 cores
system. The load balance doesn't handle this correctly because
env->imbalance never decreases and it will stop pulling tasks only after
reaching loop_max, which can be equal to the number of running tasks of
the cfs. Make sure that imbalance will be decreased by at least 1.
misfit task is the other feature that doesn't handle correctly such
situation although it's probably more difficult to face the problem
because of the smaller number of CPUs and running tasks on heterogenous
system.
We can't simply ensure that task_h_load() returns at least one because it
would imply to handle underflow in other places.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: <stable@vger.kernel.org> # v4.4+
Link: https://lkml.kernel.org/r/20200710152426.16981-1-vincent.guittot@linaro.org
There is no guarantee to CMA's placement, so allocating a zone specific
atomic pool from CMA might return memory from a completely different
memory zone. So stop using it.
Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp mask")
Reported-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When allocating DMA memory from a pool, the core can only guess which
atomic pool will fit a device's constraints. If it doesn't, get a safer
atomic pool and try again.
Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp mask")
Reported-by: Jeremy Linton <jeremy.linton@arm.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
dma-pool's dev_to_pool() creates the false impression that there is a
way to grantee a mapping between a device's DMA constraints and an
atomic pool. It tuns out it's just a guess, and the device might need to
use an atomic pool containing memory from a 'safer' (or lower) memory
zone.
To help mitigate this, introduce dma_guess_pool() which can be fed a
device's DMA constraints and atomic pools already known to be faulty, in
order for it to provide an better guess on which pool to use.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The function is only used once and can be simplified to a one-liner.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
dma_coherent_ok() checks if a physical memory area fits a device's DMA
constraints.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
I have a few KGDB-related fixes that I'd like to target for 5.8-rc5. They're
mostly fixes for build warnings, but there's also:
* Support for the qSupported and qXfer packets, which are necessary to pass
around GDB XML information which we need for the RISC-V GDB port to fully
function.
* Users can now select STRICT_KERNEL_RWX instead of forcing it on.
I know it's a bit late for rc5, as these are not critical it's not a big deal
if they don't make it in.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl8KH4UTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYifPsEACcpQJRzLaYxjTP6INLtUK2J1jvx3Md
D0QfzGQsWLOtqtk37vXUt+0KPS8vErvDHzfD1ZkHKDVFIVt4ZEVfDyPPx74nuvns
qpyFkHuv2f+icTf+YnZyH+MZW8iFesOwqbfXC5YnhI/vcqeieafd8U3t3oDik5SI
NuT0uiWAiTqUPan2vu1xrBBynxpCyCM/U/ZONf3J38wL6Mck0GTc2NjAsAsmpnZJ
pxhkGFiDIuOUuJDCDbQBoC5bWamDYYZOuhrjMizILdqiDlxdBSTSmLWpCfXtp7ls
xZL+/QV0BSR8ymSnMMAowXCrK+TTFY62bxOLhpvk5uDGEtW6F9jOh7VsW8vAtz+x
WmqcgTtPrtyvNn4hM/1Md0IV58pKU+VaeLeKQQu3V5jH6h3s+YSSyWtuheLsnhI8
KWdd88xU0Tp7ym7BcaQqXM6UbmT61YAyr1R2VcwsiSz/uRwpKYdfo12FDmTr6FxN
Br5HL0okfmDnE9KgEhEY9kbRt3FM2aoLvYlVTdRX5yAnoF1/Dnh0Jry5kOkD6OuO
lIbzvwzziTqA/STJ5UuoXRrUfwHQ+XLEMo9zGhEAv6mfXYoIkX9txVeIKFrIDkKU
dBGKL3mSruntDp/FfCgksDlZUy111VcwwdxpeplHCcyI8YGPsaavO9B8qkI5iJbG
WEukopxoA5Yj0g==
=kyQD
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"I have a few KGDB-related fixes. They're mostly fixes for build
warnings, but there's also:
- Support for the qSupported and qXfer packets, which are necessary
to pass around GDB XML information which we need for the RISC-V GDB
port to fully function.
- Users can now select STRICT_KERNEL_RWX instead of forcing it on"
* tag 'riscv-for-linus-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Avoid kgdb.h including gdb_xml.h to solve unused-const-variable warning
kgdb: Move the extern declaration kgdb_has_hit_break() to generic kgdb.h
riscv: Fix "no previous prototype" compile warning in kgdb.c file
riscv: enable the Kconfig prompt of STRICT_KERNEL_RWX
kgdb: enable arch to support XML packet.
Pull networking fixes from David Miller:
1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking
BPF programs, from Maciej Żenczykowski.
2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu
Mariappan.
3) Slay memory leak in nl80211 bss color attribute parsing code, from
Luca Coelho.
4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin.
5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric
Dumazet.
6) xsk code dips too deeply into DMA mapping implementation internals.
Add dma_need_sync and use it. From Christoph Hellwig
7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko.
8) Check for disallowed attributes when loading flow dissector BPF
programs. From Lorenz Bauer.
9) Correct packet injection to L3 tunnel devices via AF_PACKET, from
Jason A. Donenfeld.
10) Don't advertise checksum offload on ipa devices that don't support
it. From Alex Elder.
11) Resolve several issues in TCP MD5 signature support. Missing memory
barriers, bogus options emitted when using syncookies, and failure
to allow md5 key changes in established states. All from Eric
Dumazet.
12) Fix interface leak in hsr code, from Taehee Yoo.
13) VF reset fixes in hns3 driver, from Huazhong Tan.
14) Make loopback work again with ipv6 anycast, from David Ahern.
15) Fix TX starvation under high load in fec driver, from Tobias
Waldekranz.
16) MLD2 payload lengths not checked properly in bridge multicast code,
from Linus Lüssing.
17) Packet scheduler code that wants to find the inner protocol
currently only works for one level of VLAN encapsulation. Allow
Q-in-Q situations to work properly here, from Toke
Høiland-Jørgensen.
18) Fix route leak in l2tp, from Xin Long.
19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport
support and various protocols. From Martin KaFai Lau.
20) Fix socket cgroup v2 reference counting in some situations, from
Cong Wang.
21) Cure memory leak in mlx5 connection tracking offload support, from
Eli Britstein.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
mlxsw: pci: Fix use-after-free in case of failed devlink reload
mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()
net: macb: fix call to pm_runtime in the suspend/resume functions
net: macb: fix macb_suspend() by removing call to netif_carrier_off()
net: macb: fix macb_get/set_wol() when moving to phylink
net: macb: mark device wake capable when "magic-packet" property present
net: macb: fix wakeup test in runtime suspend/resume routines
bnxt_en: fix NULL dereference in case SR-IOV configuration fails
libbpf: Fix libbpf hashmap on (I)LP32 architectures
net/mlx5e: CT: Fix memory leak in cleanup
net/mlx5e: Fix port buffers cell size value
net/mlx5e: Fix 50G per lane indication
net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
net/mlx5e: Fix VXLAN configuration restore after function reload
net/mlx5e: Fix usage of rcu-protected pointer
net/mxl5e: Verify that rpriv is not NULL
net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode
net/mlx5: Fix eeprom support for SFP module
cgroup: Fix sock_cgroup_data on big-endian.
selftests: bpf: Fix detach from sockmap tests
...
- add a warning when the atomic pool is depleted (David Rientjes)
- protect the parameters of the new scatterlist helper macros
(Marek Szyprowski )
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl8IjBILHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYN10RAAjCGeb2ImNmGHgqZEbJ5KM99g/gVeGJO2aUOLQWCx
qr3Jx0PX6TaGi/tg4OMJFwA8oErHh6bZO1OWVp7PShmeEHRdRp+FPmcb0PzRM1pO
gNxgouJIj+B47enkFwRjLpiST5YVoP90Sn61I8Vr9hiC88TaLho0Kj2hkvTcKRln
NCahkT9NTQpoC1iFR+lMje1yodEzWum3+aAEmjIaebeMJor1v8RRGkYXJASdD1V2
whchfZCWM6Jhr9PUAL3NnTbQXccI7qOkCCsxssW652SysIN6dV8XmBmoH/VUC5QE
soScl93T0EZvBdUreEvKSjVO3BOCRuemuzQ9myFk4c/olKGqQO675G1sCs9RIawz
UEAtWEWYC/CluKvzjJuJl2pGmfNRuazsylLA6WDQGqQoe8uJ/9qKKpCr9jRn3shl
dUccyFQWrmXrh76qXPvB05D0/qb4JNVhyXYLiD8DhzR3DlH1d5z52TWDT9g/J84Q
usq69gwZq65MZYMHWRlRRXYdEuvQxgEZvl2ecYA/ZaW1wh6XYGBCQI5CtG5E2sOP
8THs5E+u1PQaJWqdIR57xCuNxpWS+r6nv0N7z4vIQtwVkXPO3lS7aVNClIOY1u5/
m7SEeJ4ZBtVZsA4nQbG3sxAiA1GT8nm5JugwfOIgmMyxrpRbNWj8IrIe49Ckbhqa
YZQ=
=KI5i
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.8-5' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- add a warning when the atomic pool is depleted (David Rientjes)
- protect the parameters of the new scatterlist helper macros (Marek
Szyprowski )
* tag 'dma-mapping-5.8-5' of git://git.infradead.org/users/hch/dma-mapping:
scatterlist: protect parameters of the sg_table related macros
dma-mapping: warn when coherent pool is depleted
The XML packet could be supported by required architecture if the
architecture defines CONFIG_HAVE_ARCH_KGDB_QXFER_PKT and implement its own
kgdb_arch_handle_qxfer_pkt(). Except for the kgdb_arch_handle_qxfer_pkt(),
the architecture also needs to record the feature supported by gdb stub
into the kgdb_arch_gdb_stub_feature, and these features will be reported
to host gdb when gdb stub receives the qSupported packet.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Several users of kallsyms_show_value() were performing checks not
during "open". Refactor everything needed to gain proper checks against
file->f_cred for modules, kprobes, and bpf.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8GUbMWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJohnD/9VsPsAMV+8lhsPvkcuW/DkTcAY
qUEzsXU3v06gJ0Z/1lBKtisJ6XmD93wWcZCTFvJ0S8vR3yLZvOVfToVjCMO32Trc
4ZkWTPwpvfeLug6T6CcI2ukQdZ/opI1cSabqGl79arSBgE/tsghwrHuJ8Exkz4uq
0b7i8nZa+RiTezwx4EVeGcg6Dv1tG5UTG2VQvD/+QGGKneBlrlaKlI885N/6jsHa
KxvB7+8ES1pnfGYZenx+RxMdljNrtyptbQEU8gyvoV5YR7635gjZsVsPwWANJo+4
EGcFFpwWOAcVQaC3dareLTM8nVngU6Wl3Rd7JjZtjvtZba8DdCn669R34zDGXbiP
+1n1dYYMSMBeqVUbAQfQyLD0pqMIHdwQj2TN8thSGccr2o3gNk6AXgYq0aYm8IBf
xDCvAansJw9WqmxErIIsD4BFkMqF7MjH3eYZxwCPWSrKGDvKxQSPV5FarnpDC9U7
dYCWVxNPmtn+unC/53yXjEcBepKaYgNR7j5G7uOfkHvU43Bd5demzLiVJ10D8abJ
ezyErxxEqX2Gr7JR2fWv7iBbULJViqcAnYjdl0y0NgK/hftt98iuge6cZmt1z6ai
24vI3X4VhvvVN5/f64cFDAdYtMRUtOo2dmxdXMid1NI07Mj2qFU1MUwb8RHHlxbK
8UegV2zcrBghnVuMkw==
=ib5Q
-----END PGP SIGNATURE-----
Merge tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kallsyms fix from Kees Cook:
"Refactor kallsyms_show_value() users for correct cred.
I'm not delighted by the timing of getting these changes to you, but
it does fix a handful of kernel address exposures, and no one has
screamed yet at the patches.
Several users of kallsyms_show_value() were performing checks not
during "open". Refactor everything needed to gain proper checks
against file->f_cred for modules, kprobes, and bpf"
* tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests: kmod: Add module address visibility test
bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()
kprobes: Do not expose probe addresses to non-CAP_SYSLOG
module: Do not expose section addresses to non-CAP_SYSLOG
module: Refactor section attr into bin attribute
kallsyms: Refactor kallsyms_show_value() to take cred
bpf_sk_reuseport_detach is currently called when sk->sk_user_data
is not NULL. It is incorrect because sk->sk_user_data may not be
managed by the bpf's reuseport_array. It has been reported in [1] that,
the bpf_sk_reuseport_detach() which is called from udp_lib_unhash() has
corrupted the sk_user_data managed by l2tp.
This patch solves it by using another bit (defined as SK_USER_DATA_BPF)
of the sk_user_data pointer value. It marks that a sk_user_data is
managed/owned by BPF.
The patch depends on a PTRMASK introduced in
commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").
[ Note: sk->sk_user_data is used by bpf's reuseport_array only when a sk is
added to the bpf's reuseport_array.
i.e. doing setsockopt(SO_REUSEPORT) and having "sk->sk_reuseport == 1"
alone will not stop sk->sk_user_data being used by other means. ]
[1]: https://lore.kernel.org/netdev/20200706121259.GA20199@katalix.com/
Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
Reported-by: James Chapman <jchapman@katalix.com>
Reported-by: syzbot+9f092552ba9a5efca5df@syzkaller.appspotmail.com
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: James Chapman <jchapman@katalix.com>
Acked-by: James Chapman <jchapman@katalix.com>
Link: https://lore.kernel.org/bpf/20200709061110.4019316-1-kafai@fb.com
It makes little sense for copying sk_user_data of reuseport_array during
sk_clone_lock(). This patch reuses the SK_USER_DATA_NOCOPY bit introduced in
commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").
It is used to mark the sk_user_data is not supposed to be copied to its clone.
Although the cloned sk's sk_user_data will not be used/freed in
bpf_sk_reuseport_detach(), this change can still allow the cloned
sk's sk_user_data to be used by some other means.
Freeing the reuseport_array's sk_user_data does not require a rcu grace
period. Thus, the existing rcu_assign_sk_user_data_nocopy() is not
used.
Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200709061104.4018798-1-kafai@fb.com
When a timer is enqueued with a negative delta (ie: expiry is below
base->clk), it gets added to the wheel as expiring now (base->clk).
Yet the value that gets stored in base->next_expiry, while calling
trigger_dyntick_cpu(), is the initial timer->expires value. The
resulting state becomes:
base->next_expiry < base->clk
On the next timer enqueue, forward_timer_base() may accidentally
rewind base->clk. As a possible outcome, timers may expire way too
early, the worst case being that the highest wheel levels get spuriously
processed again.
To prevent from that, make sure that base->next_expiry doesn't get below
base->clk.
Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200703010657.2302-1-frederic@kernel.org
When evaluating access control over kallsyms visibility, credentials at
open() time need to be used, not the "current" creds (though in BPF's
case, this has likely always been the same). Plumb access to associated
file->f_cred down through bpf_dump_raw_ok() and its callers now that
kallsysm_show_value() has been refactored to take struct cred.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump")
Signed-off-by: Kees Cook <keescook@chromium.org>