Merge tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull aio poll fixes from Eric Biggers:
"Fix three bugs in aio poll, and one issue with POLLFREE more broadly:
- aio poll didn't handle POLLFREE, causing a use-after-free.
- aio poll could block while the file is ready.
- aio poll called eventfd_signal() when it isn't allowed.
- POLLFREE didn't handle multiple exclusive waiters correctly.
This has been tested with the libaio test suite, as well as with test
programs I wrote that reproduce the first two bugs. I am sending this
pull request myself as no one seems to be maintaining this code"
* tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
aio: Fix incorrect usage of eventfd_signal_allowed()
aio: fix use-after-free due to missing POLLFREE handling
aio: keep poll requests on waitqueue until completed
signalfd: use wake_up_pollfree()
binder: use wake_up_pollfree()
wait: add wake_up_pollfree()
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"More x86 fixes:
- Logic bugs in CR0 writes and Hyper-V hypercalls
- Don't use Enlightened MSR Bitmap for L3
- Remove user-triggerable WARN
Plus a few selftest fixes and a regression test for the
user-triggerable WARN"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
selftests: KVM: Add test to verify KVM doesn't explode on "bad" I/O
KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exit
KVM: X86: Raise #GP when clearing CR0_PG in 64 bit mode
selftests: KVM: avoid failures due to reserved HyperTransport region
KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req
KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall
KVM: x86: selftests: svm_int_ctl_test: fix intercept calculation
KVM: nVMX: Don't use Enlightened MSR Bitmap for L3
Maxime points out that the polling code in mpc_i2c_isr() should use the
_atomic API because it is called in IRQ context, and that the MCF bit
reads as 1 when the byte transfer is complete. Together, this means the
original code was effectively a udelay(100).
Fix this by using readb_poll_timeout_atomic() and removing the negation
of the break condition.
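A minimal userspace sketch of why the old condition behaved like a delay.
It assumes CSR_MCF is 0x80 and models readb_poll_timeout_atomic() as a
loop that counts register reads instead of microseconds; poll_status()
and the helpers below are hypothetical. In the ISR the transfer is
already complete, so MCF reads back as 1: the fixed condition succeeds on
the first read, while the old negated condition spins until the timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CSR_MCF 0x80 /* assumed value of the "transfer complete" status bit */

/*
 * Read the status register until cond(val) holds or max_polls reads have
 * been made. Returns 0 on success and -1 on timeout; *reads reports how
 * many reads were performed.
 */
static int poll_status(uint8_t (*read_status)(void), bool (*cond)(uint8_t),
		       int max_polls, int *reads)
{
	assert(max_polls > 0);
	*reads = 0;
	while (*reads < max_polls) {
		uint8_t val = read_status();

		(*reads)++;
		if (cond(val))
			return 0;
	}
	return -1;
}

/* In the ISR the byte transfer has already finished, so MCF reads as 1. */
static uint8_t status_transfer_done(void)
{
	return CSR_MCF;
}

/* Fixed break condition: stop once MCF is set. */
static bool mcf_set(uint8_t val)
{
	return val & CSR_MCF;
}

/* Old, negated break condition: never true while MCF is set. */
static bool mcf_clear(uint8_t val)
{
	return !(val & CSR_MCF);
}
```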
Fixes: 4a8ac5e45cda ("i2c: mpc: Poll for MCF")
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
We check IO_WQ_BIT_EXIT before attempting to create a new worker, and
wq exit cancels pending work if we have any. But it's possible to have
a race between the two, where creation checks the exit bit and finds it
clear, but we're already in the process of exiting. The exit side will cancel pending
creation task_work, but there's a gap where we add task_work after we've
canceled existing creations at exit time.
Fix this by checking the EXIT bit post adding the creation task_work.
If it's set, run the same cancelation that exit does.
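A single-threaded model of the fix, for illustration only: struct
wq_model, queue_create(), wq_exit() and the race_hook test hook are
hypothetical stand-ins for IO_WQ_BIT_EXIT and the creation task_work, and
the real code uses atomic bit operations rather than a plain bool. The
hook lets exit run inside the window between the first check and queueing
the work, which is exactly the gap described above.

```c
#include <assert.h>
#include <stdbool.h>

struct wq_model {
	bool exiting;        /* stands in for IO_WQ_BIT_EXIT */
	int pending_creates; /* queued creation task_work */
	int cancelled;       /* creations cancelled so far */
	void (*race_hook)(struct wq_model *wq); /* runs inside the race window */
};

static void cancel_creates(struct wq_model *wq)
{
	wq->cancelled += wq->pending_creates;
	wq->pending_creates = 0;
}

/* Exit path: set the bit, then cancel whatever creation work is queued. */
static void wq_exit(struct wq_model *wq)
{
	wq->exiting = true;
	cancel_creates(wq);
}

static bool queue_create(struct wq_model *wq)
{
	if (wq->exiting)
		return false;
	if (wq->race_hook)
		wq->race_hook(wq); /* exit may run here, after the first check */
	wq->pending_creates++;
	/* The fix: re-check after queueing and cancel the same way exit does. */
	if (wq->exiting) {
		cancel_creates(wq);
		return false;
	}
	return true;
}
```

Without the second check, the racy case would leave one creation queued
after exit had already done its cancelation pass.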
Reported-and-tested-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we successfully cancel a work item but that work item needs to be
processed through task_work, then we can be sleeping uninterruptibly
in io_uring_cancel_generic() and never process it. Hence we don't
make forward progress and we end up with an uninterruptible sleep
warning.
While in there, correct a comment that should be IFF, not IIF.
Reported-and-tested-by: syzbot+21e6887c0be14181206d@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'pci-v5.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Revert emulation of Marvell Armada A3720 expansion ROM because it
doesn't work as expected (Marek Behún)
- Assert PERST# in Apple M1 driver to fix initialization when booting
from bootloaders using PCIe, such as U-Boot (Marc Zyngier)
- Describe PERST# as active low in Apple T8103 DT and update driver to
match (Marc Zyngier)
* tag 'pci-v5.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: apple: Fix PERST# polarity
arm64: dts: apple: t8103: Mark PCIe PERST# polarity active low in DT
PCI: apple: Follow the PCIe specifications when resetting the port
Revert "PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge"
Merge tag 'libata-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull libata fixes from Damien Le Moal:
- Fix a sparse warning in the ahci_ceva driver (me)
- Disable the ASMedia 1092 non-functional device (Hannes)
* tag 'libata-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
libata: add horkage for ASMedia 1092
ata: ahci_ceva: Fix id array access in ceva_ahci_read_id()
Merge tag 'sound-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Another collection of small fixes. It's still not quite calm yet, but
nothing looks scary.
ALSA core got a few fixes covering the issues detected by a fuzzer and
the 32-bit compat problem of the control API, while the rest are all
device-specific small fixes, including the continued fixes for Tegra"
* tag 'sound-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits)
ALSA: hda/realtek - Add headset Mic support for Lenovo ALC897 platform
ALSA: usb-audio: Reorder snd_djm_devices[] entries
ALSA: hda/realtek: Fix quirk for TongFang PHxTxX1
ALSA: ctl: Fix copy of updated id with element read/write
ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*()
ALSA: pcm: oss: Limit the period size to 16MB
ALSA: pcm: oss: Fix negative period/buffer sizes
ASoC: codecs: wsa881x: fix return values from kcontrol put
ASoC: codecs: wcd934x: return correct value from mixer put
ASoC: codecs: wcd934x: handle channel mappping list correctly
ASoC: qdsp6: q6routing: Fix return value from msm_routing_put_audio_mixer
ASoC: SOF: Intel: Retry codec probing if it fails
ASoC: amd: fix uninitialized variable in snd_acp6x_probe()
ASoC: rockchip: i2s_tdm: Dup static DAI template
ASoC: rt5682s: Fix crash due to out of scope stack vars
ASoC: rt5682: Fix crash due to out of scope stack vars
ASoC: tegra: Use normal system sleep for ADX
ASoC: tegra: Use normal system sleep for AMX
ASoC: tegra: Use normal system sleep for Mixer
ASoC: tegra: Use normal system sleep for MVC
...
This reverts commit 776b54e97a7d993ba23696e032426d5dea5bbe70.
Looks like a last minute edit snuck into this patch, and as a result,
it doesn't even compile. Revert the change for now.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent
change_pid(PIDTYPE_PGID) that can move the task from one hlist
to another while iterating. Serialize ioprio_get() to take
the tasklist_lock in this case, just like its set counterpart.
Fixes: d69b78ba1de (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}())
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Link: https://lore.kernel.org/r/20211210182058.43417-1-dave@stgolabs.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In drivers/md/md.c, a double free may occur when autorun_array() is
called.
In autorun_array(), when do_md_run() returns an error, do_md_stop() is
called.
do_md_run() calls md_run(), and md_run() may free mddev->private.
do_md_stop() calls __md_stop(), which also frees mddev->private
without checking it for NULL.
As a result, mddev->private is freed twice, so it needs to be checked
for NULL before it is freed.
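A minimal userspace sketch of the NULL-check pattern, using hypothetical
stand-ins (struct mddev_model, pers_free(), run_error_path(),
md_stop_model()) rather than the real md code. It only shows that clearing
the pointer after the first free and checking it before the second turns
the double free into a single free.

```c
#include <assert.h>
#include <stdlib.h>

struct mddev_model {
	void *private; /* stands in for mddev->private */
};

static int free_calls; /* how many times the private data was freed */

static void pers_free(void *priv)
{
	assert(priv != NULL); /* this model's free routine must never see NULL */
	free_calls++;
	free(priv);
}

/* md_run() error path: free the private data and clear the pointer. */
static void run_error_path(struct mddev_model *mddev)
{
	pers_free(mddev->private);
	mddev->private = NULL;
}

/* __md_stop(): only free the private data if it is still there. */
static void md_stop_model(struct mddev_model *mddev)
{
	if (mddev->private) {
		pers_free(mddev->private);
		mddev->private = NULL;
	}
}
```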
Signed-off-by: zhangyue <zhangyue1@kylinos.cn>
Signed-off-by: Song Liu <songliubraving@fb.com>
A version 1.0 superblock doesn't get moved to the new position on a
device size change. This leaves an rdev without a superblock at a known
position, so the RAID array can't be re-assembled.
The line was removed by mistake and is re-added by this patch.
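For reference, a sketch of where a version 1.0 superblock lives, assuming
the usual v1.0 layout: 8K from the end of the device, rounded down to a 4K
boundary, in 512-byte sectors (sb_1_0_start() is a hypothetical helper).
The offset depends on the device size, which is why a resize must rewrite
the superblock at the new position.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Start sector (512-byte units) of a version 1.0 superblock on a device
 * of dev_sectors sectors: 8K from the end, rounded down to a 4K boundary.
 */
static uint64_t sb_1_0_start(uint64_t dev_sectors)
{
	uint64_t sb;

	assert(dev_sectors >= 8 * 2);
	sb = dev_sectors - 8 * 2;            /* 8 KiB = 16 sectors from the end */
	return sb & ~(uint64_t)(4 * 2 - 1);  /* align down to 4 KiB = 8 sectors */
}
```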
Fixes: d9c0fa509eaf ("md: fix max sectors calculation for super 1.0")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Hochholdinger <markus@hochholdinger.net>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
A delegation break could arrive as soon as we've called vfs_setlease. A
delegation break runs a callback which immediately (in
nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. If we
then exit nfs4_set_delegation without hashing the delegation, it will be
freed as soon as the callback is done with it, without ever being
removed from del_recall_lru.
Symptoms show up later as use-after-free or list corruption warnings,
usually in the laundromat thread.
I suspect aba2072f4523 "nfsd: grant read delegations to clients holding
writes" made this bug easier to hit, but I looked as far back as v3.0
and it looks to me like it already had the same problem. So I'm not sure
where the bug was introduced; it may have been there from the beginning.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first")
has reopened the rpc_pipefs_event() race against nfsd_net_id registration
(register_pernet_subsys()), which had been fixed by commit bb7ffbf29e76
("nfsd: fix nsfd startup race triggering BUG_ON").
Restore the order of register_pernet_subsys() vs register_cld_notifier().
Add WARN_ON() to prevent a future regression.
Crash info:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012
CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1
pc : rpc_pipefs_event+0x54/0x120 [nfsd]
lr : rpc_pipefs_event+0x48/0x120 [nfsd]
Call trace:
rpc_pipefs_event+0x54/0x120 [nfsd]
blocking_notifier_call_chain
rpc_fill_super
get_tree_keyed
rpc_fs_get_tree
vfs_get_tree
do_mount
ksys_mount
__arm64_sys_mount
el0_svc_handler
el0_svc
Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
With some specific kernel configurations and Clang, the kernel fails
to link with errors such as:
ld.lld: error: undefined symbol: __compiletime_assert_200
>>> referenced by arch_timer.h:156 (./arch/arm64/include/asm/arch_timer.h:156)
>>> clocksource/arm_arch_timer.o:(erratum_set_next_event_generic) in archive drivers/built-in.a
ld.lld: error: undefined symbol: __compiletime_assert_197
>>> referenced by arch_timer.h:133 (./arch/arm64/include/asm/arch_timer.h:133)
>>> clocksource/arm_arch_timer.o:(erratum_set_next_event_generic) in archive drivers/built-in.a
make: *** [Makefile:1161: vmlinux] Error 1
These are due to the BUILD_BUG() macros contained in the low-level
accessors (arch_timer_reg_{write,read}_cp15) being emitted, as the
access type wasn't known at compile time.
Fix this by making erratum_set_next_event_generic() __force_inline,
so that the 'access' parameter is resolved at compile time, as is
already done for set_next_event().
Fixes: 4775bc63f880 ("Add build-time guards for unhandled register accesses")
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20211117113532.3895208-1-maz@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The driver refuses to probe with -EINVAL since commit 5d9814df0aec
("clocksource/drivers/dw_apb_timer_of: Add error handling if no clock
available").
Before that commit, the driver probed successfully if either the
"clock-freq" or the "clock-frequency" property had been specified in the
device tree.
That commit changed
	if (A && B)
		panic("No clock nor clock-frequency property");
into
	if (!A && !B)
		return 0;
That's a bug: the negation of `A && B` is `!A || !B`, not `!A && !B`.
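A small truth-table check of the logic, assuming A and B mean "reading
clock-freq failed" and "reading clock-frequency failed" (of_property_read_u32()
returns 0 on success). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* a: reading "clock-freq" failed; b: reading "clock-frequency" failed. */
static bool old_panics(bool a, bool b)
{
	return a && b;   /* original code panicked only when both reads failed */
}

static bool buggy_probes_ok(bool a, bool b)
{
	return !a && !b; /* 5d9814df0aec: success only if both reads succeeded */
}

static bool fixed_probes_ok(bool a, bool b)
{
	return !a || !b; /* success if either read succeeded */
}
```

With only "clock-frequency" present (a = true, b = false), the buggy
condition rejects the device while the fixed one accepts it, matching the
original panic-based behaviour.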
Signed-off-by: Vadim V. Vlasov <vadim.vlasov@elpitech.ru>
Signed-off-by: Alexey Sheplyakov <asheplyakov@basealt.ru>
Fixes: 5d9814df0aec56a6 ("clocksource/drivers/dw_apb_timer_of: Add error handling if no clock available").
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vadim V. Vlasov <vadim.vlasov@elpitech.ru>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Link: https://lore.kernel.org/r/20211109153401.157491-1-asheplyakov@basealt.ru
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Add an x86 selftest to verify that KVM doesn't WARN or otherwise explode
if userspace modifies RCX during a userspace exit to handle string I/O.
This is a regression test for a user-triggerable WARN introduced by
commit 3b27de271839 ("KVM: x86: split the two parts of emulator_pio_in").
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211025201311.1881846-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace a WARN with a comment to call out that userspace can modify RCX
during an exit to userspace to handle string I/O. KVM doesn't actually
support changing the rep count during an exit, i.e. the scenario can be
ignored, but the WARN needs to go as it's trivial to trigger from
userspace.
Cc: stable@vger.kernel.org
Fixes: 3b27de271839 ("KVM: x86: split the two parts of emulator_pio_in")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211025201311.1881846-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the SDM:
If the logical processor is in 64-bit mode or if CR4.PCIDE = 1, an
attempt to clear CR0.PG causes a general-protection exception (#GP).
Software should transition to compatibility mode and clear CR4.PCIDE
before attempting to disable paging.
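A sketch of the SDM rule as a predicate, assuming the architectural bit
positions CR0.PG = bit 31 and CR4.PCIDE = bit 17. The macro names mirror
the kernel's, but cr0_pg_clear_faults() is a hypothetical helper, not
KVM's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG    (1ULL << 31) /* paging enable */
#define X86_CR4_PCIDE (1ULL << 17) /* process-context identifiers enable */

/*
 * Returns true if changing CR0 from old_cr0 to new_cr0 clears CR0.PG
 * while in 64-bit mode or with CR4.PCIDE = 1, i.e. the write must #GP.
 */
static bool cr0_pg_clear_faults(uint64_t old_cr0, uint64_t new_cr0,
				uint64_t cr4, bool in_64bit_mode)
{
	bool clearing_pg = (old_cr0 & X86_CR0_PG) && !(new_cr0 & X86_CR0_PG);

	return clearing_pg && (in_64bit_mode || (cr4 & X86_CR4_PCIDE));
}
```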
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211207095230.53437-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make xhci_disable_slot() synchronous, ensuring that it, and
xhci_free_dev() which calls it, return only after the xHC controller
completes the disable slot command.
Otherwise the roothub and xHC host may runtime suspend and clear the
command ring while the disable slot command is being processed.
This causes a command completion mismatch, as the completion event can't
be mapped to the correct command.
The command ring gets out of sync and commands time out.
The driver finally assumes the host is unresponsive and bails out.
usb 2-4: USB disconnect, device number 10
xhci_hcd 0000:00:0d.0: ERROR mismatched command completion event
...
xhci_hcd 0000:00:0d.0: xHCI host controller not responding, assume dead
xhci_hcd 0000:00:0d.0: HC died; cleaning up
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20211210141735.1384209-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the xHCI is quirked with XHCI_RESET_ON_RESUME, runtime resume
routine also resets the controller.
This is bad for USB drivers without a reset_resume callback, because
there's no subsequent call of usb_dev_complete() ->
usb_resume_complete() to force rebinding the driver to the device. For
instance, a btusb device stops working after the xHCI controller is
runtime resumed, if the controller is quirked with XHCI_RESET_ON_RESUME.
So always take XHCI_RESET_ON_RESUME into account to solve the issue.
Cc: <stable@vger.kernel.org>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20211210141735.1384209-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'nvme-5.16-2021-12-10' of git://git.infradead.org/nvme into block-5.16
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.16
- set ana_log_size to 0 after freeing ana_log_buf (Hou Tao)
- show subsys nqn for duplicate cntlids (Keith Busch)
- disable namespace access for unsupported metadata (Keith Busch)
- report write pointer for a full zone as zone start + zone len
(Niklas Cassel)
- fix use after free when disconnecting a reconnecting ctrl
(Ruozhu Li)
- fix a list corruption in nvmet-tcp (Sagi Grimberg)"
* tag 'nvme-5.16-2021-12-10' of git://git.infradead.org/nvme:
nvmet-tcp: fix possible list corruption for unexpected command failure
nvme: fix use after free when disconnecting a reconnecting ctrl
nvme-multipath: set ana_log_size to 0 after free ana_log_buf
nvme: report write pointer for a full zone as zone start + zone len
nvme: disable namespace access for unsupported metadata
nvme: show subsys nqn for duplicate cntlids
This was found by coccicheck:
./drivers/irqchip/irq-bcm7120-l2.c,328,1-7,ERROR missing put_device;
call of_find_device_by_node on line 234, but without a corresponding
object release within this function.
./drivers/irqchip/irq-bcm7120-l2.c,341,1-7,ERROR missing put_device;
call of_find_device_by_node on line 234, but without a corresponding
object release within this function.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109055958.130287-1-ye.guojin@zte.com.cn
AMD processors define an address range that is reserved by HyperTransport
and causes a failure if used for guest physical addresses. Avoid
selftest failures by reserving those guest physical addresses; the
rules are:
- On parts with <40 bits, it's fully hidden from software.
- Before Fam17h, it was always 12G just below 1T, even if there was more
RAM above this location. In this case we simply don't use any RAM above 1T.
- On Fam17h and later, it is variable based on SME, and is either just
below 2^48 (no encryption) or 2^43 (encryption).
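A sketch of the resulting limit on usable guest physical addresses,
following the rules above. It assumes the reserved hole is 12 GiB in every
case (the message only states this size for pre-Fam17h parts) and
hardcodes the 2^48 and 2^43 tops mentioned above; usable_gpa_limit() is
hypothetical and is not the selftest's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30)
#define HT_HOLE_SIZE (12 * GiB) /* assumed size of the HyperTransport hole */

/*
 * Exclusive upper bound of guest physical addresses that are safe to use
 * on an AMD host with pa_bits physical address bits.
 */
static uint64_t usable_gpa_limit(unsigned int pa_bits, unsigned int family,
				 bool sme_enabled)
{
	uint64_t limit, hole_top, hole_start;

	assert(pa_bits < 64);
	limit = 1ULL << pa_bits;
	if (pa_bits < 40)
		return limit;          /* hole fully hidden from software */

	if (family < 0x17)
		hole_top = 1ULL << 40; /* always just below 1T */
	else
		hole_top = sme_enabled ? 1ULL << 43 : 1ULL << 48;

	/* Don't use anything at or above the start of the hole. */
	hole_start = hole_top - HT_HOLE_SIZE;
	return hole_start < limit ? hole_start : limit;
}
```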
Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210805105423.412878-1-pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not bail early if there are no bits set in the sparse banks for a
non-sparse, a.k.a. "all CPUs", IPI request. Per the Hyper-V spec, it is
legal to have a variable length of '0', e.g. VP_SET's BankContents in
this case, if the request can be serviced without the extra info.
It is possible that for a given invocation of a hypercall that does
accept variable sized input headers that all the header input fits
entirely within the fixed size header. In such cases the variable sized
input header is zero-sized and the corresponding bits in the hypercall
input should be set to zero.
Bailing early results in KVM failing to send IPIs to all CPUs as expected
by the guest.
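A simplified sketch of the intended behaviour, assuming a single 64-bit
bank that covers vCPUs 0..63; ipi_target_mask() is hypothetical and
ignores the real VP_SET encoding. An "all CPUs" request targets every
vCPU even when no sparse banks are supplied, while a sparse request with
no banks legitimately targets nobody.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the set of vCPUs (bit n = vCPU n) an IPI request targets, for a
 * VM with nr_vcpus vCPUs (at most 64 in this model).
 */
static uint64_t ipi_target_mask(bool all_cpus, const uint64_t *banks,
				size_t nr_banks, unsigned int nr_vcpus)
{
	uint64_t present;

	assert(nr_vcpus <= 64);
	present = nr_vcpus == 64 ? UINT64_MAX : (1ULL << nr_vcpus) - 1;

	/* "All CPUs": bank contents are irrelevant and zero banks is legal. */
	if (all_cpus)
		return present;

	/* Sparse request with no banks: nothing to deliver. */
	if (nr_banks == 0)
		return 0;

	return banks[0] & present;
}
```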
Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Prior to commit 0baedd792713 ("KVM: x86: make Hyper-V PV TLB flush use
tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH |
KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs
and KVM_REQ_TLB_FLUSH is/was defined as:
(0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that
"This call guarantees that by the time control returns back to the
caller, the observable effects of all flushes on the specified virtual
processors have occurred." and without KVM_REQUEST_WAIT there's a small
chance that the vCPU making the TLB flush will resume running before
all IPIs get delivered to other vCPUs and a stale mapping can get read
there.
Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST:
kvm_hv_flush_tlb() is the sole caller which uses it for
kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where
KVM_REQUEST_WAIT makes a difference.
Cc: stable@kernel.org
Fixes: 0baedd792713 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211209102937.584397-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'drm-intel-fixes-2021-12-09' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
A fix for an error pointer dereference in gem_execbuffer and
a fix for GT initialization when GuC/HuC are used on ICL.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YbJVWYAd/jeERCYY@intel.com
Merge tag 'drm-misc-fixes-2021-12-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A fix in syncobj to handle fence already signalled better, and a fix for
a ttm_bo_swapout eviction check.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20211209124305.gxhid5zwf7m4oasn@houat
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Quite a few small bug fixes old and new, also Doug Ledford is retiring
now, we thank him for his work. Details:
- Use after free in rxe
- mlx5 DM regression
- hns bugs triggered by device reset
- Two fixes for CONFIG_DEBUG_PREEMPT
- Several longstanding corner case bugs in hfi1
- Two irdma data path bugs in rare cases and some memory issues"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/irdma: Don't arm the CQ more than two times if no CE for this CQ
RDMA/irdma: Report correct WC errors
RDMA/irdma: Fix a potential memory allocation issue in 'irdma_prm_add_pble_mem()'
RDMA/irdma: Fix a user-after-free in add_pble_prm
IB/hfi1: Fix leak of rcvhdrtail_dummy_kvaddr
IB/hfi1: Fix early init panic
IB/hfi1: Insure use of smp_processor_id() is preempt disabled
IB/hfi1: Correct guard on eager buffer deallocation
RDMA/rtrs: Call {get,put}_cpu_ptr to silence a debug kernel warning
RDMA/hns: Do not destroy QP resources in the hw resetting phase
RDMA/hns: Do not halt commands during reset until later
Remove Doug Ledford from MAINTAINERS
RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow
RDMA: Fix use-after-free in rxe_queue_cleanup
When kernel.h is included in headers, it adds a lot to dependency hell,
especially when circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Merge tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and netfilter.
Current release - regressions:
- bpf, sockmap: re-evaluate proto ops when psock is removed from
sockmap
Current release - new code bugs:
- bpf: fix bpf_check_mod_kfunc_call for built-in modules
- ice: fixes for TC classifier offloads
- vrf: don't run conntrack on vrf with !dflt qdisc
Previous releases - regressions:
- bpf: fix the off-by-two error in range markings
- seg6: fix the iif in the IPv6 socket control block
- devlink: fix netns refcount leak in devlink_nl_cmd_reload()
- dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
- dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
Previous releases - always broken:
- ethtool: do not perform operations on net devices being
unregistered
- udp: use datalen to cap max gso segments
- ice: fix races in stats collection
- fec: only clear interrupt of handling queue in fec_enet_rx_queue()
- m_can: pci: fix incorrect reference clock rate
- m_can: disable and ignore ELO interrupt
- mvpp2: fix XDP rx queues registering
Misc:
- treewide: add missing includes masked by cgroup -> bpf.h
dependency"
* tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits)
net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
net: wwan: iosm: fixes unable to send AT command during mbim tx
net: wwan: iosm: fixes net interface nonfunctional after fw flash
net: wwan: iosm: fixes unnecessary doorbell send
net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering
MAINTAINERS: s390/net: remove myself as maintainer
net/sched: fq_pie: prevent dismantle issue
net: mana: Fix memory leak in mana_hwc_create_wq
seg6: fix the iif in the IPv6 socket control block
nfp: Fix memory leak in nfp_cpp_area_cache_add()
nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
nfc: fix segfault in nfc_genl_dump_devices_done
udp: using datalen to cap max gso segments
net: dsa: mv88e6xxx: error handling for serdes_power functions
can: kvaser_usb: get CAN clock frequency from device
can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter
net: mvpp2: fix XDP rx queues registering
vmxnet3: fix minimum vectors alloc issue
net, neigh: clear whole pneigh_entry at alloc time
net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
...
Pull HID fixes from Jiri Kosina:
- fixes for various drivers which assume that a HID device is on USB
transport, but that might not necessarily be the case, as the device
can be faked by uhid. (Greg, Benjamin Tissoires)
- fix for spurious wakeups on certain Lenovo notebooks (Thomas
Weißschuh)
- a few other device-specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: Ignore battery for Elan touchscreen on Asus UX550VE
HID: intel-ish-hid: ipc: only enable IRQ wakeup when requested
HID: google: add eel USB id
HID: add USB_HID dependancy to hid-prodikeys
HID: add USB_HID dependancy to hid-chicony
HID: bigbenff: prevent null pointer dereference
HID: sony: fix error path in probe
HID: add USB_HID dependancy on some USB HID drivers
HID: check for valid USB device for many HID drivers
HID: wacom: fix problems when device is not a valid USB device
HID: add hid_is_usb() function to make it simpler for USB detection
HID: quirks: Add quirk for the Microsoft Surface 3 type-cover
This reverts commit cefdd52fa0455c0555c30927386ee466a108b060.
On sc7180-trogdor class devices with 'fw_devlink=permissive' and KASAN
enabled, you'll see a Use-After-Free reported at bootup.
The root of the problem is that dwc3_qcom_of_register_core() is adding
a devm-allocated "tx-fifo-resize" property to its device tree node
using of_add_property().
The issue is that of_add_property() makes a _permanent_ addition to
the device tree that lasts until reboot. That means allocating memory
for the property using "devm" managed memory is a terrible idea since
that memory will be freed upon probe deferral or device unbinding.
Let's revert the patch since the system is still functional without
it. The fact that of_add_property() makes a permanent change is extra
fodder for those folks who were arguing that the device tree isn't
really the right way to pass information between parts of the
driver. It is an exercise left to the reader to submit a patch
re-adding the new feature in a way that makes everyone happier.
Fixes: cefdd52fa045 ("usb: dwc3: dwc3-qcom: Enable tx-fifo-resize property by default")
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20211207094327.1.Ie3cde3443039342e2963262a4c3ac36dc2c08b30@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should defer eventfd_signal() to the workqueue when
eventfd_signal_allowed() returns false, rather than when it returns
true.
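A minimal userspace model of the corrected check follows. It is a
sketch under stated assumptions, not the fs/aio.c code. signal_allowed(),
signal_now() and defer_to_workqueue() are hypothetical stand-ins for
eventfd_signal_allowed(), eventfd_signal() and queueing work. The
in_eventfd_signal flag models the per-task recursion bit that
eventfd_signal_allowed() tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the per-task recursion bit: set while an eventfd signal is in progress. */
static bool in_eventfd_signal;

/* Hypothetical stand-in for eventfd_signal_allowed(). */
static bool signal_allowed(void)
{
	return !in_eventfd_signal;
}

static int signalled_inline; /* completions signalled directly */
static int deferred;         /* completions handed to a workqueue */

static void signal_now(void) { signalled_inline++; }
static void defer_to_workqueue(void) { deferred++; }

/* Corrected logic: defer only when signalling here is NOT allowed.
 * The bug was the inverted test, which deferred when it WAS allowed. */
static void complete_with_eventfd(void)
{
	if (!signal_allowed())
		defer_to_workqueue();
	else
		signal_now();
}
```

Outside recursion, the completion is signalled inline. Inside it, the
completion goes to the workqueue.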
Fixes: b542e383d8c0 ("eventfd: Make signal recursion protection a task bit")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
signalfd_poll() and binder_poll() are special in that they use a
waitqueue whose lifetime is the current task, rather than the struct
file as is normally the case. This is okay for blocking polls, since a
blocking poll occurs within one task; however, non-blocking polls
require another solution. This solution is for the queue to be cleared
before it is freed, by sending a POLLFREE notification to all waiters.
Unfortunately, only eventpoll handles POLLFREE. A second type of
non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't
handle POLLFREE. This allows a use-after-free to occur if a signalfd or
binder fd is polled with aio poll, and the waitqueue gets freed.
Fix this by making aio poll handle POLLFREE.
A patch by Ramji Jiyani <ramjiyani@google.com>
(https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com)
tried to do this by making aio_poll_wake() always complete the request
inline if POLLFREE is seen. However, that solution had two bugs.
First, it introduced a deadlock, as it unconditionally locked the aio
context while holding the waitqueue lock, which inverts the normal
locking order. Second, it didn't consider that POLLFREE notifications
are missed while the request has been temporarily de-queued.
The second problem was solved by my previous patch. This patch then
properly fixes the use-after-free by handling POLLFREE in a
deadlock-free way. It does this by taking advantage of the fact that
freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.
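The contract that POLLFREE handling has to satisfy can be illustrated
with a single-threaded model. This is only a sketch:
- The waitqueue is a small array.
- MODEL_POLLFREE is an arbitrary illustrative value.
- The locking and RCU-delayed freeing that the real patch relies on are
  left out.

What the model shows is the contract itself. On POLLFREE, the waiter
removes itself and drops its pointer to the queue, so a later
cancellation never touches the freed queue. Ordinary wakeups leave the
entry queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_POLLFREE 0x4000u /* illustrative value for this model only */
#define MAX_ENTRIES 4

struct waitqueue;

/* Hypothetical poll request, loosely in the spirit of aio's poll_iocb. */
struct poll_req {
	struct waitqueue *head; /* NULL once the queue is known to be dying */
	bool queued;            /* currently on head's entry list */
	bool freed_notified;    /* saw POLLFREE */
	int wakeups;            /* ordinary wakeups observed */
};

/* A waitqueue modelled as a fixed-size array of entries. */
struct waitqueue {
	struct poll_req *entries[MAX_ENTRIES];
	int n;
};

static void wq_add(struct waitqueue *wq, struct poll_req *r)
{
	assert(wq->n < MAX_ENTRIES);
	wq->entries[wq->n++] = r;
	r->head = wq;
	r->queued = true;
}

static void wq_remove(struct waitqueue *wq, struct poll_req *r)
{
	for (int i = 0; i < wq->n; i++) {
		if (wq->entries[i] == r) {
			wq->entries[i] = wq->entries[--wq->n];
			break;
		}
	}
	r->queued = false;
}

/* Wake callback. Ordinary wakeups leave the entry queued; POLLFREE makes
 * the request detach itself and forget the queue for good. */
static void req_wake(struct poll_req *r, unsigned int mask)
{
	if (mask & MODEL_POLLFREE) {
		r->freed_notified = true;
		wq_remove(r->head, r);
		r->head = NULL;
	} else {
		r->wakeups++;
	}
}

/* Owner side: notify every entry with POLLFREE before freeing the queue. */
static void wq_free_notify(struct waitqueue *wq)
{
	while (wq->n > 0)
		req_wake(wq->entries[0], MODEL_POLLFREE);
	assert(wq->n == 0); /* the queue must be empty before it is freed */
}

/* Cancellation only touches the queue if it is still known to be alive. */
static void req_cancel(struct poll_req *r)
{
	if (r->head)
		wq_remove(r->head, r);
}
```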
Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.18+
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Currently, aio_poll_wake() will always remove the poll request from the
waitqueue. Then, if aio_poll_complete_work() sees that none of the
polled events are ready and the request isn't cancelled, it re-adds the
request to the waitqueue. (This can easily happen when polling a file
that doesn't pass an event mask when waking up its waitqueue.)
This is fundamentally broken for two reasons:
1. If a wakeup occurs between vfs_poll() and the request being
re-added to the waitqueue, it will be missed because the request
wasn't on the waitqueue at the time. Therefore, IOCB_CMD_POLL
might never complete even if the polled file is ready.
2. When the request isn't on the waitqueue, there is no way to be
notified that the waitqueue is being freed (which happens when its
lifetime is shorter than the struct file's). This is supposed to
happen via the waitqueue entries being woken up with POLLFREE.
Therefore, leave the requests on the waitqueue until they are actually
completed (or cancelled). To keep track of when aio_poll_complete_work
needs to be scheduled, use new fields in struct poll_iocb. Remove the
'done' field which is now redundant.
Note that this is consistent with how sys_poll() and eventpoll work;
their wakeup functions do *not* remove the waitqueue entries.
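The lost wakeup in point 1 can be replayed deterministically. The model
below is hypothetical: struct model_req and its fields are not the
fields added to struct poll_iocb. It also runs the racy steps in a fixed
order instead of on two CPUs:
1. A spurious wakeup arrives.
2. vfs_poll() finds nothing ready.
3. The real wakeup arrives before the old code re-adds the request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request state; not the real struct poll_iocb fields. */
struct model_req {
	bool on_queue;     /* is the request on the waitqueue? */
	bool need_recheck; /* a wakeup was recorded for the work item */
	bool completed;
};

/* Old behaviour: every wakeup dequeues the request. */
static void old_wake(struct model_req *r)
{
	if (!r->on_queue)
		return;            /* not queued: this wakeup is lost */
	r->on_queue = false;
	r->need_recheck = true;
}

/* New behaviour: the request stays queued until completed or cancelled. */
static void new_wake(struct model_req *r)
{
	if (r->on_queue)
		r->need_recheck = true;
}

static void wake(struct model_req *r, bool keep_on_queue)
{
	if (keep_on_queue)
		new_wake(r);
	else
		old_wake(r);
}

/* Replays the interleaving from point 1 in a fixed order and reports
 * whether the request completes once the file is ready. */
static bool replay(bool keep_on_queue)
{
	struct model_req r = { .on_queue = true };
	bool file_ready = false;

	wake(&r, keep_on_queue);          /* spurious wakeup, no event mask */

	r.need_recheck = false;           /* work item runs vfs_poll()... */
	bool ready_at_poll = file_ready;  /* ...and sees nothing ready */

	file_ready = true;                /* file becomes ready and wakes */
	wake(&r, keep_on_queue);          /* its queue before the re-add */

	if (!keep_on_queue && !ready_at_poll)
		r.on_queue = true;        /* old code re-adds too late */

	if (r.need_recheck && file_ready)
		r.completed = true;       /* a recorded wakeup triggers a re-poll */
	return r.completed;
}
```

With the old scheme the request never completes even though the file is
ready. With the entry kept on the queue, the second wakeup is recorded
and the request completes.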
Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.18+
Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
wake_up_poll() uses nr_exclusive=1, so it's not guaranteed to wake up
all exclusive waiters. Yet, POLLFREE *must* wake up all waiters. epoll
and aio poll are fortunately not affected by this, but it's very
fragile. Thus, the new function wake_up_pollfree() has been introduced.
Convert signalfd to use wake_up_pollfree().
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: d80e731ecab4 ("epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211209010455.42744-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
wake_up_poll() uses nr_exclusive=1, so it's not guaranteed to wake up
all exclusive waiters. Yet, POLLFREE *must* wake up all waiters. epoll
and aio poll are fortunately not affected by this, but it's very
fragile. Thus, the new function wake_up_pollfree() has been introduced.
Convert binder to use wake_up_pollfree().
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: f5cb779ba163 ("ANDROID: binder: remove waitqueue when thread exits.")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211209010455.42744-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Several ->poll() implementations are special in that they use a
waitqueue whose lifetime is the current task, rather than the struct
file as is normally the case. This is okay for blocking polls, since a
blocking poll occurs within one task; however, non-blocking polls
require another solution. This solution is for the queue to be cleared
before it is freed, using 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'.
However, that has a bug: wake_up_poll() calls __wake_up() with
nr_exclusive=1. Therefore, if there are multiple "exclusive" waiters,
and the wakeup function for the first one returns a positive value, only
that one will be called. That's *not* what's needed for POLLFREE;
POLLFREE is special in that it really needs to wake up everyone.
Considering the three non-blocking poll systems:
- io_uring poll doesn't handle POLLFREE at all, so it is broken anyway.
- aio poll is unaffected, since it doesn't support exclusive waits.
However, that's fragile, as someone could add this feature later.
- epoll doesn't appear to be broken by this, since its wakeup function
returns 0 when it sees POLLFREE. But this is fragile.
Although there is a workaround (see epoll), it's better to define a
function which always sends POLLFREE to all waiters. Add such a
function. Also make it verify that the queue really becomes empty after
all waiters have been woken up.
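The nr_exclusive behaviour can be seen in a simplified model of the
wakeup loop in __wake_up_common(). This is a sketch, not the kernel
code:
- struct waiter and model_wake_up() are hypothetical.
- Every wake function reports success.
- Bookmarks and locking are omitted.

Passing nr_exclusive=0, as wake_up_all() does, disables the early exit.
That is the behaviour POLLFREE needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical waiter; the kernel uses struct wait_queue_entry. */
struct waiter {
	bool exclusive; /* models WQ_FLAG_EXCLUSIVE */
	int woken;      /* how many times this waiter's wake function ran */
};

/* Wake function that reports a successful wakeup (nonzero). */
static int wake_fn(struct waiter *w)
{
	w->woken++;
	return 1;
}

/* Simplified model of the loop in __wake_up_common(): stop once
 * nr_exclusive exclusive waiters have been woken successfully.
 * With nr_exclusive == 0 the decrement goes negative and never hits
 * zero, so every waiter is woken. */
static void model_wake_up(struct waiter *ws, int n, int nr_exclusive)
{
	for (int i = 0; i < n; i++) {
		int ret = wake_fn(&ws[i]);

		if (ret && ws[i].exclusive && !--nr_exclusive)
			break;
	}
}
```

With nr_exclusive=1 (the wake_up_poll() case) only the first of three
exclusive waiters runs. With 0, all three run. epoll sidesteps the
problem only because its wake function returns 0 on POLLFREE, which
keeps the loop going.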
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211209010455.42744-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Merge tag 'netfs-fixes-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull netfslib fixes from David Howells:
- Fix a lockdep warning and potential deadlock. This takes the
simple approach of offloading the write-to-cache done from within a
network filesystem read to a worker thread to avoid taking the
sb_writer lock from the cache backing filesystem whilst holding the
mmap lock on an inode from the network filesystem.
Jan Kara posits a scenario whereby this can cause deadlock[1], though
it's quite complex and I think requires someone in userspace to
actually do I/O on the cache files. Matthew Wilcox isn't so certain,
though[2].
An alternative way to fix this, suggested by Darrick Wong, might be
to allow cachefiles to prevent userspace from performing I/O upon the
file - something like an exclusive open - but that's beyond the scope
of a fix here if we do want to make such a facility in the future.
- In some of the error handling paths where netfs_ops->cleanup() is
called, the arguments are transposed[3]. gcc doesn't complain because
one of the parameters is void* and one of the values is void*.
Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1]
Link: https://lore.kernel.org/r/Ya9eDiFCE2fO7K/S@casper.infradead.org/ [2]
Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ [3]
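Why gcc stays quiet about the transposed arguments is easy to reproduce.
In C, void * converts implicitly to and from any object pointer type, so
swapping a void * argument with a struct pointer argument type-checks.
The names below are hypothetical stand-ins for netfs_ops->cleanup() and
its arguments.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct address_space and the cleanup hook. */
struct mapping { int id; };

static struct mapping *seen_mapping;
static void *seen_priv;

static void cleanup(struct mapping *mapping, void *priv)
{
	seen_mapping = mapping;
	seen_priv = priv;
}

static struct mapping the_mapping = { 1 };
static int the_priv_data = 42;

/* Both calls compile without diagnostics; only the second is correct. */
static void call_both_ways(void)
{
	void *priv = &the_priv_data;

	cleanup(priv, &the_mapping);   /* transposed: void * -> struct mapping * */
	assert((void *)seen_mapping == priv && seen_priv == (void *)&the_mapping);

	cleanup(&the_mapping, priv);   /* intended order */
	assert(seen_mapping == &the_mapping && seen_priv == priv);
}
```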
* tag 'netfs-fixes-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
netfs: fix parameter of cleanup()
netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock
There are error paths in __create_synth_event(), after argv is
allocated, that fail to free it. Add a jump to free it when necessary.
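The usual kernel idiom for this is a single cleanup label that every
post-allocation error path jumps to. The sketch below is hypothetical:
create_event() and the allocation counter do not come from the
synthetic event code. It uses malloc()/free() in place of the kernel's
allocation helpers so that the leak check can run in userspace.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts allocations still alive, so the example can check for leaks. */
static int live_allocs;

static void *tracked_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Hypothetical constructor: every failure after argv is allocated jumps
 * to free_argv instead of returning directly. In this model argv is only
 * needed during construction, so the success path frees it too. */
static int create_event(int nargs, int fail_at_step)
{
	int ret = 0;
	char **argv = tracked_alloc(sizeof(*argv) * (size_t)nargs);

	if (!argv)
		return -1;

	if (fail_at_step == 1) {   /* e.g. argument parsing fails */
		ret = -2;
		goto free_argv;
	}
	if (fail_at_step == 2) {   /* e.g. registration fails */
		ret = -3;
		goto free_argv;
	}

free_argv:
	tracked_free(argv);
	return ret;
}
```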
Link: https://lkml.kernel.org/r/20211209024317.11783-1-linmq006@gmail.com
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
[ Fixed up the patch and change log ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add the ftrace-direct-multi-modify.ko kernel module, which uses the
modify_ftrace_direct_multi() API. The core functionality is taken
from the ftrace-direct-modify.ko kernel module and changed to fit
the multi direct interface.
The init function creates a kthread that periodically calls
modify_ftrace_direct_multi() to change the trampoline address
for the direct ftrace_ops. The ftrace trace_pipe then shows
traces from both trampolines.
Link: https://lkml.kernel.org/r/20211206182032.87248-4-jolsa@kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
For whatever reason, some devices like QCA6390 and WCN6855 using ath11k
are not in M3 state during PM resume, but are still functional.
mhi_pm_resume() should then not fail in those cases, but let the
higher-level, device-specific stack continue the resume process.
Add an API, mhi_pm_resume_force(), to force resuming irrespective of the
current MHI state. This fixes a regression with non-functional ath11k
WiFi after a suspend/resume cycle on some machines.
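One plausible way to structure such an API is a shared resume path with
a force flag, wrapped by the strict and forcing entry points. The model
below is a hypothetical sketch. The enum, model_resume() and its
wrappers are illustrative and are not the drivers/bus/mhi code. The
model only shows the intended difference in behaviour when the device
is not in M3.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of MHI device states, for illustration only. */
enum model_mhi_state { MODEL_M0, MODEL_M3, MODEL_OTHER };

/* Shared resume path; 'force' skips the "must be in M3" precondition. */
static int model_resume(enum model_mhi_state *state, bool force)
{
	if (*state != MODEL_M3 && !force)
		return -EINVAL;     /* strict resume refuses to continue */
	*state = MODEL_M0;          /* carry on with the resume sequence */
	return 0;
}

static int model_pm_resume(enum model_mhi_state *state)
{
	return model_resume(state, false);
}

static int model_pm_resume_force(enum model_mhi_state *state)
{
	return model_resume(state, true);
}
```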
Bug report: https://bugzilla.kernel.org/show_bug.cgi?id=214179
Link: https://lore.kernel.org/regressions/871r5p0x2u.fsf@codeaurora.org/
Fixes: 020d3b26c07a ("bus: mhi: Early MHI resume failure in non M3 state")
Cc: stable@vger.kernel.org #5.13
Reported-by: Kalle Valo <kvalo@codeaurora.org>
Reported-by: Pengyu Ma <mapengyu@gmail.com>
Tested-by: Kalle Valo <kvalo@kernel.org>
Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
[mani: Switched to API, added bug report, reported-by tags and CCed stable]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20211209131633.4168-1-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
INTERCEPT_x are bit positions, but the code was using the raw value of
INTERCEPT_VINTR (4) instead of BIT(INTERCEPT_VINTR).
This resulted in masking of bit 2 - that is, SMI instead of VINTR.
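The difference between a bit position and a bit mask is easy to show in
isolation. The enum below is an illustrative subset whose values follow
the message (VINTR is position 4, SMI is position 2). It is not copied
from the KVM headers. BIT() is defined locally for 32-bit values.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Illustrative bit positions, consistent with the message above. */
enum {
	MODEL_INTERCEPT_INTR,  /* 0 */
	MODEL_INTERCEPT_NMI,   /* 1 */
	MODEL_INTERCEPT_SMI,   /* 2 */
	MODEL_INTERCEPT_INIT,  /* 3 */
	MODEL_INTERCEPT_VINTR, /* 4 */
};

/* Buggy: uses the position (4 == 0b100) as a mask, so bit 2 (SMI) is cleared. */
static uint32_t clear_vintr_buggy(uint32_t intercepts)
{
	return intercepts & ~(uint32_t)MODEL_INTERCEPT_VINTR;
}

/* Fixed: converts the position to a mask first, so bit 4 (VINTR) is cleared. */
static uint32_t clear_vintr_fixed(uint32_t intercepts)
{
	return intercepts & ~BIT(MODEL_INTERCEPT_VINTR);
}
```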
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <49b9571d25588870db5380b0be1a41df4bbaaf93.1638486479.git.maciej.szmigiero@oracle.com>
Clean up remaining headers that are specific to liblockdep but lived in
the shared header directory. These are all unused after the liblockdep
code was removed in commit 7246f4dcaccc ("tools/lib/lockdep: drop
liblockdep").
Note that some headers that were originally created for liblockdep
still have liblockdep references, but they are used by other tools/
code at this point.
Signed-off-by: Sasha Levin <sashal@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>