949458 Commits

Author SHA1 Message Date
Christoph Hellwig
2eb81a3364 nvme: rename and document nvme_end_request
nvme_end_request is a bit misnamed, as it wraps around the
blk_mq_complete_* API.  It's semantics also are non-trivial, so give it
a more descriptive name and add a comment explaining the semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:28 -06:00
Keith Busch
c41ad98beb nvme: skip noiob for zoned devices
Zoned block devices reuse the chunk_sectors queue limit to define zone
boundaries. If a such a device happens to also report an optimal
boundary, do not use that to define the chunk_sectors as that may
intermittently interfere with io splitting and zone size queries.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Christoph Hellwig
c61b82c7b7 nvme-pci: fix PRP pool size
All operations are based on the controller, not the host page size.
Switch the dma pool to use the controller page size as well to avoid
massive overallocations on large page size systems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
John Garry
7442ddcedc nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth
Recently nvme_dev.q_depth was changed from an int to u16 type.

This falls over for the queue depth calculation in nvme_pci_enable(),
where NVME_CAP_MQES(dev->ctrl.cap) + 1 may overflow as a u16, as
NVME_CAP_MQES() is a 16b number also. That happens for me, and this is the
result:

root@ubuntu:/home/john# [148.272996] Unable to handle kernel NULL pointer
dereference at virtual address 0000000000000010
Mem abort info:
ESR = 0x96000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000a27bf3c9000
[0000000000000010] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in: nvme nvme_core
CPU: 56 PID: 256 Comm: kworker/u195:0 Not tainted
5.8.0-next-20200812 #27
Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 -
V1.16.01 03/15/2019
Workqueue: nvme-reset-wq nvme_reset_work [nvme]
pstate: 80c00009 (Nzcv daif +PAN +UAO BTYPE=--)
pc : __sg_alloc_table_from_pages+0xec/0x238
lr : __sg_alloc_table_from_pages+0xc8/0x238
sp : ffff800013ccbad0
x29: ffff800013ccbad0 x28: ffff0a27b3d380a8
x27: 0000000000000000 x26: 0000000000002dc2
x25: 0000000000000dc0 x24: 0000000000000000
x23: 0000000000000000 x22: ffff800013ccbbe8
x21: 0000000000000010 x20: 0000000000000000
x19: 00000000fffff000 x18: ffffffffffffffff
x17: 00000000000000c0 x16: fffffe289eaf6380
x15: ffff800011b59948 x14: ffff002bc8fe98f8
x13: ff00000000000000 x12: ffff8000114ca000
x11: 0000000000000000 x10: ffffffffffffffff
x9 : ffffffffffffffc0 x8 : ffff0a27b5f9b6a0
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0a27b5f9b680 x4 : 0000000000000000
x3 : ffff0a27b5f9b680 x2 : 0000000000000000
 x1 : 0000000000000001 x0 : 0000000000000000
 Call trace:
__sg_alloc_table_from_pages+0xec/0x238
sg_alloc_table_from_pages+0x18/0x28
iommu_dma_alloc+0x474/0x678
dma_alloc_attrs+0xd8/0xf0
nvme_alloc_queue+0x114/0x160 [nvme]
nvme_reset_work+0xb34/0x14b4 [nvme]
process_one_work+0x1e8/0x360
worker_thread+0x44/0x478
kthread+0x150/0x158
ret_from_fork+0x10/0x34
 Code: f94002c3 6b01017f 540007c2 11000486 (f8645aa5)
---[ end trace 89bb2b72d59bf925 ]---

Fix by making onto a u32.

Also use u32 for nvme_dev.q_depth, as we assign this value from
nvme_dev.q_depth, and nvme_dev.q_depth will possibly hold 65536 - this
avoids the same crash as above.

Fixes: 61f3b8963097 ("nvme-pci: use unsigned for io queue depth")
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Logan Gunthorpe
ecbcdf0c81 nvme: Use spin_lock_irq() when taking the ctrl->lock
When locking the ctrl->lock spinlock IRQs need to be disabled to avoid a
dead lock. The new spin_lock() calls recently added produce the
following lockdep warning when running the blktest nvme/003:

    ================================
    WARNING: inconsistent lock state
    --------------------------------
    inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    ksoftirqd/2/22 [HC0[0]:SC1[1]:HE0:SE0] takes:
    ffff888276a8c4c0 (&ctrl->lock){+.?.}-{2:2}, at: nvme_keep_alive_end_io+0x50/0xc0
    {SOFTIRQ-ON-W} state was registered at:
      lock_acquire+0x164/0x500
      _raw_spin_lock+0x28/0x40
      nvme_get_effects_log+0x37/0x1c0
      nvme_init_identify+0x9e4/0x14f0
      nvme_reset_work+0xadd/0x2360
      process_one_work+0x66b/0xb70
      worker_thread+0x6e/0x6c0
      kthread+0x1e7/0x210
      ret_from_fork+0x22/0x30
    irq event stamp: 1449221
    hardirqs last  enabled at (1449220): [<ffffffff81c58e69>] ktime_get+0xf9/0x140
    hardirqs last disabled at (1449221): [<ffffffff83129665>] _raw_spin_lock_irqsave+0x25/0x60
    softirqs last  enabled at (1449210): [<ffffffff83400447>] __do_softirq+0x447/0x595
    softirqs last disabled at (1449215): [<ffffffff81b489b5>] run_ksoftirqd+0x35/0x50

    other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&ctrl->lock);
      <Interrupt>
        lock(&ctrl->lock);

     *** DEADLOCK ***

    no locks held by ksoftirqd/2/22.

    stack backtrace:
    CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 5.8.0-rc4-eid-vmlocalyes-dbg-00157-g7236657c6b3a #1450
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
    Call Trace:
     dump_stack+0xc8/0x11a
     print_usage_bug.cold.63+0x235/0x23e
     mark_lock+0xa9c/0xcf0
     __lock_acquire+0xd9a/0x2b50
     lock_acquire+0x164/0x500
     _raw_spin_lock_irqsave+0x40/0x60
     nvme_keep_alive_end_io+0x50/0xc0
     blk_mq_end_request+0x158/0x210
     nvme_complete_rq+0x146/0x500
     nvme_loop_complete_rq+0x26/0x30 [nvme_loop]
     blk_done_softirq+0x187/0x1e0
     __do_softirq+0x118/0x595
     run_ksoftirqd+0x35/0x50
     smpboot_thread_fn+0x1d3/0x310
     kthread+0x1e7/0x210
     ret_from_fork+0x22/0x30

Fixes: be93e87e7802 ("nvme: support for multiple Command Sets Supported and Effects log pages")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Chaitanya Kulkarni
7ee51cf60a nvmet: call blk_mq_free_request() directly
Instead of calling blk_put_request() which calls blk_mq_free_request(),
call blk_mq_free_request() directly for NVMeOF passthru. This is to
mainly avoid an extra function call in the completion path
nvmet_passthru_req_done().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Chaitanya Kulkarni
a2138fd494 nvmet: fix oops in pt cmd execution
In the existing NVMeOF Passthru core command handling on failure of
nvme_alloc_request() it errors out with rq value set to NULL. In the
error handling path it calls blk_put_request() without checking if
rq is set to NULL or not which produces following Oops:-

[ 1457.346861] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1457.347838] #PF: supervisor read access in kernel mode
[ 1457.348464] #PF: error_code(0x0000) - not-present page
[ 1457.349085] PGD 0 P4D 0
[ 1457.349402] Oops: 0000 [#1] SMP NOPTI
[ 1457.349851] CPU: 18 PID: 10782 Comm: kworker/18:2 Tainted: G           OE     5.8.0-rc4nvme-5.9+ #35
[ 1457.350951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214
[ 1457.352347] Workqueue: events nvme_loop_execute_work [nvme_loop]
[ 1457.353062] RIP: 0010:blk_mq_free_request+0xe/0x110
[ 1457.353651] Code: 3f ff ff ff 83 f8 01 75 0d 4c 89 e7 e8 1b db ff ff e9 2d ff ff ff 0f 0b eb ef 66 8
[ 1457.355975] RSP: 0018:ffffc900035b7de0 EFLAGS: 00010282
[ 1457.356636] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[ 1457.357526] RDX: ffffffffa060bd05 RSI: 0000000000000000 RDI: 0000000000000000
[ 1457.358416] RBP: 0000000000000037 R08: 0000000000000000 R09: 0000000000000000
[ 1457.359317] R10: 0000000000000000 R11: 000000000000006d R12: 0000000000000000
[ 1457.360424] R13: ffff8887ffa68600 R14: 0000000000000000 R15: ffff8888150564c8
[ 1457.361322] FS:  0000000000000000(0000) GS:ffff888814600000(0000) knlGS:0000000000000000
[ 1457.362337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1457.363058] CR2: 0000000000000000 CR3: 000000081c0ac000 CR4: 00000000003406e0
[ 1457.363973] Call Trace:
[ 1457.364296]  nvmet_passthru_execute_cmd+0x150/0x2c0 [nvmet]
[ 1457.364990]  process_one_work+0x24e/0x5a0
[ 1457.365493]  ? __schedule+0x353/0x840
[ 1457.365957]  worker_thread+0x3c/0x380
[ 1457.366426]  ? process_one_work+0x5a0/0x5a0
[ 1457.366948]  kthread+0x135/0x150
[ 1457.367362]  ? kthread_create_on_node+0x60/0x60
[ 1457.367934]  ret_from_fork+0x22/0x30
[ 1457.368388] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corer
[ 1457.368414]  ata_piix crc32c_intel virtio_pci libata virtio_ring serio_raw t10_pi virtio floppy dm_]
[ 1457.380849] CR2: 0000000000000000
[ 1457.381288] ---[ end trace c6cab61bfd1f68fd ]---
[ 1457.381861] RIP: 0010:blk_mq_free_request+0xe/0x110
[ 1457.382469] Code: 3f ff ff ff 83 f8 01 75 0d 4c 89 e7 e8 1b db ff ff e9 2d ff ff ff 0f 0b eb ef 66 8
[ 1457.384749] RSP: 0018:ffffc900035b7de0 EFLAGS: 00010282
[ 1457.385393] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[ 1457.386264] RDX: ffffffffa060bd05 RSI: 0000000000000000 RDI: 0000000000000000
[ 1457.387142] RBP: 0000000000000037 R08: 0000000000000000 R09: 0000000000000000
[ 1457.388029] R10: 0000000000000000 R11: 000000000000006d R12: 0000000000000000
[ 1457.388914] R13: ffff8887ffa68600 R14: 0000000000000000 R15: ffff8888150564c8
[ 1457.389798] FS:  0000000000000000(0000) GS:ffff888814600000(0000) knlGS:0000000000000000
[ 1457.390796] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1457.391508] CR2: 0000000000000000 CR3: 000000081c0ac000 CR4: 00000000003406e0
[ 1457.392525] Kernel panic - not syncing: Fatal exception
[ 1457.394138] Kernel Offset: disabled
[ 1457.394677] ---[ end Kernel panic - not syncing: Fatal exception ]---

We fix this Oops by adding a new goto label out_put_req and reordering
the blk_put_request call to avoid calling blk_put_request() with rq
value is set to NULL. Here we also update the rest of the code
accordingly.

Fixes: 06b7164dfdc0 ("nvmet: add passthru code to process commands")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Chaitanya Kulkarni
4db69a3d7c nvmet: add ns tear down label for pt-cmd handling
In the current implementation before submitting the passthru cmd we
may come across error e.g. getting ns from passthru controller,
allocating a request from passthru controller, etc. For all the failure
cases it only uses single goto label fail_out.

In the target code, we follow the pattern to have a separate label for
each error out the case when setting up multiple things before the actual
action.

This patch follows the same pattern and renames generic fail_out label
to out_put_ns and updates the error out cases in the
nvmet_passthru_execute_cmd() where it is needed.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Martin Wilck
e398863b75 nvme: multipath: round-robin: eliminate "fallback" variable
If we find an optimized path, we quit the loop immediately. Thus we can use
just one variable for the next path, slighly simplifying the code.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Martin Wilck
93eb0381e1 nvme: multipath: round-robin: fix single non-optimized path case
If there's only one usable, non-optimized path, nvme_round_robin_path()
returns NULL, which is wrong. Fix it by falling back to "old", like in
the single optimized path case. Also, if the active path isn't changed,
there's no need to re-assign the pointer.

Fixes: 3f6e3246db0e ("nvme-multipath: fix logic for non-optimized paths")
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin George <marting@netapp.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Tianjia Zhang
f34448cd0d nvme-fc: Fix wrong return value in __nvme_fc_init_request()
On an error exit path, a negative error code should be returned
instead of a positive return value.

Fixes: e399441de9115 ("nvme-fabrics: Add host support for FC transport")
Cc: James Smart <jsmart2021@gmail.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Logan Gunthorpe
0ceeab96ba nvmet-passthru: Reject commands with non-sgl flags set
Any command with a non-SGL flag set (like fuse flags) should be
rejected.

Fixes: c1fef73f793b ("nvmet: add passthru code to process commands")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Sagi Grimberg
382fee1a8b nvmet: fix a memory leak
We forgot to free new_model_number

Fixes: 013b7ebe5a0d ("nvmet: make ctrl model configurable")
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Yufen Yu
27029b4b18 blkcg: fix memleak for iolatency
Normally, blkcg_iolatency_exit() will free related memory in iolatency
when cleanup queue. But if blk_throtl_init() return error and queue init
fail, blkcg_iolatency_exit() will not do that for us. Then it cause
memory leak.

Fixes: d70675121546 ("block: introduce blk-iolatency io controller")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:14:27 -06:00
Geert Uytterhoeven
0c8b9c3540 MAINTAINERS: Add missing header files to BLOCK LAYER section
The various <linux/blk*.h> header files are part of the Block Layer.
Add them to the corresponding section in the MAINTAINERS file, so
scripts/get_maintainer.pl will pick them up.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:09:29 -06:00
Keith Busch
e4b469c66f block: fix get_max_io_size()
A previous commit aligning splits to physical block sizes inadvertently
modified one return case such that that it now returns 0 length splits
when the number of sectors doesn't exceed the physical offset. This
later hits a BUG in bio_split(). Restore the previous working behavior.

Fixes: 9cc5169cd478b ("block: Improve physical block alignment of split bios")
Reported-by: Eric Deal <eric.deal@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:09:22 -06:00
Ming Lei
db03f88fae blk-mq: insert request not through ->queue_rq into sw/scheduler queue
c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") supposed
to add request which has been through ->queue_rq() to the hw queue dispatch
list, however it adds request running out of budget or driver tag to hw queue
too. This way basically bypasses request merge, and causes too many request
dispatched to LLD, and system% is unnecessary increased.

Fixes this issue by adding request not through ->queue_rq into sw/scheduler
queue, and this way is safe because no ->queue_rq is called on this request
yet.

High %system can be observed on Azure storvsc device, and even soft lock
is observed. This patch reduces %system during heavy sequential IO,
meantime decreases soft lockup risk.

Fixes: c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:09:22 -06:00
Nathan Chancellor
17bc10300c block/rnbd: Ensure err is always initialized in process_rdma
Clang warns:

drivers/block/rnbd/rnbd-srv.c:150:6: warning: variable 'err' is used
uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
        if (IS_ERR(bio)) {
            ^~~~~~~~~~~
drivers/block/rnbd/rnbd-srv.c:177:9: note: uninitialized use occurs here
        return err;
               ^~~
drivers/block/rnbd/rnbd-srv.c:150:2: note: remove the 'if' if its
condition is always false
        if (IS_ERR(bio)) {
        ^~~~~~~~~~~~~~~~~~
drivers/block/rnbd/rnbd-srv.c:126:9: note: initialize the variable 'err'
to silence this warning
        int err;
               ^
                = 0
1 warning generated.

err is indeed uninitialized when this statement is taken. Ensure that it
is assigned the error value of bio before jumping to the error handling
label.

Fixes: 735d77d4fd28 ("rnbd: remove rnbd_dev_submit_io")
Reported-by: Brooke Basile <brookebasile@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1134
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21 17:09:11 -06:00
Geert Uytterhoeven
5cd841d267 dt-bindings: vendor-prefixes: Remove trailing whitespace
Fixes: f516fb704d02fff2 ("dt-bindings: Whitespace clean-ups in schema files")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20200819092058.1526-1-geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
2020-08-21 16:27:57 -06:00
Will Deacon
b5331379bc KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():

 | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
 | INFO: lockdep is turned off.
 | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 | Call trace:
 |  dump_backtrace+0x0/0x284
 |  show_stack+0x1c/0x28
 |  dump_stack+0xf0/0x1a4
 |  ___might_sleep+0x2bc/0x2cc
 |  unmap_stage2_range+0x160/0x1ac
 |  kvm_unmap_hva_range+0x1a0/0x1c8
 |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
 |  __mmu_notifier_invalidate_range_start+0x218/0x31c
 |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
 |  __oom_reap_task_mm+0x128/0x268
 |  oom_reap_task+0xac/0x298
 |  oom_reaper+0x178/0x17c
 |  kthread+0x1e4/0x1fc
 |  ret_from_fork+0x10/0x30

Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
flags.

Cc: <stable@vger.kernel.org>
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-3-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 18:06:43 -04:00
Will Deacon
fdfe7cbd58 KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 18:03:47 -04:00
Madalin Bucur
5f53584ce9 dt-bindings: net: correct description of phy-connection-type
The phy-connection-type parameter is described in ePAPR 1.1:

Specifies interface type between the Ethernet device and a physical
layer (PHY) device. The value of this property is specific to the
implementation.

Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://lore.kernel.org/r/1597917724-11127-1-git-send-email-madalin.bucur@oss.nxp.com
Signed-off-by: Rob Herring <robh@kernel.org>
2020-08-21 16:01:50 -06:00
Linus Torvalds
f873db9acd io_uring-5.9-2020-08-21
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9AMgoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsqjEAC3hNnlu7BwVRFMeJzOyxZUqvvtT2ktYTZs
 9duV7qezounq022nKihl0/wy0KWOxfg4HDT492la54nKYAPf6xhszrbuAUx9pW8Z
 pwcUccaci7nB29V4a4wedtxz+jegCN2LXbRNk4DOpchlVKULfrOIcfW5/rL/7gkp
 15n/AAIZNChJ6y9dJDqYRoiF152/6uk7t+BolU/+W9QCKi2PW40nTOgfkzSnBvJV
 WaHlYHKAOUaiurIUjZQolgohNNBUzNwWtF/4HSeT5n8c94gSpI3IKFkmNCjxQQ96
 I0gjJZIss7N8ysKFBy3WALqx9FqxSWS3pi/G9fai4o/VPEFj+fhfBTh+H1fzLaoM
 V+oOHMCt5Cwlw+n8vSgtUU0JF6ZnmoolfpHWPchtCJyQ42i/gt41MrePdu/tUC+n
 tV7wvftuM/+AN36vDDgbDc5BTKjCnRQSHz80M3EwUznJJjaeTAPxnQ+pVlpN9IS+
 sbywlg+Xake9F19qA/astAH9n3U2+m3HdmoIXfG1vrXKFt/I9d36gh5hzlCh5//5
 zAu1/iwy1fAlaI4CWRR14+e8/ozu5SCxlswsI79sGZcFuv+WQsQ84q297rq8v0Wr
 HdtmiRDGlBFfcuiEOjoSzSEwMWPc1F+8EcmiEp8SZBglKDM+kQI9XMKKXakqh7K0
 yEWGAMm+1g==
 =dLiS
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Make sure the head link cancelation includes async work

 - Get rid of kiocb_wait_page_queue_init(), makes no sense to have it as
   a separate function since you moved it into io_uring itself

 - io_import_iovec cleanups (Pavel, me)

 - Use system_unbound_wq for ring exit work, to avoid spawning tons of
   these if we have tons of rings exiting at the same time

 - Fix req->flags overflow flag manipulation (Pavel)

* tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block:
  io_uring: kill extra iovec=NULL in import_iovec()
  io_uring: comment on kfree(iovec) checks
  io_uring: fix racy req->flags modification
  io_uring: use system_unbound_wq for ring exit work
  io_uring: cleanup io_import_iovec() of pre-mapped request
  io_uring: get rid of kiocb_wait_page_queue_init()
  io_uring: find and cancel head link async work on files exit
2020-08-21 14:59:16 -07:00
Rob Herring
a326462cba dt-bindings: PCI: intel,lgm-pcie: Fix matching on all snps,dw-pcie instances
The intel,lgm-pcie binding is matching on all snps,dw-pcie instances
which is wrong. Add a custom 'select' entry to fix this.

Fixes: e54ea45a4955 ("dt-bindings: PCI: intel: Add YAML schemas for the PCIe RC controller")
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Reviewed-by: Dilip Kota <eswara.kota@linux.intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2020-08-21 15:51:01 -06:00
Linus Torvalds
349111f050 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "11 patches.

  Subsystems affected by this: misc, mm/hugetlb, mm/vmalloc, mm/misc,
  romfs, relay, uprobes, squashfs, mm/cma, mm/pagealloc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, page_alloc: fix core hung in free_pcppages_bulk()
  mm: include CMA pages in lowmem_reserve at boot
  squashfs: avoid bio_alloc() failure with 1Mbyte blocks
  uprobes: __replace_page() avoid BUG in munlock_vma_page()
  kernel/relay.c: fix memleak on destroy relay channel
  romfs: fix uninitialized memory leak in romfs_dev_read()
  mm/rodata_test.c: fix missing function declaration
  mm/vunmap: add cond_resched() in vunmap_pmd_range
  khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
  hugetlb_cgroup: convert comma to semicolon
  mailmap: add Andi Kleen
2020-08-21 14:44:48 -07:00
David S. Miller
4af7b32f84 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-08-21

The following pull-request contains BPF updates for your *net* tree.

We've added 11 non-merge commits during the last 5 day(s) which contain
a total of 12 files changed, 78 insertions(+), 24 deletions(-).

The main changes are:

1) three fixes in BPF task iterator logic, from Yonghong.

2) fix for compressed dwarf sections in vmlinux, from Jiri.

3) fix xdp attach regression, from Andrii.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-21 12:54:50 -07:00
Linus Torvalds
f22c5579a7 RISC-V Fixes for 5.9-rc2
* The CLINT driver has been split in two: one to handle the M-mode CLINT
   (memory mapped and used on NOMMU systems) and one to handle the S-mode CLINT
   (via SBI).
 * The addition of SiFive's drivers to rv32_defconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl9AECkTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiZMAD/wKrUd1yCPvg5lKMWLN34+CAJHKU/oH
 ASK3/AdwHS1B3zohk2Eq04hycBEzmFhz2TcoLResZb3CHyqXUN090xBz0Z4qAPlT
 NvOJE+fCZLdxXu35HS2wPpjkZnnBTom3kv3q2D+Cyq3nrFrGt1mGJ4a91oxebktP
 RODyooO982KjTSs/t10GUzuZFIDc7UzvyNQMv0EipXmPaVEPxPbnNhMbdkuCXWjs
 PKj6ZNMx7HVv5ms3RNQP83w3X1czLc8t/2ATfo2fApnrpizWvukzyIhDK3Ar37d+
 f9u/SQdhv6NaAzacv8GYJ79o72e4hu6+llPrTrkTPQq5WtzAHQNTqYA+99+/Lj13
 2Hh9DlRyLc7tUF2bidY6po7XluWVYbYk4JEVbTPDOUP5kmKM/MN6r14TzKxch+26
 ghu1RI70cJkE3hn2DrbG7Errcs1+I+59Fhh+PzzxhY12V0d1EUY7SVETBw5GoOce
 YOz3327rJ6f0sozUwx9bZT2x6udVAPVDTyxZ2egmzay6xlFeMQJBxWSmleAHdZce
 0O1qTOcfAp6Nyx8SIv7Ust208hGXzpsULFn6z9QQsQhfu4PIWQUUR7c3P15Sf3r8
 3vyK5slBlBm2vgkfbsmf9hZ5u+BEc/ly8M/lotbXdjJXjzoAKsg5PSYQ4KjyO/qr
 2xYMHVNsSoaFkw==
 =MBvG
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - The CLINT driver has been split in two: one to handle the M-mode
   CLINT (memory mapped and used on NOMMU systems) and one to handle the
   S-mode CLINT (via SBI).

 - The addition of SiFive's drivers to rv32_defconfig

* tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Add SiFive drivers to rv32_defconfig
  dt-bindings: timer: Add CLINT bindings
  RISC-V: Remove CLINT related code from timer and arch
  clocksource/drivers: Add CLINT timer driver
  RISC-V: Add mechanism to provide custom IPI operations
2020-08-21 12:32:42 -07:00
Linus Torvalds
c0a4f5b354 xen: branch for v5.9-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCX0ALjQAKCRCAXGG7T9hj
 voZ2AQCXYVDclEXoNwkD6sS0RuVSc4T6ypEzeGM6tP4Z/VInaAD/eJ8zP+aJx9wL
 oeTPsOEJzAax6Xj/c+tQ7maCQdxFpQQ=
 =K3yj
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "One build fix and a minor fix for suppressing a useless warning when
  booting a Xen dom0 via UEFI"

* tag 'for-linus-5.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  Fix build error when CONFIG_ACPI is not set/enabled:
  efi: avoid error message when booting under Xen
2020-08-21 12:28:33 -07:00
Linus Torvalds
985c788b6d Power management fixes for 5.9-rc2
- Fix re-enabling of resources in dev_pm_opp_set_rate() (Rajendra
    Nayak).
 
  - Fix OPP table reference counting in error paths (Stephen Boyd).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl9AC7oSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxW2AP+gN4h8yIpyn9G0HSEv8zUPy+CwOSOL5U
 Mrzcoln2eBj9MGE+Pny5JcOUTRcdpBtN0k/0/OfZMNwKm228eWlkcolG/HeRWY2K
 AjE3a80drxjqBxsaoJraGgK8KjbtfZGJzV2iFrBNU9LPr0deuJ83CKTTa07YyefM
 gKMijLQDWzxKMsvnE1ZlPzuPPruQjsN0h+NjXZiEkdovQWZuQyiKNmUDS/L78o4o
 eDP8g3z7FOz21KBytzO2QikSXzqX2cPQw2Ydbry9mRGBZxJ8woyCxj1jsxjv22IK
 YiTJKnZAlLKaGF2X1r1PO9Ccu4Js6MGgeWKhHAUBpI4dT4ssVybJmzl1iI4Um5EH
 39JJ07dClt+INsvhLQ6VoTQkpP4kds+yJVGdkhUyinxWXkMAt3ibEqeKqFmUu7Cb
 t0+sPkJYFmLkfNL81/E1thrXk/sEZ0ixA5Nlg49Gk+fSv6riG6bugpnJEEKYpFb8
 5rqWQKKcVp4p1IrrVbSaaqPRCvvfu3Zk3LZc4ycYsVPPVng6+PvFgGLlBHxK1ath
 m0F2kY6WJA0MrA4JeQCQ7Jpgt4TiIa+DPzFu1OD3tWBms1IF3DGd2HSAlaz4EI1i
 Ad0ozgXIhwnYIh6EbWfjSBiwkOK4cltpZG0cD0rNYHl/S1e46BbYCHow1i2h4x/u
 P/fVdxK6zVeE
 =7DWg
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a few issues in the operating performance points (OPP)
  framework.

  Specifics:

   - Fix re-enabling of resources in dev_pm_opp_set_rate() (Rajendra
     Nayak)

   - Fix OPP table reference counting in error paths (Stephen Boyd)"

* tag 'pm-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  opp: Enable resources again if they were disabled earlier
  opp: Put opp table in dev_pm_opp_set_rate() if _set_opp_bw() fails
  opp: Put opp table in dev_pm_opp_set_rate() for empty tables
2020-08-21 12:26:58 -07:00
Tobias Klauser
b16fc097bc bpf: Fix two typos in uapi/linux/bpf.h
Also remove trailing whitespaces in bpf_skb_get_tunnel_key example code.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200821133642.18870-1-tklauser@distanz.ch
2020-08-21 12:26:17 -07:00
Tom Rix
774d977abf net: dsa: b53: check for timeout
clang static analysis reports this problem

b53_common.c:1583:13: warning: The left expression of the compound
  assignment is an uninitialized value. The computed value will
  also be garbage
        ent.port &= ~BIT(port);
        ~~~~~~~~ ^

ent is set by a successful call to b53_arl_read().  Unsuccessful
calls are caught by an switch statement handling specific returns.
b32_arl_read() calls b53_arl_op_wait() which fails with the
unhandled -ETIMEDOUT.

So add -ETIMEDOUT to the switch statement.  Because
b53_arl_op_wait() already prints out a message, do not add another
one.

Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-21 11:46:12 -07:00
Stephen Boyd
8d75785a81 ARM64: vdso32: Install vdso32 from vdso_install
Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
vdso_install' installs the 32-bit compat vdso when it is compiled.

Fixes: a7f71a2c8903 ("arm64: compat: Add vDSO")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20200818014950.42492-1-swboyd@chromium.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-08-21 19:11:44 +01:00
Linus Torvalds
d723b99ec9 Improvements to ext4's block allocator performance for very large file
systems, especially when the file system or files which are highly
 fragmented.  There is a new mount option, prefetch_block_bitmaps which
 will pull in the block bitmaps and set up the in-memory buddy bitmaps
 when the file system is initially mounted.
 
 Beyond that, a lot of bug fixes and cleanups.  In particular, a number
 of changes to make ext4 more robust in the face of write errors or
 file system corruptions.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl8/Q9YACgkQ8vlZVpUN
 gaPz+wgAkiWwpge0pfcukABW9FcHK9R82IPggA/NnFu0I+3trpqVQP8mYWqg+1l7
 X0W6B6GHMcITGdwxVDNGHHv0WabXCqFPT0ENwW1cnl9UL6I91Ev2NjmG9HP6hVZa
 g3+NyXJwiOP38xsxpPJGPoYFw2wZyv8/e41MMnsE6goYjMmB04sHvXCUQkbN41Fn
 3CMdsiueYZDAKflvAlL50Jy7Imz5tq9oy81/z+amqvWo4T0U8zRwQuf25nBAhr25
 1WdT4CbCNGO2Qwyu9X+t/KGNVIQhCctkx/yz71l3p2piEGkw/XE4VJNrkmWb0zN7
 k9F5uGOZlAlQEzx+5PN//Qtz1Db0QQ==
 =E6vv
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Improvements to ext4's block allocator performance for very large file
  systems, especially when the file system or files which are highly
  fragmented. There is a new mount option, prefetch_block_bitmaps which
  will pull in the block bitmaps and set up the in-memory buddy bitmaps
  when the file system is initially mounted.

  Beyond that, a lot of bug fixes and cleanups. In particular, a number
  of changes to make ext4 more robust in the face of write errors or
  file system corruptions"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
  ext4: limit the length of per-inode prealloc list
  ext4: reorganize if statement of ext4_mb_release_context()
  ext4: add mb_debug logging when there are lost chunks
  ext4: Fix comment typo "the the".
  jbd2: clean up checksum verification in do_one_pass()
  ext4: change to use fallthrough macro
  ext4: remove unused parameter of ext4_generic_delete_entry function
  mballoc: replace seq_printf with seq_puts
  ext4: optimize the implementation of ext4_mb_good_group()
  ext4: delete invalid comments near ext4_mb_check_limits()
  ext4: fix typos in ext4_mb_regular_allocator() comment
  ext4: fix checking of directory entry validity for inline directories
  fs: prevent BUG_ON in submit_bh_wbc()
  ext4: correctly restore system zone info when remount fails
  ext4: handle add_system_zone() failure in ext4_setup_system_zone()
  ext4: fold ext4_data_block_valid_rcu() into the caller
  ext4: check journal inode extents more carefully
  ext4: don't allow overlapping system zones
  ext4: handle error of ext4_setup_system_zone() on remount
  ext4: delete the invalid BUGON in ext4_mb_load_buddy_gfp()
  ...
2020-08-21 11:03:38 -07:00
David Howells
5e0b17b026 afs: Fix NULL deref in afs_dynroot_depopulate()
If an error occurs during the construction of an afs superblock, it's
possible that an error occurs after a superblock is created, but before
we've created the root dentry.  If the superblock has a dynamic root
(ie.  what's normally mounted on /afs), the afs_kill_super() will call
afs_dynroot_depopulate() to unpin any created dentries - but this will
oops if the root hasn't been created yet.

Fix this by skipping that bit of code if there is no root dentry.

This leads to an oops looking like:

	general protection fault, ...
	KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
	...
	RIP: 0010:afs_dynroot_depopulate+0x25f/0x529 fs/afs/dynroot.c:385
	...
	Call Trace:
	 afs_kill_super+0x13b/0x180 fs/afs/super.c:535
	 deactivate_locked_super+0x94/0x160 fs/super.c:335
	 afs_get_tree+0x1124/0x1460 fs/afs/super.c:598
	 vfs_get_tree+0x89/0x2f0 fs/super.c:1547
	 do_new_mount fs/namespace.c:2875 [inline]
	 path_mount+0x1387/0x2070 fs/namespace.c:3192
	 do_mount fs/namespace.c:3205 [inline]
	 __do_sys_mount fs/namespace.c:3413 [inline]
	 __se_sys_mount fs/namespace.c:3390 [inline]
	 __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
	 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

which is oopsing on this line:

	inode_lock(root->d_inode);

presumably because sb->s_root was NULL.

Fixes: 0da0b7fd73e4 ("afs: Display manually added cells in dynamic root mount")
Reported-by: syzbot+c1eff8205244ae7e11a6@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 10:56:40 -07:00
Linus Torvalds
cd02217a5d RDMA first 5.9rc pull request
One regression from 5.8 and a few bugs from earlier kernels.
 
 - Various spelling corrections in kernel prints
 
 - Bug fixes in hfi1 and bntx_re
 
 - Revert a 5.8 patch in hns
 
 - Batch update for Mellanox and Cumulus maintainers emails
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl8/0OsACgkQOG33FX4g
 mxo7Pg/8CJzOQdb9KmR1bwqzi18i0BQ7ev81NucQvBzJNY/grx2LYQUTbZCQx+QJ
 em+PdUmjtdcA1C1fBEbeS++/LIQxIpBZbU36Yvz4syxJqYHV+GX9EAoy02zitd6O
 tyQST6boCAZQln7ya0VmCREW5EVOgWOdBmxmxO7B5VViXbFb1BShysxtekQAOpTk
 TZU9HDhVaG4LRktr5IfyZLBeS6agbvBEXcwoeOkk+YvxDnwE3BYyveeDluEJdvh8
 NqbYRxDZM0yKjVoxi08ceMhTJ7U+igCpoUP9O3NQCgWPaBHwE6udeWoT/5VEA+AJ
 522477WPlsX88FL40m4F+qjbgctD+MjoswqgMXXhXJWIgnae3hvTnsMYAOH0xHGn
 F+LfwMct5kFNn3cueOxNmUfwylwxpnx5SiknDw1VnpNhybU2arPI2Tnkz9und848
 WDUdOlgmWRjKgOjcN6KkqFb2cTXMbr1yRuid0sutmVXSeH7FtiNR3VdvZuS8VeYF
 pfev3XZFjRGy3AOTc83oSg6h0SDWAOvs0m24y0Jut1Lb49N1QpCITIHUeQhB8L+W
 udCVHvYSZdlnXKkrj7FDDwKBy43krJrGrNw+CM50o4eTHi+WlZ2dVlzVMtoU8YeJ
 qZw79pcsC8OY7nD2PJ456xFy1OhxKDY0iJXmjvIyi45GsOYNPks=
 =5gq2
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "One regression from 5.8 and a few bugs from earlier kernels:

   - Various spelling corrections in kernel prints

   - Bug fixes in hfi1 and bntx_re

   - Revert a 5.8 patch in hns

   - Batch update for Mellanox and Cumulus maintainers emails"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  MAINTAINERS: Update Mellanox and Cumulus Network addresses to new domain
  Revert "RDMA/hns: Reserve one sge in order to avoid local length error"
  RDMA/hfi1: Correct an interlock issue for TID RDMA WRITE request
  RDMA/bnxt_re: Do not add user qps to flushlist
  RDMA/core: Fix spelling mistake "Could't" -> "Couldn't"
  RDMA/usnic: Fix spelling mistake "transistion" -> "transition"
  RDMA/hns: Fix spelling mistake "epmty" -> "empty"
2020-08-21 10:14:16 -07:00
Linus Torvalds
7f04f3ed62 sound fixes for 5.9-rc2
A collection of small fixes over several drivers, but all are driver-
 specific and nothing looks scary.  Slightly large changes are seen in
 ASoC qcom driver for the bugs that were revealed by the recent ASoC
 core change to report the invalid register access errors.  Also ASoC
 fsl got a slight intensive change for the distortion fix.  Others are
 only trivial fixes or device-specific quirks.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAl8/kt4OHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE+sexAAoPtgbUActijlJY34J4cm7kIt1wVulKKgdxMu
 LQEbuJPbRuv80fcr7MIcYTc6Q20frb3kQYJNxyPAG7afVV5wkGa/dO6mUrS3LHyg
 UsQLqm6iHqZNg0wo8cZK9Lhs+VwWrD1VWnV+5ODL5koT/SYfzqI7Km3lKketA4y/
 MvsRFhW03Mc6SabRINqxNcE3YTUHi8HPgM4aF9mQmBQTqm3tnld6MSCgo4B129cY
 rnNoGcpJmcZRGo1ZM7kUGS+FfLeclt3STvepbpz2iAoTWiI55X67uVwjAO3GHW4s
 5EoycKu0f8D6g3ZO0evari1vJRhC0X2QVHO42CaDk32PKxnh+xlR4sBfuJW0Zsqt
 AR+Jibv/wiF+vmlC0s+DQqgaxPkCJrK6zJ4uvjZi+iZhqXhq8Rl9DmIOCUwPoPZd
 PsKhenrmjecL3yd7kgYMtSm6orjaAzkG9r8rUnTWuWvtnpIyMe9eN3BsOGZnI0GV
 sn0UCVJQSmlxNcbWfFX+w/hYajY82FGRYbUf8bOAMWIxZP5ecPu54cn5eta70JQk
 w+b3Th9FSFJDlnATA+WAoh4TmYHmuYBISjmx4tLjnMwKxUCirJt90o7hl04Ntwwe
 1/pA4bIN6dN+csaLgWJzstlCxr/8H0LyQgRdvCfAES2ejboUQliiNOO0fDHkdf87
 PTCJtUg=
 =T8Rb
 -----END PGP SIGNATURE-----

Merge tag 'sound-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes over several drivers, but all are driver-
  specific and nothing looks scary.

  Slightly large changes are seen in ASoC qcom driver for the bugs that
  were revealed by the recent ASoC core change to report the invalid
  register access errors. Also ASoC fsl got a slight intensive change
  for the distortion fix.

  Others are only trivial fixes or device-specific quirks"

* tag 'sound-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
  ALSA: hda: avoid reset of sdo_limit
  ALSA: hda/realtek: Add quirk for Samsung Galaxy Book Ion
  ALSA: usb-audio: ignore broken processing/extension unit
  ASoC: intel: Fix memleak in sst_media_open
  ASoC: wm8994: Avoid attempts to read unreadable registers
  ASoC: msm8916-wcd-analog: fix register Interrupt offset
  ASoC: wm8994: Prevent access to invalid VU register bits on WM1811
  ALSA: hda/realtek: Add model alc298-samsung-headphone
  ALSA: usb-audio: Update documentation comment for MS2109 quirk
  ALSA: isa: fix spelling mistakes in the comments
  ALSA: usb-audio: Add capture support for Saffire 6 (USB 1.1)
  ALSA: hda/realtek: Add quirk for Samsung Galaxy Flex Book
  ASoC: q6routing: add dummy register read/write function
  ASoC: q6afe-dai: mark all widgets registers as SND_SOC_NOPM
  ASoC: Make soc_component_read() returning an error code again
  ASoC: amd: Replacing component->name with codec_dai->name.
  ASoC: fsl: Fix unused variable warning
  ASoC: tegra: tegra210_i2s: Fix compile warning with CONFIG_PM=n
  ASoC: tegra: tegra210_dmic: Fix compile warning with CONFIG_PM=n
  ASoC: tegra: tegra210_ahub: Fix compile warning with CONFIG_PM=n
  ...
2020-08-21 10:07:54 -07:00
Linus Torvalds
43d387a4ad drm fixes for 5.9-rc2
amdgpu:
 - Fix allocation size
 - SR-IOV fixes
 - Vega20 SMU feature state caching fix
 - Fix custom pptable handling
 - Arcturus golden settings update
 - Several display fixes
 - Fixes for Navy Flounder
 - Misc display fixes
 - RAS fix
 
 amdkfd:
 - SDMA fix for renoir
 
 i915:
 - Fix device parameter usage for selftest mock i915 device
 - Fix LPSP capability debugfs NULL dereference
 - Fix buddy register pagemask table
 - Fix intel_atomic_check() non-negative return value
 - Fix selftests passing a random 0 into ilog2()
 - Fix TGL power well enable/disable ordering
 - Switch to PMU module refcounting
 - GVT fixes
 
 virtio:
 - Add missing dma_fence_put() in virtio_gpu_execbuffer_ioctl().
 - Fix memory leak in virtio_gpu_cleanup_object().
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJfPyfDAAoJEAx081l5xIa+FZMP/1I13j/J/uaxThFuAc8m0By5
 wvDLu4EdzV1zIXmAT1m/bUzvXVxsCgg6XSQjZEQ4nK0SqEN5dU8g/Kg+u0E3ojJc
 g4A3XJydrQ+CSkiuP51QenRXZPMdj3rAIKXYelb7UylSdw0tPKdBP0ISXTwQUZcS
 PSN6PPUiVTJHZ52EatO7yIDV2QGyV4h9qnKcGsyLIEBa567kClrQPqdUdLa+WBSf
 9uTVdulx7CQg5vO5qZCQYpEDbhToZQA2DTYx9m640D5OCP0M6XCXTQIgpK7sISdk
 N1XkqhfiWr5ivnXJBRdqTkXv3PAUrxGNVYfXvkK3+oP5Vz4yrM9tB8AyCorWXzps
 WibnqejHgYhG57uwo3APNg/1j4EDJDdq01pDl65TEz2YyDLHAV1FiYW98XveKL2k
 8uNqCmxFnnj4p9xWhsmNIm7dwkud3QxOs17vX7odzlLq63QX+8tTnjhAKw8aXFhC
 USJqmMNY5pI3kVX1jUHLGvPxakLngLWH2T+Bozk55Rm1f0JyMCY6ZSHPVaq48mqv
 2ZifBgBb12h6MKENZvHXbGUrK1p+Q+uo4ueXvpQs1vAMNx1kQ0hJwoCJqNq+WxQN
 /P9XtQJUJ6jg/w1PSMNA3hipg4jtqy511pf9+jrdDOHbIYhIxEK3F1cmjasMjBya
 +zwVzm3p5u+TKuw+z4wF
 =eivO
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2020-08-21' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular fixes pull for rc2. Usual rc2 doesn't seem too busy, mainly
  i915 and amdgpu. I'd expect the usual uptick for rc3.

  amdgpu:
   - Fix allocation size
   - SR-IOV fixes
   - Vega20 SMU feature state caching fix
   - Fix custom pptable handling
   - Arcturus golden settings update
   - Several display fixes
   - Fixes for Navy Flounder
   - Misc display fixes
   - RAS fix

  amdkfd:
   - SDMA fix for renoir

  i915:
   - Fix device parameter usage for selftest mock i915 device
   - Fix LPSP capability debugfs NULL dereference
   - Fix buddy register pagemask table
   - Fix intel_atomic_check() non-negative return value
   - Fix selftests passing a random 0 into ilog2()
   - Fix TGL power well enable/disable ordering
   - Switch to PMU module refcounting
   - GVT fixes

  virtio:
   - Add missing dma_fence_put() in virtio_gpu_execbuffer_ioctl()
   - Fix memory leak in virtio_gpu_cleanup_object()"

* tag 'drm-fixes-2020-08-21' of git://anongit.freedesktop.org/drm/drm: (34 commits)
  Revert "drm/amdgpu: disable gfxoff for navy_flounder"
  drm/i915/tgl: Make sure TC-cold is blocked before enabling TC AUX power wells
  drm/i915/selftests: Avoid passing a random 0 into ilog2
  drm/i915: Fix wrong return value in intel_atomic_check()
  drm/i915: Update bw_buddy pagemask table
  drm/i915/display: Check for an LPSP encoder before dereferencing
  drm/i915: Copy default modparams to mock i915_device
  drm/i915: Provide the perf pmu.module
  drm/amd/display: fix pow() crashing when given base 0
  drm/amd/display: Reset scrambling on Test Pattern
  drm/amd/display: fix dcn3 wide timing dsc validation
  drm/amd/display: Fix DFPstate hang due to view port changed
  drm/amd/display: Assign correct left shift
  drm/amd/display: Call DMUB for eDP power control
  drm/amdkfd: fix the wrong sdma instance query for renoir
  drm/amdgpu: parse ta firmware for navy_flounder
  drm/amdgpu: fix NULL pointer access issue when unloading driver
  drm/amdgpu: fix uninit-value in arcturus_log_thermal_throttling_event()
  drm/amdgpu: disable gfxoff for navy_flounder
  drm/amdgpu/display: use GFP_ATOMIC in dcn20_validate_bandwidth_internal
  ...
2020-08-21 10:02:44 -07:00
Charan Teja Reddy
88e8ac11d2 mm, page_alloc: fix core hung in free_pcppages_bulk()
The following race is observed with the repeated online, offline and a
delay between two successive online of memory blocks of movable zone.

P1						P2

Online the first memory block in
the movable zone. The pcp struct
values are initialized to default
values,i.e., pcp->high = 0 &
pcp->batch = 1.

					Allocate the pages from the
					movable zone.

Try to Online the second memory
block in the movable zone thus it
entered the online_pages() but yet
to call zone_pcp_update().
					This process is entered into
					the exit path thus it tries
					to release the order-0 pages
					to pcp lists through
					free_unref_page_commit().
					As pcp->high = 0, pcp->count = 1
					proceed to call the function
					free_pcppages_bulk().
Update the pcp values thus the
new pcp values are like, say,
pcp->high = 378, pcp->batch = 63.
					Read the pcp's batch value using
					READ_ONCE() and pass the same to
					free_pcppages_bulk(), pcp values
					passed here are, batch = 63,
					count = 1.

					Since num of pages in the pcp
					lists are less than ->batch,
					then it will stuck in
					while(list_empty(list)) loop
					with interrupts disabled thus
					a core hung.

Avoid this by ensuring free_pcppages_bulk() is called with proper count of
pcp list pages.

The mentioned race is some what easily reproducible without [1] because
pcp's are not updated for the first memory block online and thus there is
a enough race window for P2 between alloc+free and pcp struct values
update through onlining of second memory block.

With [1], the race still exists but it is very narrow as we update the pcp
struct values for the first memory block online itself.

This is not limited to the movable zone, it could also happen in cases
with the normal zone (e.g., hotplug to a node that only has DMA memory, or
no other memory yet).

[1]: https://patchwork.kernel.org/patch/11696389/

Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into one-list-per-migrate-type")
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: <stable@vger.kernel.org> [2.6+]
Link: http://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaurora.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Doug Berger
e08d3fdfe2 mm: include CMA pages in lowmem_reserve at boot
The lowmem_reserve arrays provide a means of applying pressure against
allocations from lower zones that were targeted at higher zones.  Its
values are a function of the number of pages managed by higher zones and
are assigned by a call to the setup_per_zone_lowmem_reserve() function.

The function is initially called at boot time by the function
init_per_zone_wmark_min() and may be called later by accesses of the
/proc/sys/vm/lowmem_reserve_ratio sysctl file.

The function init_per_zone_wmark_min() was moved up from a module_init to
a core_initcall to resolve a sequencing issue with khugepaged.
Unfortunately this created a sequencing issue with CMA page accounting.

The CMA pages are added to the managed page count of a zone when
cma_init_reserved_areas() is called at boot also as a core_initcall.  This
makes it uncertain whether the CMA pages will be added to the managed page
counts of their zones before or after the call to
init_per_zone_wmark_min() as it becomes dependent on link order.  With the
current link order the pages are added to the managed count after the
lowmem_reserve arrays are initialized at boot.

This means the lowmem_reserve values at boot may be lower than the values
used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the
ratio values are unchanged.

In many cases the difference is not significant, but for example
an ARM platform with 1GB of memory and the following memory layout

  cma: Reserved 256 MiB at 0x0000000030000000
  Zone ranges:
    DMA      [mem 0x0000000000000000-0x000000002fffffff]
    Normal   empty
    HighMem  [mem 0x0000000030000000-0x000000003fffffff]

would result in 0 lowmem_reserve for the DMA zone.  This would allow
userspace to deplete the DMA zone easily.

Funnily enough

  $ cat /proc/sys/vm/lowmem_reserve_ratio

would fix up the situation because as a side effect it forces
setup_per_zone_lowmem_reserve.

This commit breaks the link order dependency by invoking
init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages
have the chance to be properly accounted in their zone(s) and allowing
the lowmem_reserve arrays to receive consistent values.

Fixes: bc22af74f271 ("mm: update min_free_kbytes from khugepaged after core initialization")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1597423766-27849-1-git-send-email-opendmb@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Phillip Lougher
f26044c83e squashfs: avoid bio_alloc() failure with 1Mbyte blocks
This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".

Bio_alloc() is limited to 256 pages (1 Mbyte).  This can cause a failure
when reading 1 Mbyte block filesystems.  The problem is a datablock can be
fully (or almost uncompressed), requiring 256 pages, but, because blocks
are not aligned to page boundaries, it may require 257 pages to read.

Bio_kmalloc() can handle 1024 pages, and so use this for the edge
condition.

Fixes: 93e72b3c612a ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: Nicolas Prochazka <nicolas.prochazka@gmail.com>
Reported-by: Tomoatsu Shimada <shimada@walbrix.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Cc: Philippe Liard <pliard@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Adrien Schildknecht <adrien+dev@schischi.me>
Cc: Daniel Rosenberg <drosen@google.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200815035637.15319-1-phillip@squashfs.org.uk
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Hugh Dickins
c17c3dc9d0 uprobes: __replace_page() avoid BUG in munlock_vma_page()
syzbot crashed on the VM_BUG_ON_PAGE(PageTail) in munlock_vma_page(), when
called from uprobes __replace_page().  Which of many ways to fix it?
Settled on not calling when PageCompound (since Head and Tail are equals
in this context, PageCompound the usual check in uprobes.c, and the prior
use of FOLL_SPLIT_PMD will have cleared PageMlocked already).

Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>	[5.4+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008161338360.20413@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Wei Yongjun
71e843295c kernel/relay.c: fix memleak on destroy relay channel
kmemleak report memory leak as follows:

  unreferenced object 0x607ee4e5f948 (size 8):
  comm "syz-executor.1", pid 2098, jiffies 4295031601 (age 288.468s)
  hex dump (first 8 bytes):
  00 00 00 00 00 00 00 00 ........
  backtrace:
     relay_open kernel/relay.c:583 [inline]
     relay_open+0xb6/0x970 kernel/relay.c:563
     do_blk_trace_setup+0x4a8/0xb20 kernel/trace/blktrace.c:557
     __blk_trace_setup+0xb6/0x150 kernel/trace/blktrace.c:597
     blk_trace_ioctl+0x146/0x280 kernel/trace/blktrace.c:738
     blkdev_ioctl+0xb2/0x6a0 block/ioctl.c:613
     block_ioctl+0xe5/0x120 fs/block_dev.c:1871
     vfs_ioctl fs/ioctl.c:48 [inline]
     __do_sys_ioctl fs/ioctl.c:753 [inline]
     __se_sys_ioctl fs/ioctl.c:739 [inline]
     __x64_sys_ioctl+0x170/0x1ce fs/ioctl.c:739
     do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

'chan->buf' is malloced in relay_open() by alloc_percpu() but not free
while destroy the relay channel.  Fix it by adding free_percpu() before
return from relay_destroy_channel().

Fixes: 017c59c042d0 ("relay: Use per CPU constructs for the relay channel buffer pointers")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David Rientjes <rientjes@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Akash Goel <akash.goel@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200817122826.48518-1-weiyongjun1@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Jann Horn
bcf85fcedf romfs: fix uninitialized memory leak in romfs_dev_read()
romfs has a superblock field that limits the size of the filesystem; data
beyond that limit is never accessed.

romfs_dev_read() fetches a caller-supplied number of bytes from the
backing device.  It returns 0 on success or an error code on failure;
therefore, its API can't represent short reads, it's all-or-nothing.

However, when romfs_dev_read() detects that the requested operation would
cross the filesystem size limit, it currently silently truncates the
requested number of bytes.  This e.g.  means that when the content of a
file with size 0x1000 starts one byte before the filesystem size limit,
->readpage() will only fill a single byte of the supplied page while
leaving the rest uninitialized, leaking that uninitialized memory to
userspace.

Fix it by returning an error code instead of truncating the read when the
requested read operation would go beyond the end of the filesystem.

Fixes: da4458bda237 ("NOMMU: Make it possible for RomFS to use MTD devices directly")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200818013202.2246365-1-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Leon Romanovsky
86f54bb7e4 mm/rodata_test.c: fix missing function declaration
The compilation with CONFIG_DEBUG_RODATA_TEST set produces the following
warning due to the missing include.

 mm/rodata_test.c:15:6: warning: no previous prototype for 'rodata_test' [-Wmissing-prototypes]
    15 | void rodata_test(void)
       |      ^~~~~~~~~~~

Fixes: 2959a5f726f6 ("mm: add arch-independent testcases for RODATA")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lkml.kernel.org/r/20200819080026.918134-1-leon@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Aneesh Kumar K.V
e47110e905 mm/vunmap: add cond_resched() in vunmap_pmd_range
Like zap_pte_range add cond_resched so that we can avoid softlockups as
reported below.  On non-preemptible kernel with large I/O map region (like
the one we get when using persistent memory with sector mode), an unmap of
the namespace can report below softlockups.

22724.027334] watchdog: BUG: soft lockup - CPU#49 stuck for 23s! [ndctl:50777]
 NIP [c0000000000dc224] plpar_hcall+0x38/0x58
 LR [c0000000000d8898] pSeries_lpar_hpte_invalidate+0x68/0xb0
 Call Trace:
    flush_hash_page+0x114/0x200
    hpte_need_flush+0x2dc/0x540
    vunmap_page_range+0x538/0x6f0
    free_unmap_vmap_area+0x30/0x70
    remove_vm_area+0xfc/0x140
    __vunmap+0x68/0x270
    __iounmap.part.0+0x34/0x60
    memunmap+0x54/0x70
    release_nodes+0x28c/0x300
    device_release_driver_internal+0x16c/0x280
    unbind_store+0x124/0x170
    drv_attr_store+0x44/0x60
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    __vfs_write+0x3c/0x70
    vfs_write+0xd8/0x260
    ksys_write+0xdc/0x130
    system_call+0x5c/0x70

Reported-by: Harish Sriram <harish@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200807075933.310240-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Hugh Dickins
f3f99d63a8 khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
__khugepaged_enter(): yes, when one thread is about to dump core, has set
core_state, and is waiting for others, another might do something calling
__khugepaged_enter(), which now crashes because I lumped the core_state
test (known as "mmget_still_valid") into khugepaged_test_exit().  I still
think it's best to lump them together, so just in this exceptional case,
check mm->mm_users directly instead of khugepaged_test_exit().

Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008141503370.18085@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Xu Wang
d5a1695977 hugetlb_cgroup: convert comma to semicolon
Replace a comma between expression statements by a semicolon.

Fixes: faced7e0806cf4 ("mm: hugetlb controller for cgroups v2")
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Giuseppe Scrivano <gscrivan@redhat.com>
Link: http://lkml.kernel.org/r/20200818064333.21759-1-vulab@iscas.ac.cn
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:52 -07:00
Nick Desaulniers
00c54a80cd mailmap: add Andi Kleen
I keep getting bounce back from the suse.de address.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Quentin Perret <qperret@qperret.net>
Link: http://lkml.kernel.org/r/20200818203214.659955-1-ndesaulniers@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:52 -07:00
Thomas Gleixner
d88d59b64c core/entry: Respect syscall number rewrites
The transcript of the x86 entry code to the generic version failed to
reload the syscall number from ptregs after ptrace and seccomp have run,
which both can modify the syscall number in ptregs. It returns the original
syscall number instead which is obviously not the right thing to do.

Reload the syscall number to fix that.

Fixes: 142781e108b1 ("entry: Provide generic syscall entry functionality")
Reported-by: Kyle Huey <me@kylehuey.com> 
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kyle Huey <me@kylehuey.com> 
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/87blj6ifo8.fsf@nanos.tec.linutronix.de
2020-08-21 16:17:29 +02:00
Sean Christopherson
6a3ea3e68b x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM
KVM has an optmization to avoid expensive MRS read/writes on
VMENTER/EXIT. It caches the MSR values and restores them either when
leaving the run loop, on preemption or when going out to user space.

The affected MSRs are not required for kernel context operations. This
changed with the recently introduced mechanism to handle FSGSBASE in the
paranoid entry code which has to retrieve the kernel GSBASE value by
accessing per CPU memory. The mechanism needs to retrieve the CPU number
and uses either LSL or RDPID if the processor supports it.

Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and
lazily restored MSRs, which means between the point where the guest value
is written and the point of restore, MSR_TSC_AUX contains a random number.

If an NMI or any other exception which uses the paranoid entry path happens
in such a context, then RDPID returns the random guest MSR_TSC_AUX value.

As a consequence this reads from the wrong memory location to retrieve the
kernel GSBASE value. Kernel GS is used to for all regular this_cpu_*()
operations. If the GSBASE in the exception handler points to the per CPU
memory of a different CPU then this has the obvious consequences of data
corruption and crashes.

As the paranoid entry path is the only place which accesses MSR_TSX_AUX
(via RDPID) and the fallback via LSL is not significantly slower, remove
the RDPID alternative from the entry path and always use LSL.

The alternative would be to write MSR_TSC_AUX on every VMENTER and VMEXIT
which would be inflicting massive overhead on that code path.

[ tglx: Rewrote changelog ]

Fixes: eaad981291ee3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Debugged-by: Tom Lendacky <thomas.lendacky@amd.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200821105229.18938-1-pbonzini@redhat.com
2020-08-21 16:15:27 +02:00