801 Commits

Author SHA1 Message Date
Diana Craciun
df747bcd5b vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call
Expose to userspace information about the memory regions.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:32:24 -06:00
Diana Craciun
f97f4c04e5 vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl
Allow userspace to get fsl-mc device info (number of regions
and irqs).

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:32:19 -06:00
Diana Craciun
704f5082d8 vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind
The DPRC (Data Path Resource Container) device is a bus device and has
child devices attached to it. When the vfio-fsl-mc driver is probed
the DPRC is scanned and the child devices discovered and initialized.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12 11:32:03 -06:00
Bharat Bhushan
fb1ff4c194 vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices
DPAA2 (Data Path Acceleration Architecture) consists in
mechanisms for processing Ethernet packets, queue management,
accelerators, etc.

The Management Complex (mc) is a hardware entity that manages the DPAA2
hardware resources. It provides an object-based abstraction for software
drivers to use the DPAA2 hardware. The MC mediates operations such as
create, discover, destroy of DPAA2 objects.
The MC provides memory-mapped I/O command interfaces (MC portals) which
DPAA2 software drivers use to operate on DPAA2 objects.

A DPRC is a container object that holds other types of DPAA2 objects.
Each object in the DPRC is a Linux device and bound to a driver.
The MC-bus driver is a platform driver (different from PCI or platform
bus). The DPRC driver does runtime management of a bus instance. It
performs the initial scan of the DPRC and handles changes in the DPRC
configuration (adding/removing objects).

All objects inside a container share the same hardware isolation
context, meaning that only an entire DPRC can be assigned to
a virtual machine.
When a container is assigned to a virtual machine, all the objects
within that container are assigned to that virtual machine.
The DPRC container assigned to the virtual machine is not allowed
to change contents (add/remove objects) by the guest. The restriction
is set by the host and enforced by the mc hardware.

The DPAA2 objects can be directly assigned to the guest. However
the MC portals (the memory mapped command interface to the MC) need
to be emulated because there are commands that configure the
interrupts and the isolation IDs which are virtual in the guest.

Example:
echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override
echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind

The dprc.2 is bound to the VFIO driver and all the objects within
dprc.2 are going to be bound to the VFIO driver.

This patch adds the infrastructure for VFIO support for fsl-mc
devices. Subsequent patches will add support for binding and secure
assigning these devices using VFIO.

More details about the DPAA2 objects can be found here:
Documentation/networking/device_drivers/freescale/dpaa2/overview.rst

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-07 14:17:33 -06:00
Alex Williamson
3de066f8f8 Merge branches 'v5.10/vfio/bardirty', 'v5.10/vfio/dma_avail', 'v5.10/vfio/misc', 'v5.10/vfio/no-cmd-mem' and 'v5.10/vfio/yan_zhao_fixes' into v5.10/vfio/next 2020-09-22 10:56:51 -06:00
Yan Zhao
2c5af98592 vfio/type1: fix dirty bitmap calculation in vfio_dma_rw
The count of dirtied pages is not only determined by count of copied
pages, but also by the start offset.

e.g. if offset = PAGE_SIZE - 1, and *copied=2, the dirty pages count
is 2, instead of 1 or 0.

Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22 10:56:41 -06:00
Yan Zhao
28b1302440 vfio: fix a missed vfio group put in vfio_pin_pages
When error occurs, need to put vfio group after a successful get.

Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22 10:56:40 -06:00
Matthew Rosato
515ecd5368 vfio/pci: Decouple PCI_COMMAND_MEMORY bit checks from is_virtfn
While it is true that devices with is_virtfn=1 will have a Memory Space
Enable bit that is hard-wired to 0, this is not the only case where we
see this behavior -- For example some bare-metal hypervisors lack
Memory Space Enable bit emulation for devices not setting is_virtfn
(s390). Fix this by instead checking for the newly-added
no_command_memory bit which directly denotes the need for
PCI_COMMAND_MEMORY emulation in vfio.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22 10:52:24 -06:00
Matthew Rosato
7d6e132965 vfio iommu: Add dma available capability
Commit 492855939bdb ("vfio/type1: Limit DMA mappings per container")
added the ability to limit the number of memory backed DMA mappings.
However on s390x, when lazy mapping is in use, we use a very large
number of concurrent mappings.  Let's provide the current allowable
number of DMA mappings to userspace via the IOMMU info chain so that
userspace can take appropriate mitigation.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21 14:58:34 -06:00
Yan Zhao
7ef32e5236 vfio: add a singleton check for vfio_group_pin_pages
Page pinning is used both to translate and pin device mappings for DMA
purpose, as well as to indicate to the IOMMU backend to limit the dirty
page scope to those pages that have been pinned, in the case of an IOMMU
backed device.
To support this, the vfio_pin_pages() interface limits itself to only
singleton groups such that the IOMMU backend can consider dirty page
scope only at the group level.  Implement the same requirement for the
vfio_group_pin_pages() interface.

Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21 14:50:51 -06:00
Zenghui Yu
1c0f68252a vfio/pci: Don't regenerate vconfig for all BARs if !bardirty
Now we regenerate vconfig for all the BARs via vfio_bar_fixup(), every
time any offset of any of them are read.  Though BARs aren't re-read
regularly, the regeneration can be avoided if no BARs had been written
since they were last read, in which case vdev->bardirty is false.

Let's return immediately in vfio_bar_fixup() if bardirty is false.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21 14:08:12 -06:00
Zenghui Yu
eac7cc21c4 vfio/pci: Remove redundant declaration of vfio_pci_driver
It was added by commit 137e5531351d ("vfio/pci: Add sriov_configure
support") but duplicates a forward declaration earlier in the file.

Remove it.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21 14:07:05 -06:00
Tom Murphy
aae4c8e27b iommu: Rename iommu_tlb_* functions to iommu_iotlb_*
To keep naming consistent we should stick with *iotlb*. This patch
renames a few remaining functions.

Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Link: https://lore.kernel.org/r/20200817210051.13546-1-murphyt7@tcd.ie
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-09-04 11:16:09 +02:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Alex Williamson
aae7a75a82 vfio/type1: Add proper error unwind for vfio_iommu_replay()
The vfio_iommu_replay() function does not currently unwind on error,
yet it does pin pages, perform IOMMU mapping, and modify the vfio_dma
structure to indicate IOMMU mapping.  The IOMMU mappings are torn down
when the domain is destroyed, but the other actions go on to cause
trouble later.  For example, the iommu->domain_list can be empty if we
only have a non-IOMMU backed mdev attached.  We don't currently check
if the list is empty before getting the first entry in the list, which
leads to a bogus domain pointer.  If a vfio_dma entry is erroneously
marked as iommu_mapped, we'll attempt to use that bogus pointer to
retrieve the existing physical page addresses.

This is the scenario that uncovered this issue, attempting to hot-add
a vfio-pci device to a container with an existing mdev device and DMA
mappings, one of which could not be pinned, causing a failure adding
the new group to the existing container and setting the conditions
for a subsequent attempt to explode.

To resolve this, we can first check if the domain_list is empty so
that we can reject replay of a bogus domain, should we ever encounter
this inconsistent state again in the future.  The real fix though is
to add the necessary unwind support, which means cleaning up the
current pinning if an IOMMU mapping fails, then walking back through
the r-b tree of DMA entries, reading from the IOMMU which ranges are
mapped, and unmapping and unpinning those ranges.  To be able to do
this, we also defer marking the DMA entry as IOMMU mapped until all
entries are processed, in order to allow the unwind to know the
disposition of each entry.

Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices")
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-08-17 11:09:13 -06:00
Alex Williamson
bc93b9ae01 vfio-pci: Avoid recursive read-lock usage
A down_read on memory_lock is held when performing read/write accesses
to MMIO BAR space, including across the copy_to/from_user() callouts
which may fault.  If the user buffer for these copies resides in an
mmap of device MMIO space, the mmap fault handler will acquire a
recursive read-lock on memory_lock.  Avoid this by reducing the lock
granularity.  Sequential accesses requiring multiple ioread/iowrite
cycles are expected to be rare, therefore typical accesses should not
see additional overhead.

VGA MMIO accesses are expected to be non-fatal regardless of the PCI
memory enable bit to allow legacy probing, this behavior remains with
a comment added.  ioeventfds are now included in memory access testing,
with writes dropped while memory space is disabled.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-08-17 11:08:18 -06:00
Linus Torvalds
407bc8d818 VFIO updates for v5.9-rc1
- Inclusive naming updates (Alex Williamson)
 
  - Intel X550 INTx quirk (Alex Williamson)
 
  - Error path resched between unmaps (Xiang Zheng)
 
  - SPAPR IOMMU pin_user_pages() conversion (John Hubbard)
 
  - Trivial mutex simplification (Alex Williamson)
 
  - QAT device denylist (Giovanni Cabiddu)
 
  - type1 IOMMU ioctl refactor (Liu Yi L)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJfM1LBAAoJECObm247sIsiF1oP/RJOlmwKf0mmq8iP0j5yjCis
 gjvAAvmD4h3y15yx0kOoY8BNJjyHT26iuqnCHbEhaO+u+lNt3plWNJG9FRpuZJJY
 To6fsj6SApC3BnI3lxhduvQa29dcflbyohzjF2/+QJLbt4PJoRA5GlAEcI4rIUE0
 JfxquYE7n++h4of/532392TmJpMZeW+9IScF2TzPCIYMCw4DwUYZRcfm0UePkv0D
 nQOIyyeyzFhBhoG45TqQr4yp8KBXQvrBViwkuIX+4P0JnRBI/h8LLcDu9TCGAo87
 mkF/AZfuq7q5X6hJwbNavk9YeksNWeIM5JaYyJeBd42a3j3uz6iUe9ENUMQpm7qe
 6SGU3V7xVAPokE5q9VmjlSJz0yUgopqsTrD1EBP5DrQGIBVBAkFPHf/bOPE8DrAN
 htUNX9Zl/TuNPBmEBursgot9qMvgdwA1f2uOPtiB2zvf44qPgoGbT16k48xNK+x7
 kWWVlgUYX3yIi9NI0EnlEaTMA2DGL3xY5/ZLjySojWrtu5AspTuBlJZaT1jPKAAF
 OmKqJFCIIRMqrWPdg8r9P7k6nLcW0nI6hIMhxqRVgqp18PYLojPsaeDk/ndQv8Lx
 D0IafrruF932XGRFjuo2d5vUar4pJcHXqwbeQbf6mWRIeprrBYWm7nVsU/ZIWIMU
 lCtj3NQtShAIGARZUZK9
 =qsYH
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.9-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Inclusive naming updates (Alex Williamson)

 - Intel X550 INTx quirk (Alex Williamson)

 - Error path resched between unmaps (Xiang Zheng)

 - SPAPR IOMMU pin_user_pages() conversion (John Hubbard)

 - Trivial mutex simplification (Alex Williamson)

 - QAT device denylist (Giovanni Cabiddu)

 - type1 IOMMU ioctl refactor (Liu Yi L)

* tag 'vfio-v5.9-rc1' of git://github.com/awilliam/linux-vfio:
  vfio/type1: Refactor vfio_iommu_type1_ioctl()
  vfio/pci: Add QAT devices to denylist
  vfio/pci: Add device denylist
  PCI: Add Intel QuickAssist device IDs
  vfio/pci: Hold igate across releasing eventfd contexts
  vfio/spapr_tce: convert get_user_pages() --> pin_user_pages()
  vfio/type1: Add conditional rescheduling after iommu map failed
  vfio/pci: Add Intel X550 to hidden INTx devices
  vfio: Cleanup allowed driver naming
2020-08-12 12:09:36 -07:00
Peter Xu
64019a2e46 mm/gup: remove task_struct pointer for all gup code
After the cleanup of page fault accounting, gup does not need to pass
task_struct around any more.  Remove that parameter in the whole gup
stack.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:58:04 -07:00
Liu Yi L
ccd59dce1a vfio/type1: Refactor vfio_iommu_type1_ioctl()
This patch refactors the vfio_iommu_type1_ioctl() to use switch instead of
if-else, and each command got a helper function.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:46:13 -06:00
Giovanni Cabiddu
50173329c8 vfio/pci: Add QAT devices to denylist
The current generation of Intel® QuickAssist Technology devices
are not designed to run in an untrusted environment because of the
following issues reported in the document "Intel® QuickAssist Technology
(Intel® QAT) Software for Linux" (document number 336211-014):

QATE-39220 - GEN - Intel® QAT API submissions with bad addresses that
             trigger DMA to invalid or unmapped addresses can cause a
             platform hang
QATE-7495  - GEN - An incorrectly formatted request to Intel® QAT can
             hang the entire Intel® QAT Endpoint

The document is downloadable from https://01.org/intel-quickassist-technology
at the following link:
https://01.org/sites/default/files/downloads/336211-014-qatforlinux-releasenotes-hwv1.7_0.pdf

This patch adds the following QAT devices to the denylist: DH895XCC,
C3XXX and C62X.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Fiona Trahe <fiona.trahe@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:40 -06:00
Giovanni Cabiddu
1f97970e6c vfio/pci: Add device denylist
Add denylist of devices that by default are not probed by vfio-pci.
Devices in this list may be susceptible to untrusted application, even
if the IOMMU is enabled. To be accessed via vfio-pci, the user has to
explicitly disable the denylist.

The denylist can be disabled via the module parameter disable_denylist.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Fiona Trahe <fiona.trahe@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:40 -06:00
Alex Williamson
924b51abf9 vfio/pci: Hold igate across releasing eventfd contexts
No need to release and immediately re-acquire igate while clearing
out the eventfd ctxs.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:38 -06:00
John Hubbard
9d532f2869 vfio/spapr_tce: convert get_user_pages() --> pin_user_pages()
This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
    https://lwn.net/Articles/807108/

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:38 -06:00
Xiang Zheng
e1907d6752 vfio/type1: Add conditional rescheduling after iommu map failed
Commit c5e6688752c2 ("vfio/type1: Add conditional rescheduling")
missed a "cond_resched()" in vfio_iommu_map if iommu map failed.

This is a very tiny optimization and the case can hardly happen.

Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:37 -06:00
Alex Williamson
bf3551e150 vfio/pci: Add Intel X550 to hidden INTx devices
Intel document 333717-008, "Intel® Ethernet Controller X550
Specification Update", version 2.7, dated June 2020, includes errata
#22, added in version 2.1, May 2016, indicating X550 NICs suffer from
the same implementation deficiency as the 700-series NICs:

"The Interrupt Status bit in the Status register of the PCIe
 configuration space is not implemented and is not set as described
 in the PCIe specification."

Without the interrupt status bit, vfio-pci cannot determine when
these devices signal INTx.  They are therefore added to the nointx
quirk.

Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:37 -06:00
Alex Williamson
26afdd9882 vfio: Cleanup allowed driver naming
No functional change, avoid non-inclusive naming schemes.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27 13:43:36 -06:00
Zeng Tao
b872d06408 vfio/pci: fix racy on error and request eventfd ctx
The vfio_pci_release call will free and clear the error and request
eventfd ctx while these ctx could be in use at the same time in the
function like vfio_pci_request, and it's expected to protect them under
the vdev->igate mutex, which is missing in vfio_pci_release.

This issue is introduced since commit 1518ac272e78 ("vfio/pci: fix memory
leaks of eventfd ctx"),and since commit 5c5866c593bb ("vfio/pci: Clear
error and request eventfd ctx after releasing"), it's very easily to
trigger the kernel panic like this:

[ 9513.904346] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 9513.913091] Mem abort info:
[ 9513.915871]   ESR = 0x96000006
[ 9513.918912]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 9513.924198]   SET = 0, FnV = 0
[ 9513.927238]   EA = 0, S1PTW = 0
[ 9513.930364] Data abort info:
[ 9513.933231]   ISV = 0, ISS = 0x00000006
[ 9513.937048]   CM = 0, WnR = 0
[ 9513.940003] user pgtable: 4k pages, 48-bit VAs, pgdp=0000007ec7d12000
[ 9513.946414] [0000000000000008] pgd=0000007ec7d13003, p4d=0000007ec7d13003, pud=0000007ec728c003, pmd=0000000000000000
[ 9513.956975] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 9513.962521] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio hclge hns3 hnae3 [last unloaded: vfio_pci]
[ 9513.972998] CPU: 4 PID: 1327 Comm: bash Tainted: G        W         5.8.0-rc4+ #3
[ 9513.980443] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020
[ 9513.989274] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
[ 9513.994827] pc : _raw_spin_lock_irqsave+0x48/0x88
[ 9513.999515] lr : eventfd_signal+0x6c/0x1b0
[ 9514.003591] sp : ffff800038a0b960
[ 9514.006889] x29: ffff800038a0b960 x28: ffff007ef7f4da10
[ 9514.012175] x27: ffff207eefbbfc80 x26: ffffbb7903457000
[ 9514.017462] x25: ffffbb7912191000 x24: ffff007ef7f4d400
[ 9514.022747] x23: ffff20be6e0e4c00 x22: 0000000000000008
[ 9514.028033] x21: 0000000000000000 x20: 0000000000000000
[ 9514.033321] x19: 0000000000000008 x18: 0000000000000000
[ 9514.038606] x17: 0000000000000000 x16: ffffbb7910029328
[ 9514.043893] x15: 0000000000000000 x14: 0000000000000001
[ 9514.049179] x13: 0000000000000000 x12: 0000000000000002
[ 9514.054466] x11: 0000000000000000 x10: 0000000000000a00
[ 9514.059752] x9 : ffff800038a0b840 x8 : ffff007ef7f4de60
[ 9514.065038] x7 : ffff007fffc96690 x6 : fffffe01faffb748
[ 9514.070324] x5 : 0000000000000000 x4 : 0000000000000000
[ 9514.075609] x3 : 0000000000000000 x2 : 0000000000000001
[ 9514.080895] x1 : ffff007ef7f4d400 x0 : 0000000000000000
[ 9514.086181] Call trace:
[ 9514.088618]  _raw_spin_lock_irqsave+0x48/0x88
[ 9514.092954]  eventfd_signal+0x6c/0x1b0
[ 9514.096691]  vfio_pci_request+0x84/0xd0 [vfio_pci]
[ 9514.101464]  vfio_del_group_dev+0x150/0x290 [vfio]
[ 9514.106234]  vfio_pci_remove+0x30/0x128 [vfio_pci]
[ 9514.111007]  pci_device_remove+0x48/0x108
[ 9514.115001]  device_release_driver_internal+0x100/0x1b8
[ 9514.120200]  device_release_driver+0x28/0x38
[ 9514.124452]  pci_stop_bus_device+0x68/0xa8
[ 9514.128528]  pci_stop_and_remove_bus_device+0x20/0x38
[ 9514.133557]  pci_iov_remove_virtfn+0xb4/0x128
[ 9514.137893]  sriov_disable+0x3c/0x108
[ 9514.141538]  pci_disable_sriov+0x28/0x38
[ 9514.145445]  hns3_pci_sriov_configure+0x48/0xb8 [hns3]
[ 9514.150558]  sriov_numvfs_store+0x110/0x198
[ 9514.154724]  dev_attr_store+0x44/0x60
[ 9514.158373]  sysfs_kf_write+0x5c/0x78
[ 9514.162018]  kernfs_fop_write+0x104/0x210
[ 9514.166010]  __vfs_write+0x48/0x90
[ 9514.169395]  vfs_write+0xbc/0x1c0
[ 9514.172694]  ksys_write+0x74/0x100
[ 9514.176079]  __arm64_sys_write+0x24/0x30
[ 9514.179987]  el0_svc_common.constprop.4+0x110/0x200
[ 9514.184842]  do_el0_svc+0x34/0x98
[ 9514.188144]  el0_svc+0x14/0x40
[ 9514.191185]  el0_sync_handler+0xb0/0x2d0
[ 9514.195088]  el0_sync+0x140/0x180
[ 9514.198389] Code: b9001020 d2800000 52800022 f9800271 (885ffe61)
[ 9514.204455] ---[ end trace 648de00c8406465f ]---
[ 9514.212308] note: bash[1327] exited with preempt_count 1

Cc: Qian Cai <cai@lca.pw>
Cc: Alex Williamson <alex.williamson@redhat.com>
Fixes: 1518ac272e78 ("vfio/pci: fix memory leaks of eventfd ctx")
Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-17 08:28:40 -06:00
Alex Williamson
ebfa440ce3 vfio/pci: Fix SR-IOV VF handling with MMIO blocking
SR-IOV VFs do not implement the memory enable bit of the command
register, therefore this bit is not set in config space after
pci_enable_device().  This leads to an unintended difference
between PF and VF in hand-off state to the user.  We can correct
this by setting the initial value of the memory enable bit in our
virtualized config space.  There's really no need however to
ever fault a user on a VF though as this would only indicate an
error in the user's management of the enable bit, versus a PF
where the same access could trigger hardware faults.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-25 11:04:23 -06:00
Alex Williamson
5c5866c593 vfio/pci: Clear error and request eventfd ctx after releasing
The next use of the device will generate an underflow from the
stale reference.

Cc: Qian Cai <cai@lca.pw>
Fixes: 1518ac272e78 ("vfio/pci: fix memory leaks of eventfd ctx")
Reported-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-17 15:18:42 -06:00
Christoph Hellwig
f5678e7f2a kernel: better document the use_mm/unuse_mm API contract
Switch the function documentation to kerneldoc comments, and add
WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
not have ->mm set (or has ->mm set in the case of unuse_mm).

Also give the functions a kthread_ prefix to better document the use case.

[hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
  Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
[sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
  Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb]
Acked-by: Haren Myneni <haren@linux.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10 19:14:18 -07:00
Christoph Hellwig
4dbe59a6ae kernel: move use_mm/unuse_mm to kthread.c
cover the newly merged use_mm/unuse_mm caller in vfio

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200416053158.586887-2-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10 19:14:18 -07:00
Michel Lespinasse
c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse
89154dd531 mmap locking API: convert mmap_sem call sites missed by coccinelle
Convert the last few remaining mmap_sem rwsem calls to use the new mmap
locking API.  These were missed by coccinelle for some reason (I think
coccinelle does not support some of the preprocessor constructs in these
files ?)

[akpm@linux-foundation.org: convert linux-next leftovers]
[akpm@linux-foundation.org: more linux-next leftovers]
[akpm@linux-foundation.org: more linux-next leftovers]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse
d8ed45c5dc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Linus Torvalds
5a36f0f3f5 VFIO updates for v5.8-rc1
- Block accesses to disabled MMIO space (Alex Williamson)
 
  - VFIO device migration API (Kirti Wankhede)
 
  - type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede)
 
  - PCI NULL capability masking (Alex Williamson)
 
  - Memory leak fixes (Qian Cai)
 
  - Reference leak fix (Qiushi Wu)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJe19RZAAoJECObm247sIsiBTEP/1DsA/mL/eDR94Hkmz3y1qoX
 tkrIJuM5xo95uk6DsbXDikbjM2jpUmvDUiRilara25IceuVX/tKgTjgxNOWJd+h8
 KHBoPM0bveILp+Hp6r6ZAEfhZLQ3P6A5XBTB6ql0ebveadyt13TP9Cr0jCf4Jxr2
 2A7wZAexWN8P4YBXBqrW8X3ZF3V9rfWYOJ4eILRvEplJ/TkqsGJyKW8LKP45waUF
 wutN6yh7irnWQbbUO6hb/eTIO7usrNB2L4A5CJMSZIm8JZYYmFQOrn73Ry3bA3Pg
 U+v3xIRU5U1ed/1DeLs8lRDDxImmkSXbVga0N+e3qnKtDIBh7/i9yNT9hQjhG9Ba
 C3IQoOM0Fj03hK/UahNerN7FKYoj+JvtdSKeTSNFXwdp/JUSXjj1tLouxUBSFsgD
 P15CMPXF/U6rT3nQ7/3z3JFKft/iSf6ShjXeNJFqhwH+lKsj13kF2+py3+vgayl8
 zLOqeLdRkQwXz73UStX+hrc7MmNYZVRxJ0uf3XXq3/CZ1gwiinIQqxDF3ry+pQ6w
 vXB8j5XJqsOguoAU7trTdvkKpfoJd4yY0D6aNN9F3JN8xkEBbfm6xZ7Hzur2JI6r
 iCxUHcS4XrXA/PRlTFupg8451CefuPJa2UkAWSdk182hO2R41l458Q6b7lIMqhOQ
 oP04S8VYvvDIw5JVnPWU
 =KSAE
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Block accesses to disabled MMIO space (Alex Williamson)

 - VFIO device migration API (Kirti Wankhede)

 - type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede)

 - PCI NULL capability masking (Alex Williamson)

 - Memory leak fixes (Qian Cai)

 - Reference leak fix (Qiushi Wu)

* tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio:
  vfio iommu: typecast corrections
  vfio iommu: Use shift operation for 64-bit integer division
  vfio/mdev: Fix reference count leak in add_mdev_supported_type
  vfio: Selective dirty page tracking if IOMMU backed device pins pages
  vfio iommu: Add migration capability to report supported features
  vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
  vfio iommu: Implementation of ioctl for dirty pages tracking
  vfio iommu: Add ioctl definition for dirty pages tracking
  vfio iommu: Cache pgsize_bitmap in struct vfio_iommu
  vfio iommu: Remove atomicity of ref_count of pinned pages
  vfio: UAPI for migration interface for device state
  vfio/pci: fix memory leaks of eventfd ctx
  vfio/pci: fix memory leaks in alloc_perm_bits()
  vfio-pci: Mask cap zero
  vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  vfio-pci: Fault mmaps to enable vma tracking
  vfio/type1: Support faulting PFNMAP vmas
2020-06-05 13:51:49 -07:00
Linus Torvalds
7ae77150d9 powerpc updates for 5.8
- Support for userspace to send requests directly to the on-chip GZIP
    accelerator on Power9.
 
  - Rework of our lockless page table walking (__find_linux_pte()) to make it
    safe against parallel page table manipulations without relying on an IPI for
    serialisation.
 
  - A series of fixes & enhancements to make our machine check handling more
    robust.
 
  - Lots of plumbing to add support for "prefixed" (64-bit) instructions on
    Power10.
 
  - Support for using huge pages for the linear mapping on 8xx (32-bit).
 
  - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver.
 
  - Removal of some obsolete 40x platforms and associated cruft.
 
  - Initial support for booting on Power10.
 
  - Lots of other small features, cleanups & fixes.
 
 Thanks to:
   Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov,
   Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le
   Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy,
   Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand,
   George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni,
   Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo
   Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael
   Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao,
   Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram
   Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
   Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram
   Sang, Xiongfeng Wang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7aYZ8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPiKD/9zNCuZLFMAFrIdbm0HlYA2RGYZFT75
 GUHsqYyei1pxA7PgM3KwJiXELVODsBv0eQbgNh1tbecKrxPRegN/cywd1KLjPZ7I
 v5/qweQP8MvR0RhzjbhvUcO0jq/f8u2LbJr5mUfVzjU6tAvrvcWo3oZqDElsekCS
 kgyOH3r1vZ2PLTMiGFhb0gWi2iqc+6BHU1AFCGPCMjB1Vu5d5+54VvZ/6lllGsOF
 yg9CBXmmVvQ+Bn6tH4zdEB78FYxnAIwBqlbmL79i5ca+HQJ0Sw6HuPRy9XYq35p6
 2EiXS4Wrgp7i7+1TN3HO362u5Onb8TSyQU7NS6yCFPoJ6JQxcJMBIw6mHhnXOPuZ
 CrjgcdwUMjx8uDoKmX1Epbfuex2w+AysW+4yBHPFiSgl3klKC3D0wi95mR485w2F
 rN8uzJtrDeFKcYZJG7IoB/cgFCCPKGf9HaXr8q0S/jBKMffx91ul3cfzlfdIXOCw
 FDNw/+ZX7UD6ddFEG12ZTO+vdL8yf1uCRT/DIZwUiDMIA0+M6F4nc7j3lfyZfoO1
 65f9UlhoLxScq7VH2fKH4UtZatO9cPID2z1CmiY4UbUIPtFDepSuYClgLF+Duf4b
 rkfxhKU0+Ja1zNH5XNc+L+Bc5/W4lFiJXz02dYIjtHoUpWkc1aToOETVwzggYFNM
 G3PXIBOI0jRgRw==
 =o0WU
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Support for userspace to send requests directly to the on-chip GZIP
   accelerator on Power9.

 - Rework of our lockless page table walking (__find_linux_pte()) to
   make it safe against parallel page table manipulations without
   relying on an IPI for serialisation.

 - A series of fixes & enhancements to make our machine check handling
   more robust.

 - Lots of plumbing to add support for "prefixed" (64-bit) instructions
   on Power10.

 - Support for using huge pages for the linear mapping on 8xx (32-bit).

 - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
   driver.

 - Removal of some obsolete 40x platforms and associated cruft.

 - Initial support for booting on Power10.

 - Lots of other small features, cleanups & fixes.

Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
Wolfram Sang, Xiongfeng Wang.

* tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
  powerpc/pseries: Make vio and ibmebus initcalls pseries specific
  cxl: Remove dead Kconfig options
  powerpc: Add POWER10 architected mode
  powerpc/dt_cpu_ftrs: Add MMA feature
  powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
  powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
  powerpc: Add support for ISA v3.1
  powerpc: Add new HWCAP bits
  powerpc/64s: Don't set FSCR bits in INIT_THREAD
  powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
  powerpc/64s: Don't let DT CPU features set FSCR_DSCR
  powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
  powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
  powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
  powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
  powerpc/module_64: Consolidate ftrace code
  powerpc/32: Disable KASAN with pages bigger than 16k
  powerpc/uaccess: Don't set KUEP by default on book3s/32
  powerpc/uaccess: Don't set KUAP by default on book3s/32
  powerpc/8xx: Reduce time spent in allow_user_access() and friends
  ...
2020-06-05 12:39:30 -07:00
Alex Williamson
4f085ca2f5 Merge branch 'v5.8/vfio/kirti-migration-fixes' into v5.8/vfio/next 2020-06-02 13:53:00 -06:00
Kirti Wankhede
c8e9df4744 vfio iommu: typecast corrections
Fixes sparse warnings by adding '__user' in typecast for
copy_[from,to]_user()

Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-02 13:44:28 -06:00
Kirti Wankhede
cd0bb41ea8 vfio iommu: Use shift operation for 64-bit integer division
Fixes compilation error with ARCH=i386.

Error fixed by this commit:
ld: drivers/vfio/vfio_iommu_type1.o: in function `vfio_dma_populate_bitmap':
>> vfio_iommu_type1.c:(.text+0x666): undefined reference to `__udivdi3'

Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-02 13:38:21 -06:00
Alex Williamson
ea20868c71 Merge branch 'qiushi-wu-mdev-ref-v1' into v5.8/vfio/next 2020-05-29 16:17:33 -06:00
Qiushi Wu
aa8ba13cae vfio/mdev: Fix reference count leak in add_mdev_supported_type
kobject_init_and_add() takes reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object. Thus,
replace kfree() by kobject_put() to fix this issue. Previous
commit "b8eb718348b8" fixed a similar problem.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-29 16:07:18 -06:00
Kirti Wankhede
95fc87b441 vfio: Selective dirty page tracking if IOMMU backed device pins pages
Added a check such that only singleton IOMMU groups can pin pages.
>From the point when vendor driver pins any pages, consider IOMMU group
dirty page scope to be limited to pinned pages.

To optimize to avoid walking list often, added flag
pinned_page_dirty_scope to indicate if all of the vfio_groups for each
vfio_domain in the domain_list dirty page scope is limited to pinned
pages. This flag is updated on first pinned pages request for that IOMMU
group and on attaching/detaching group.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:29 -06:00
Kirti Wankhede
ad721705d0 vfio iommu: Add migration capability to report supported features
Added migration capability in IOMMU info chain.
User application should check IOMMU info chain for migration capability
to use dirty page tracking feature provided by kernel module.
User application must check page sizes supported and maximum dirty
bitmap size returned by this capability structure for ioctls used to get
dirty bitmap.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:27 -06:00
Kirti Wankhede
331e33d296 vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and device is still
running. For example, in pre-copy phase while guest driver could access
those pages, host device or vendor driver can dirty these mapped pages.
Such pages should be marked dirty so as to maintain memory consistency
for a user making use of dirty page tracking.

To get bitmap during unmap, user should allocate memory for bitmap, set
it all zeros, set size of allocated memory, set page size to be
considered for bitmap and set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:26 -06:00
Kirti Wankhede
d6a4c18566 vfio iommu: Implementation of ioctl for dirty pages tracking
VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active
- Stop dirty pages tracking.
- Get dirty pages bitmap. Its user space application's responsibility to
  copy content of dirty pages from source to destination during migration.

To prevent DoS attack, memory for bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering smallest supported page
size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled

Bitmap is populated for already pinned pages when bitmap is allocated for
a vfio_dma with the smallest supported page size. Update bitmap from
pinning functions when tracking is enabled. When user application queries
bitmap, check if requested page size is same as page size used to
populated bitmap. If it is equal, copy bitmap, but if not equal, return
error.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

Fixed error reported by build bot by changing pgsize type from uint64_t
to size_t.
Reported-by: kbuild test robot <lkp@intel.com>

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:25 -06:00
Kirti Wankhede
cade075f26 vfio iommu: Cache pgsize_bitmap in struct vfio_iommu
Calculate and cache pgsize_bitmap when iommu->domain_list is updated
and iommu->external_domain is set for mdev device.
Add iommu->lock protection when cached pgsize_bitmap is accessed.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:22 -06:00
Kirti Wankhede
6581708586 vfio iommu: Remove atomicity of ref_count of pinned pages
vfio_pfn.ref_count is always updated while holding iommu->lock, using
atomic variable is overkill.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28 15:53:21 -06:00
Alex Williamson
cd34b82e6e Merge branches 'v5.8/vfio/alex-block-mmio-v3', 'v5.8/vfio/alex-zero-cap-v2' and 'v5.8/vfio/qian-leak-fixes' into v5.8/vfio/next 2020-05-26 10:27:15 -06:00
Qian Cai
1518ac272e vfio/pci: fix memory leaks of eventfd ctx
Finished a qemu-kvm (-device vfio-pci,host=0001:01:00.0) triggers a few
memory leaks after a while because vfio_pci_set_ctx_trigger_single()
calls eventfd_ctx_fdget() without the matching eventfd_ctx_put() later.
Fix it by calling eventfd_ctx_put() for those memory in
vfio_pci_release() before vfio_device_release().

unreferenced object 0xebff008981cc2b00 (size 128):
  comm "qemu-kvm", pid 4043, jiffies 4294994816 (age 9796.310s)
  hex dump (first 32 bytes):
    01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  ....kkkk.....N..
    ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
  backtrace:
    [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c
    [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4
    [<000000005fcec025>] do_eventfd+0x54/0x1ac
    [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44
    [<00000000b819758c>] do_el0_svc+0x128/0x1dc
    [<00000000b244e810>] el0_sync_handler+0xd0/0x268
    [<00000000d495ef94>] el0_sync+0x164/0x180
unreferenced object 0x29ff008981cc4180 (size 128):
  comm "qemu-kvm", pid 4043, jiffies 4294994818 (age 9796.290s)
  hex dump (first 32 bytes):
    01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  ....kkkk.....N..
    ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
  backtrace:
    [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c
    [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4
    [<000000005fcec025>] do_eventfd+0x54/0x1ac
    [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44
    [<00000000b819758c>] do_el0_svc+0x128/0x1dc
    [<00000000b244e810>] el0_sync_handler+0xd0/0x268
    [<00000000d495ef94>] el0_sync+0x164/0x180

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-26 10:23:22 -06:00
Qian Cai
3e63b94b62 vfio/pci: fix memory leaks in alloc_perm_bits()
vfio_pci_disable() calls vfio_config_free() but forgets to call
free_perm_bits() resulting in memory leaks,

unreferenced object 0xc000000c4db2dee0 (size 16):
  comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s)
  hex dump (first 16 bytes):
    00 00 ff 00 ff ff ff ff ff ff ff ff ff ff 00 00  ................
  backtrace:
    [<00000000a6a4552d>] alloc_perm_bits+0x58/0xe0 [vfio_pci]
    [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci]
    init_pci_cap_msi_perm at drivers/vfio/pci/vfio_pci_config.c:1125
    (inlined by) vfio_msi_cap_len at drivers/vfio/pci/vfio_pci_config.c:1180
    (inlined by) vfio_cap_len at drivers/vfio/pci/vfio_pci_config.c:1241
    (inlined by) vfio_cap_init at drivers/vfio/pci/vfio_pci_config.c:1468
    (inlined by) vfio_config_init at drivers/vfio/pci/vfio_pci_config.c:1707
    [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci]
    [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio]
    [<000000009e34c54f>] ksys_ioctl+0xd8/0x130
    [<000000006577923d>] sys_ioctl+0x28/0x40
    [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0
    [<0000000008ea7dd5>] system_call_common+0xf0/0x278
unreferenced object 0xc000000c4db2e330 (size 16):
  comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s)
  hex dump (first 16 bytes):
    00 ff ff 00 ff ff ff ff ff ff ff ff ff ff 00 00  ................
  backtrace:
    [<000000004c71914f>] alloc_perm_bits+0x44/0xe0 [vfio_pci]
    [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci]
    [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci]
    [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio]
    [<000000009e34c54f>] ksys_ioctl+0xd8/0x130
    [<000000006577923d>] sys_ioctl+0x28/0x40
    [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0
    [<0000000008ea7dd5>] system_call_common+0xf0/0x278

Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Signed-off-by: Qian Cai <cai@lca.pw>
[aw: rolled in follow-up patch]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-18 09:58:28 -06:00