PCI: Add pci_bus_addr_t

David Ahern reported that d63e2e1f3d ("sparc/PCI: Clip bridge windows
to fit in upstream windows") fails to boot on sparc/T5-8:

  pci 0000:06:00.0: reg 0x184: can't handle BAR above 4GB (bus address 0x110204000)

The problem is that sparc64 assumed that dma_addr_t only needed to hold DMA
addresses, i.e., bus addresses returned via the DMA API (dma_map_single(),
etc.), while the PCI core assumed dma_addr_t could hold *any* bus address,
including raw BAR values.  On sparc64, all DMA addresses fit in 32 bits, so
dma_addr_t is a 32-bit type.  However, BAR values can be 64 bits wide, so
they don't fit in a dma_addr_t.  d63e2e1f3d added new checking that
tripped over this mismatch.
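
A minimal user-space sketch (not kernel code) of the mismatch: the 64-bit BAR value from the report is stored in a 32-bit type. The typedef dma_addr32_t and the helper fits_in_dma_addr32() are hypothetical stand-ins for a 32-bit dma_addr_t. The round-trip comparison is similar in spirit to the "base != base64" check in pci_read_bridge_mmio_pref().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for dma_addr_t on a platform like sparc64,
 * where CONFIG_ARCH_DMA_ADDR_T_64BIT is not set and dma_addr_t is u32.
 */
typedef uint32_t dma_addr32_t;

/*
 * Return true if a PCI bus address survives a round trip through the
 * 32-bit type.  The cast keeps only the low 32 bits, so any address at
 * or above 4GB comes back different.
 */
static bool fits_in_dma_addr32(uint64_t bus_addr)
{
	dma_addr32_t narrowed = (dma_addr32_t)bus_addr;

	return (uint64_t)narrowed == bus_addr;
}
```

With the BAR from the report, 0x110204000 narrows to 0x10204000, so the check fails. An address below 4GB, such as 0xfe000000, passes unchanged.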

Add pci_bus_addr_t, which is wide enough to hold any PCI bus address,
including both raw BAR values and DMA addresses.  This will be 64 bits
on 64-bit platforms and on platforms with a 64-bit dma_addr_t.  Then
dma_addr_t only needs to be wide enough to hold addresses from the DMA API.
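
A minimal sketch of the resulting type selection, written as plain C11 outside the kernel. The SKETCH_* macros and sketch_* types are hypothetical stand-ins for CONFIG_ARCH_DMA_ADDR_T_64BIT, CONFIG_64BIT, dma_addr_t and pci_bus_addr_t. They are set here to model sparc64: a 64-bit kernel with 32-bit DMA addresses. The _Static_assert states the invariant the Kconfig rule in this patch is meant to guarantee: pci_bus_addr_t is never narrower than dma_addr_t.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the Kconfig symbols.  These settings model
 * sparc64: a 64-bit kernel (CONFIG_64BIT) whose DMA API hands out only
 * 32-bit addresses (CONFIG_ARCH_DMA_ADDR_T_64BIT not set).
 */
#define SKETCH_64BIT 1
/* #define SKETCH_ARCH_DMA_ADDR_T_64BIT 1 */

/* Like dma_addr_t: only as wide as the DMA API requires. */
#ifdef SKETCH_ARCH_DMA_ADDR_T_64BIT
typedef uint64_t sketch_dma_addr_t;
#else
typedef uint32_t sketch_dma_addr_t;
#endif

/* Like pci_bus_addr_t: mirrors "def_bool y if (ARCH_DMA_ADDR_T_64BIT || 64BIT)". */
#if defined(SKETCH_ARCH_DMA_ADDR_T_64BIT) || defined(SKETCH_64BIT)
typedef uint64_t sketch_pci_bus_addr_t;
#else
typedef uint32_t sketch_pci_bus_addr_t;
#endif

/* The Kconfig rule makes this impossible to violate. */
_Static_assert(sizeof(sketch_pci_bus_addr_t) >= sizeof(sketch_dma_addr_t),
	       "pci_bus_addr_t must be at least as wide as dma_addr_t");

/* A bus region, such as a BAR or bridge window, uses the wider type. */
struct sketch_pci_bus_region {
	sketch_pci_bus_addr_t start;
	sketch_pci_bus_addr_t end;
};
```

Changing the macros to model a 32-bit platform without 64-bit DMA makes both types 32 bits wide, and the assertion still holds.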

[bhelgaas: changelog, bugzilla, Kconfig to ensure pci_bus_addr_t is at
least as wide as dma_addr_t, documentation]
Fixes: d63e2e1f3d ("sparc/PCI: Clip bridge windows to fit in upstream windows")
Fixes: 23b13bc76f ("PCI: Fail safely if we can't handle BARs larger than 4GB")
Link: http://lkml.kernel.org/r/CAE9FiQU1gJY1LYrxs+ma5LCTEEe4xmtjRG0aXJ9K_Tsu+m9Wuw@mail.gmail.com
Link: http://lkml.kernel.org/r/1427857069-6789-1-git-send-email-yinghai@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96231
Reported-by: David Ahern <david.ahern@oracle.com>
Tested-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
CC: stable@vger.kernel.org	# v3.19+
Yinghai Lu 2015-05-27 17:23:51 -07:00 committed by Bjorn Helgaas
parent 5ebe6afaf0
commit 3a9ad0b4fd
7 changed files with 66 additions and 43 deletions

@@ -25,13 +25,18 @@ physical addresses. These are the addresses in /proc/iomem. The physical
 address is not directly useful to a driver; it must use ioremap() to map
 the space and produce a virtual address.
 
-I/O devices use a third kind of address: a "bus address" or "DMA address".
-If a device has registers at an MMIO address, or if it performs DMA to read
-or write system memory, the addresses used by the device are bus addresses.
-In some systems, bus addresses are identical to CPU physical addresses, but
-in general they are not. IOMMUs and host bridges can produce arbitrary
-mappings between physical and bus addresses.
+I/O devices use a third kind of address: a "bus address". If a device has
+registers at an MMIO address, or if it performs DMA to read or write system
+memory, the addresses used by the device are bus addresses. In some
+systems, bus addresses are identical to CPU physical addresses, but in
+general they are not. IOMMUs and host bridges can produce arbitrary
+mappings between physical and bus addresses.
 
+From a device's point of view, DMA uses the bus address space, but it may
+be restricted to a subset of that space. For example, even if a system
+supports 64-bit addresses for main memory and PCI BARs, it may use an IOMMU
+so devices only need to use 32-bit DMA addresses.
+
 Here's a picture and some examples:
 
 CPU CPU Bus
@@ -72,11 +77,11 @@ can use virtual address X to access the buffer, but the device itself
 cannot because DMA doesn't go through the CPU virtual memory system.
 
 In some simple systems, the device can do DMA directly to physical address
-Y. But in many others, there is IOMMU hardware that translates bus
+Y. But in many others, there is IOMMU hardware that translates DMA
 addresses to physical addresses, e.g., it translates Z to Y. This is part
 of the reason for the DMA API: the driver can give a virtual address X to
 an interface like dma_map_single(), which sets up any required IOMMU
-mapping and returns the bus address Z. The driver then tells the device to
+mapping and returns the DMA address Z. The driver then tells the device to
 do DMA to Z, and the IOMMU maps it to the buffer at address Y in system
 RAM.
 
@@ -98,7 +103,7 @@ First of all, you should make sure
 #include <linux/dma-mapping.h>
 
 is in your driver, which provides the definition of dma_addr_t. This type
-can hold any valid DMA or bus address for the platform and should be used
+can hold any valid DMA address for the platform and should be used
 everywhere you hold a DMA address returned from the DMA mapping functions.
 
 What memory is DMA'able?
@@ -316,7 +321,7 @@ There are two types of DMA mappings:
 Think of "consistent" as "synchronous" or "coherent".
 
 The current default is to return consistent memory in the low 32
-bits of the bus space. However, for future compatibility you should
+bits of the DMA space. However, for future compatibility you should
 set the consistent mask even if this default is fine for your
 driver.
 
@@ -403,7 +408,7 @@ dma_alloc_coherent() returns two values: the virtual address which you
 can use to access it from the CPU and dma_handle which you pass to the
 card.
 
-The CPU virtual address and the DMA bus address are both
+The CPU virtual address and the DMA address are both
 guaranteed to be aligned to the smallest PAGE_SIZE order which
 is greater than or equal to the requested size. This invariant
 exists (for example) to guarantee that if you allocate a chunk
@@ -645,8 +650,8 @@ PLEASE NOTE: The 'nents' argument to the dma_unmap_sg call must be
 dma_map_sg call.
 
 Every dma_map_{single,sg}() call should have its dma_unmap_{single,sg}()
-counterpart, because the bus address space is a shared resource and
-you could render the machine unusable by consuming all bus addresses.
+counterpart, because the DMA address space is a shared resource and
+you could render the machine unusable by consuming all DMA addresses.
 
 If you need to use the same streaming DMA region multiple times and touch
 the data in between the DMA transfers, the buffer needs to be synced

@@ -18,10 +18,10 @@ Part I - dma_ API
 To get the dma_ API, you must #include <linux/dma-mapping.h>. This
 provides dma_addr_t and the interfaces described below.
 
-A dma_addr_t can hold any valid DMA or bus address for the platform. It
-can be given to a device to use as a DMA source or target. A CPU cannot
-reference a dma_addr_t directly because there may be translation between
-its physical address space and the bus address space.
+A dma_addr_t can hold any valid DMA address for the platform. It can be
+given to a device to use as a DMA source or target. A CPU cannot reference
+a dma_addr_t directly because there may be translation between its physical
+address space and the DMA address space.
 
 Part Ia - Using large DMA-coherent buffers
 ------------------------------------------
@@ -42,7 +42,7 @@ It returns a pointer to the allocated region (in the processor's virtual
 address space) or NULL if the allocation failed.
 
 It also returns a <dma_handle> which may be cast to an unsigned integer the
-same width as the bus and given to the device as the bus address base of
+same width as the bus and given to the device as the DMA address base of
 the region.
 
 Note: consistent memory can be expensive on some platforms, and the
@@ -193,7 +193,7 @@ dma_map_single(struct device *dev, void *cpu_addr, size_t size,
 enum dma_data_direction direction)
 
 Maps a piece of processor virtual memory so it can be accessed by the
-device and returns the bus address of the memory.
+device and returns the DMA address of the memory.
 
 The direction for both APIs may be converted freely by casting.
 However the dma_ API uses a strongly typed enumerator for its
@@ -212,20 +212,20 @@ contiguous piece of memory. For this reason, memory to be mapped by
 this API should be obtained from sources which guarantee it to be
 physically contiguous (like kmalloc).
 
-Further, the bus address of the memory must be within the
+Further, the DMA address of the memory must be within the
 dma_mask of the device (the dma_mask is a bit mask of the
-addressable region for the device, i.e., if the bus address of
-the memory ANDed with the dma_mask is still equal to the bus
+addressable region for the device, i.e., if the DMA address of
+the memory ANDed with the dma_mask is still equal to the DMA
 address, then the device can perform DMA to the memory). To
 ensure that the memory allocated by kmalloc is within the dma_mask,
 the driver may specify various platform-dependent flags to restrict
-the bus address range of the allocation (e.g., on x86, GFP_DMA
-guarantees to be within the first 16MB of available bus addresses,
+the DMA address range of the allocation (e.g., on x86, GFP_DMA
+guarantees to be within the first 16MB of available DMA addresses,
 as required by ISA devices).
 
 Note also that the above constraints on physical contiguity and
 dma_mask may not apply if the platform has an IOMMU (a device which
-maps an I/O bus address to a physical memory address). However, to be
+maps an I/O DMA address to a physical memory address). However, to be
 portable, device driver writers may *not* assume that such an IOMMU
 exists.
 
@@ -296,7 +296,7 @@ reduce current DMA mapping usage or delay and try again later).
 dma_map_sg(struct device *dev, struct scatterlist *sg,
 int nents, enum dma_data_direction direction)
 
-Returns: the number of bus address segments mapped (this may be shorter
+Returns: the number of DMA address segments mapped (this may be shorter
 than <nents> passed in if some elements of the scatter/gather list are
 physically or virtually adjacent and an IOMMU maps them with a single
 entry).
@@ -340,7 +340,7 @@ must be the same as those and passed in to the scatter/gather mapping
 API.
 
 Note: <nents> must be the number you passed in, *not* the number of
-bus address entries returned.
+DMA address entries returned.
 
 void
 dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size,
@@ -507,7 +507,7 @@ it's asked for coherent memory for this device.
 phys_addr is the CPU physical address to which the memory is currently
 assigned (this will be ioremapped so the CPU can access the region).
 
-device_addr is the bus address the device needs to be programmed
+device_addr is the DMA address the device needs to be programmed
 with to actually address this memory (this will be handed out as the
 dma_addr_t in dma_alloc_coherent()).
 

@@ -1,6 +1,10 @@
 #
 # PCI configuration
 #
+config PCI_BUS_ADDR_T_64BIT
+	def_bool y if (ARCH_DMA_ADDR_T_64BIT || 64BIT)
+	depends on PCI
+
 config PCI_MSI
 	bool "Message Signaled Interrupts (MSI and MSI-X)"
 	depends on PCI

@@ -92,11 +92,11 @@ void pci_bus_remove_resources(struct pci_bus *bus)
 }
 
 static struct pci_bus_region pci_32_bit = {0, 0xffffffffULL};
-#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+#ifdef CONFIG_PCI_BUS_ADDR_T_64BIT
 static struct pci_bus_region pci_64_bit = {0,
-				(dma_addr_t) 0xffffffffffffffffULL};
-static struct pci_bus_region pci_high = {(dma_addr_t) 0x100000000ULL,
-				(dma_addr_t) 0xffffffffffffffffULL};
+				(pci_bus_addr_t) 0xffffffffffffffffULL};
+static struct pci_bus_region pci_high = {(pci_bus_addr_t) 0x100000000ULL,
+				(pci_bus_addr_t) 0xffffffffffffffffULL};
 #endif
 
 /*
@@ -200,7 +200,7 @@ int pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 					  resource_size_t),
 		void *alignf_data)
 {
-#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+#ifdef CONFIG_PCI_BUS_ADDR_T_64BIT
 	int rc;
 
 	if (res->flags & IORESOURCE_MEM_64) {

@@ -254,8 +254,8 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 	}
 
 	if (res->flags & IORESOURCE_MEM_64) {
-		if ((sizeof(dma_addr_t) < 8 || sizeof(resource_size_t) < 8) &&
-		    sz64 > 0x100000000ULL) {
+		if ((sizeof(pci_bus_addr_t) < 8 || sizeof(resource_size_t) < 8)
+		    && sz64 > 0x100000000ULL) {
 			res->flags |= IORESOURCE_UNSET | IORESOURCE_DISABLED;
 			res->start = 0;
 			res->end = 0;
@@ -264,7 +264,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 			goto out;
 		}
 
-		if ((sizeof(dma_addr_t) < 8) && l) {
+		if ((sizeof(pci_bus_addr_t) < 8) && l) {
 			/* Above 32-bit boundary; try to reallocate */
 			res->flags |= IORESOURCE_UNSET;
 			res->start = 0;
@@ -399,7 +399,7 @@ static void pci_read_bridge_mmio_pref(struct pci_bus *child)
 	struct pci_dev *dev = child->self;
 	u16 mem_base_lo, mem_limit_lo;
 	u64 base64, limit64;
-	dma_addr_t base, limit;
+	pci_bus_addr_t base, limit;
 	struct pci_bus_region region;
 	struct resource *res;
 
@@ -426,8 +426,8 @@ static void pci_read_bridge_mmio_pref(struct pci_bus *child)
 		}
 	}
 
-	base = (dma_addr_t) base64;
-	limit = (dma_addr_t) limit64;
+	base = (pci_bus_addr_t) base64;
+	limit = (pci_bus_addr_t) limit64;
 
 	if (base != base64) {
 		dev_err(&dev->dev, "can't handle bridge window above 4GB (bus address %#010llx)\n",

@@ -577,9 +577,15 @@ int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
 int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
 		  int reg, int len, u32 val);
 
+#ifdef CONFIG_PCI_BUS_ADDR_T_64BIT
+typedef u64 pci_bus_addr_t;
+#else
+typedef u32 pci_bus_addr_t;
+#endif
+
 struct pci_bus_region {
-	dma_addr_t start;
-	dma_addr_t end;
+	pci_bus_addr_t start;
+	pci_bus_addr_t end;
 };
 
 struct pci_dynids {
@@ -1128,7 +1134,7 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
 
 int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr);
 
-static inline dma_addr_t pci_bus_address(struct pci_dev *pdev, int bar)
+static inline pci_bus_addr_t pci_bus_address(struct pci_dev *pdev, int bar)
 {
 	struct pci_bus_region region;
 

@@ -139,12 +139,20 @@ typedef unsigned long blkcnt_t;
  */
 #define pgoff_t unsigned long
 
-/* A dma_addr_t can hold any valid DMA or bus address for the platform */
+/*
+ * A dma_addr_t can hold any valid DMA address, i.e., any address returned
+ * by the DMA API.
+ *
+ * If the DMA API only uses 32-bit addresses, dma_addr_t need only be 32
+ * bits wide. Bus addresses, e.g., PCI BARs, may be wider than 32 bits,
+ * but drivers do memory-mapped I/O to ioremapped kernel virtual addresses,
+ * so they don't care about the size of the actual bus addresses.
+ */
 #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
 typedef u64 dma_addr_t;
 #else
 typedef u32 dma_addr_t;
-#endif /* dma_addr_t */
+#endif
 
 typedef unsigned __bitwise__ gfp_t;
 typedef unsigned __bitwise__ fmode_t;