Changes for 4.3

- Create drivers/staging/rdma
 - Move amso1100 driver to staging/rdma and schedule for deletion
 - Move ipath driver to staging/rdma and schedule for deletion
 - Add hfi1 driver to staging/rdma and set TODO for move to regular tree
 - Initial support for namespaces to be used on RDMA devices
 - Add RoCE GID table handling to the RDMA core caching code
 - Infrastructure to support handling of devices with differing
   read and write scatter gather capabilities
 - Various iSER updates
 - Kill off unsafe usage of global mr registrations
 - Update SRP driver
 - Misc. mlx4 driver updates
 - Support for the mr_alloc verb
 - Support for a netlink interface between kernel and user space cache
   daemon to speed path record queries and route resolution
 - Ininitial support for safe hot removal of verbs devices
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV7v8wAAoJELgmozMOVy/d2dcP/3PXnGFPgFGJODKE6VCZtTvj
 nooNXRKXjxv470UT5DiAX7SNcBxzzS7Zl/Lj+831H9iNXUyzuH31KtBOAZ3W03vZ
 yXwCB2caOStSldTRSUUvPe2aIFPnyNmSpC4i6XcJLJMCFijKmxin5pAo8qE44BQU
 yjhT+wC9P6LL5wZXsn/nFIMLjOFfu0WBFHNp3gs5j59paxlx5VeIAZk16aQZH135
 m7YCyicwrS8iyWQl2bEXRMon2vlCHlX2RHmOJ4f/P5I0quNcGF2+d8Yxa+K1VyC5
 zcb3OBezz+wZtvh16yhsDfSPqHWirljwID2VzOgRSzTJWvQjju8VkwHtkq6bYoBW
 egIxGCHcGWsD0R5iBXLYr/tB+BmjbDObSm0AsR4+JvSShkeVA1IpeoO+19162ixE
 n6CQnk2jCee8KXeIN4PoIKsjRSbIECM0JliWPLoIpuTuEhhpajftlSLgL5hf1dzp
 HrSy6fXmmoRj7wlTa7DnYIC3X+ffwckB8/t1zMAm2sKnIFUTjtQXF7upNiiyWk4L
 /T1QEzJ2bLQckQ9yY4v528SvBQwA4Dy1amIQB7SU8+2S//bYdUvhysWPkdKC4oOT
 WlqS5PFDCI31MvNbbM3rUbMAD8eBAR8ACw9ZpGI/Rffm5FEX5W3LoxA8gfEBRuqt
 30ZYFuW8evTL+YQcaV65
 =EHLg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull inifiniband/rdma updates from Doug Ledford:
 "This is a fairly sizeable set of changes.  I've put them through a
  decent amount of testing prior to sending the pull request due to
  that.

  There are still a few fixups that I know are coming, but I wanted to
  go ahead and get the big, sizable chunk into your hands sooner rather
  than waiting for those last few fixups.

  Of note is the fact that this creates what is intended to be a
  temporary area in the drivers/staging tree specifically for some
  cleanups and additions that are coming for the RDMA stack.  We
  deprecated two drivers (ipath and amso1100) and are waiting to hear
  back if we can deprecate another one (ehca).  We also put Intel's new
  hfi1 driver into this area because it needs to be refactored and a
  transfer library created out of the factored out code, and then it and
  the qib driver and the soft-roce driver should all be modified to use
  that library.

  I expect drivers/staging/rdma to be around for three or four kernel
  releases and then to go away as all of the work is completed and final
  deletions of deprecated drivers are done.

  Summary of changes for 4.3:

   - Create drivers/staging/rdma
   - Move amso1100 driver to staging/rdma and schedule for deletion
   - Move ipath driver to staging/rdma and schedule for deletion
   - Add hfi1 driver to staging/rdma and set TODO for move to regular
     tree
   - Initial support for namespaces to be used on RDMA devices
   - Add RoCE GID table handling to the RDMA core caching code
   - Infrastructure to support handling of devices with differing read
     and write scatter gather capabilities
   - Various iSER updates
   - Kill off unsafe usage of global mr registrations
   - Update SRP driver
   - Misc  mlx4 driver updates
   - Support for the mr_alloc verb
   - Support for a netlink interface between kernel and user space cache
     daemon to speed path record queries and route resolution
   - Ininitial support for safe hot removal of verbs devices"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits)
  IB/ipoib: Suppress warning for send only join failures
  IB/ipoib: Clean up send-only multicast joins
  IB/srp: Fix possible protection fault
  IB/core: Move SM class defines from ib_mad.h to ib_smi.h
  IB/core: Remove unnecessary defines from ib_mad.h
  IB/hfi1: Add PSM2 user space header to header_install
  IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY
  mlx5: Fix incorrect wc pkey_index assignment for GSI messages
  IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
  IB/uverbs: reject invalid or unknown opcodes
  IB/cxgb4: Fix if statement in pick_local_ip6adddrs
  IB/sa: Fix rdma netlink message flags
  IB/ucma: HW Device hot-removal support
  IB/mlx4_ib: Disassociate support
  IB/uverbs: Enable device removal when there are active user space applications
  IB/uverbs: Explicitly pass ib_dev to uverbs commands
  IB/uverbs: Fix race between ib_uverbs_open and remove_one
  IB/uverbs: Fix reference counting usage of event files
  IB/core: Make ib_dealloc_pd return void
  IB/srp: Create an insecure all physical rkey only if needed
  ...
This commit is contained in:
Linus Torvalds 2015-09-09 08:33:31 -07:00
commit 26d2177e97
231 changed files with 64456 additions and 2756 deletions

View File

@ -64,3 +64,23 @@ MTHCA
fw_ver - Firmware version
hca_type - HCA type: "MT23108", "MT25208 (MT23108 compat mode)",
or "MT25208"
HFI1
The hfi1 driver also creates these additional files:
hw_rev - hardware revision
board_id - manufacturing board id
tempsense - thermal sense information
serial - board serial number
nfreectxts - number of free user contexts
nctxts - number of allowed contexts (PSM2)
chip_reset - diagnostic (root only)
boardversion - board version
ports/1/
CMgtA/
cc_settings_bin - CCA tables used by PSM2
cc_table_bin
sc2v/ - 32 files (0 - 31) used to translate sl->vl
sl2sc/ - 32 files (0 - 31) used to translate sl->sc
vl2mtu/ - 16 (0 - 15) files used to determine MTU for vl

View File

@ -5341,6 +5341,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git
S: Supported
F: Documentation/infiniband/
F: drivers/infiniband/
F: drivers/staging/rdma/
F: include/uapi/linux/if_infiniband.h
F: include/uapi/rdma/
F: include/rdma/
@ -5598,7 +5599,7 @@ IPATH DRIVER
M: Mike Marciniszyn <infinipath@intel.com>
L: linux-rdma@vger.kernel.org
S: Maintained
F: drivers/infiniband/hw/ipath/
F: drivers/staging/rdma/ipath/
IPMI SUBSYSTEM
M: Corey Minyard <minyard@acm.org>
@ -9976,6 +9977,12 @@ M: Arnaud Patard <arnaud.patard@rtp-net.org>
S: Odd Fixes
F: drivers/staging/xgifb/
HFI1 DRIVER
M: Mike Marciniszyn <infinipath@intel.com>
L: linux-rdma@vger.kernel.org
S: Supported
F: drivers/staging/rdma/hfi1
STARFIRE/DURALAN NETWORK DRIVER
M: Ion Badulescu <ionut@badula.org>
S: Odd Fixes

View File

@ -55,10 +55,8 @@ config INFINIBAND_ADDR_TRANS
default y
source "drivers/infiniband/hw/mthca/Kconfig"
source "drivers/infiniband/hw/ipath/Kconfig"
source "drivers/infiniband/hw/qib/Kconfig"
source "drivers/infiniband/hw/ehca/Kconfig"
source "drivers/infiniband/hw/amso1100/Kconfig"
source "drivers/infiniband/hw/cxgb3/Kconfig"
source "drivers/infiniband/hw/cxgb4/Kconfig"
source "drivers/infiniband/hw/mlx4/Kconfig"

View File

@ -9,7 +9,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
$(user_access-y)
ib_core-y := packer.o ud_header.o verbs.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o
device.o fmr_pool.o cache.o netlink.o \
roce_gid_mgmt.o
ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o

View File

@ -37,6 +37,8 @@
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/workqueue.h>
#include <linux/netdevice.h>
#include <net/addrconf.h>
#include <rdma/ib_cache.h>
@ -47,76 +49,621 @@ struct ib_pkey_cache {
u16 table[0];
};
struct ib_gid_cache {
int table_len;
union ib_gid table[0];
};
struct ib_update_work {
struct work_struct work;
struct ib_device *device;
u8 port_num;
};
union ib_gid zgid;
EXPORT_SYMBOL(zgid);
static const struct ib_gid_attr zattr;
enum gid_attr_find_mask {
GID_ATTR_FIND_MASK_GID = 1UL << 0,
GID_ATTR_FIND_MASK_NETDEV = 1UL << 1,
GID_ATTR_FIND_MASK_DEFAULT = 1UL << 2,
};
enum gid_table_entry_props {
GID_TABLE_ENTRY_INVALID = 1UL << 0,
GID_TABLE_ENTRY_DEFAULT = 1UL << 1,
};
enum gid_table_write_action {
GID_TABLE_WRITE_ACTION_ADD,
GID_TABLE_WRITE_ACTION_DEL,
/* MODIFY only updates the GID table. Currently only used by
* ib_cache_update.
*/
GID_TABLE_WRITE_ACTION_MODIFY
};
struct ib_gid_table_entry {
/* This lock protects an entry from being
* read and written simultaneously.
*/
rwlock_t lock;
unsigned long props;
union ib_gid gid;
struct ib_gid_attr attr;
void *context;
};
struct ib_gid_table {
int sz;
/* In RoCE, adding a GID to the table requires:
* (a) Find if this GID is already exists.
* (b) Find a free space.
* (c) Write the new GID
*
* Delete requires different set of operations:
* (a) Find the GID
* (b) Delete it.
*
* Add/delete should be carried out atomically.
* This is done by locking this mutex from multiple
* writers. We don't need this lock for IB, as the MAD
* layer replaces all entries. All data_vec entries
* are locked by this lock.
**/
struct mutex lock;
struct ib_gid_table_entry *data_vec;
};
static int write_gid(struct ib_device *ib_dev, u8 port,
struct ib_gid_table *table, int ix,
const union ib_gid *gid,
const struct ib_gid_attr *attr,
enum gid_table_write_action action,
bool default_gid)
{
int ret = 0;
struct net_device *old_net_dev;
unsigned long flags;
/* in rdma_cap_roce_gid_table, this funciton should be protected by a
* sleep-able lock.
*/
write_lock_irqsave(&table->data_vec[ix].lock, flags);
if (rdma_cap_roce_gid_table(ib_dev, port)) {
table->data_vec[ix].props |= GID_TABLE_ENTRY_INVALID;
write_unlock_irqrestore(&table->data_vec[ix].lock, flags);
/* GID_TABLE_WRITE_ACTION_MODIFY currently isn't supported by
* RoCE providers and thus only updates the cache.
*/
if (action == GID_TABLE_WRITE_ACTION_ADD)
ret = ib_dev->add_gid(ib_dev, port, ix, gid, attr,
&table->data_vec[ix].context);
else if (action == GID_TABLE_WRITE_ACTION_DEL)
ret = ib_dev->del_gid(ib_dev, port, ix,
&table->data_vec[ix].context);
write_lock_irqsave(&table->data_vec[ix].lock, flags);
}
old_net_dev = table->data_vec[ix].attr.ndev;
if (old_net_dev && old_net_dev != attr->ndev)
dev_put(old_net_dev);
/* if modify_gid failed, just delete the old gid */
if (ret || action == GID_TABLE_WRITE_ACTION_DEL) {
gid = &zgid;
attr = &zattr;
table->data_vec[ix].context = NULL;
}
if (default_gid)
table->data_vec[ix].props |= GID_TABLE_ENTRY_DEFAULT;
memcpy(&table->data_vec[ix].gid, gid, sizeof(*gid));
memcpy(&table->data_vec[ix].attr, attr, sizeof(*attr));
if (table->data_vec[ix].attr.ndev &&
table->data_vec[ix].attr.ndev != old_net_dev)
dev_hold(table->data_vec[ix].attr.ndev);
table->data_vec[ix].props &= ~GID_TABLE_ENTRY_INVALID;
write_unlock_irqrestore(&table->data_vec[ix].lock, flags);
if (!ret && rdma_cap_roce_gid_table(ib_dev, port)) {
struct ib_event event;
event.device = ib_dev;
event.element.port_num = port;
event.event = IB_EVENT_GID_CHANGE;
ib_dispatch_event(&event);
}
return ret;
}
static int add_gid(struct ib_device *ib_dev, u8 port,
struct ib_gid_table *table, int ix,
const union ib_gid *gid,
const struct ib_gid_attr *attr,
bool default_gid) {
return write_gid(ib_dev, port, table, ix, gid, attr,
GID_TABLE_WRITE_ACTION_ADD, default_gid);
}
static int modify_gid(struct ib_device *ib_dev, u8 port,
struct ib_gid_table *table, int ix,
const union ib_gid *gid,
const struct ib_gid_attr *attr,
bool default_gid) {
return write_gid(ib_dev, port, table, ix, gid, attr,
GID_TABLE_WRITE_ACTION_MODIFY, default_gid);
}
static int del_gid(struct ib_device *ib_dev, u8 port,
struct ib_gid_table *table, int ix,
bool default_gid) {
return write_gid(ib_dev, port, table, ix, &zgid, &zattr,
GID_TABLE_WRITE_ACTION_DEL, default_gid);
}
static int find_gid(struct ib_gid_table *table, const union ib_gid *gid,
const struct ib_gid_attr *val, bool default_gid,
unsigned long mask)
{
int i;
for (i = 0; i < table->sz; i++) {
unsigned long flags;
struct ib_gid_attr *attr = &table->data_vec[i].attr;
read_lock_irqsave(&table->data_vec[i].lock, flags);
if (table->data_vec[i].props & GID_TABLE_ENTRY_INVALID)
goto next;
if (mask & GID_ATTR_FIND_MASK_GID &&
memcmp(gid, &table->data_vec[i].gid, sizeof(*gid)))
goto next;
if (mask & GID_ATTR_FIND_MASK_NETDEV &&
attr->ndev != val->ndev)
goto next;
if (mask & GID_ATTR_FIND_MASK_DEFAULT &&
!!(table->data_vec[i].props & GID_TABLE_ENTRY_DEFAULT) !=
default_gid)
goto next;
read_unlock_irqrestore(&table->data_vec[i].lock, flags);
return i;
next:
read_unlock_irqrestore(&table->data_vec[i].lock, flags);
}
return -1;
}
static void make_default_gid(struct net_device *dev, union ib_gid *gid)
{
gid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
addrconf_ifid_eui48(&gid->raw[8], dev);
}
int ib_cache_gid_add(struct ib_device *ib_dev, u8 port,
union ib_gid *gid, struct ib_gid_attr *attr)
{
struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
struct ib_gid_table *table;
int ix;
int ret = 0;
struct net_device *idev;
table = ports_table[port - rdma_start_port(ib_dev)];
if (!memcmp(gid, &zgid, sizeof(*gid)))
return -EINVAL;
if (ib_dev->get_netdev) {
idev = ib_dev->get_netdev(ib_dev, port);
if (idev && attr->ndev != idev) {
union ib_gid default_gid;
/* Adding default GIDs in not permitted */
make_default_gid(idev, &default_gid);
if (!memcmp(gid, &default_gid, sizeof(*gid))) {
dev_put(idev);
return -EPERM;
}
}
if (idev)
dev_put(idev);
}
mutex_lock(&table->lock);
ix = find_gid(table, gid, attr, false, GID_ATTR_FIND_MASK_GID |
GID_ATTR_FIND_MASK_NETDEV);
if (ix >= 0)
goto out_unlock;
ix = find_gid(table, &zgid, NULL, false, GID_ATTR_FIND_MASK_GID |
GID_ATTR_FIND_MASK_DEFAULT);
if (ix < 0) {
ret = -ENOSPC;
goto out_unlock;
}
add_gid(ib_dev, port, table, ix, gid, attr, false);
out_unlock:
mutex_unlock(&table->lock);
return ret;
}
int ib_cache_gid_del(struct ib_device *ib_dev, u8 port,
union ib_gid *gid, struct ib_gid_attr *attr)
{
struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
struct ib_gid_table *table;
int ix;
table = ports_table[port - rdma_start_port(ib_dev)];
mutex_lock(&table->lock);
ix = find_gid(table, gid, attr, false,
GID_ATTR_FIND_MASK_GID |
GID_ATTR_FIND_MASK_NETDEV |
GID_ATTR_FIND_MASK_DEFAULT);
if (ix < 0)
goto out_unlock;
del_gid(ib_dev, port, table, ix, false);
out_unlock:
mutex_unlock(&table->lock);
return 0;
}
int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
struct net_device *ndev)
{
struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
struct ib_gid_table *table;
int ix;
table = ports_table[port - rdma_start_port(ib_dev)];
mutex_lock(&table->lock);
for (ix = 0; ix < table->sz; ix++)
if (table->data_vec[ix].attr.ndev == ndev)
del_gid(ib_dev, port, table, ix, false);
mutex_unlock(&table->lock);
return 0;
}
static int __ib_cache_gid_get(struct ib_device *ib_dev, u8 port, int index,
union ib_gid *gid, struct ib_gid_attr *attr)
{
struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
struct ib_gid_table *table;
unsigned long flags;
table = ports_table[port - rdma_start_port(ib_dev)];
if (index < 0 || index >= table->sz)
return -EINVAL;
read_lock_irqsave(&table->data_vec[index].lock, flags);
if (table->data_vec[index].props & GID_TABLE_ENTRY_INVALID) {
read_unlock_irqrestore(&table->data_vec[index].lock, flags);
return -EAGAIN;
}
memcpy(gid, &table->data_vec[index].gid, sizeof(*gid));
if (attr) {
memcpy(attr, &table->data_vec[index].attr, sizeof(*attr));
if (attr->ndev)
dev_hold(attr->ndev);
}
read_unlock_irqrestore(&table->data_vec[index].lock, flags);
return 0;
}
static int _ib_cache_gid_table_find(struct ib_device *ib_dev,
const union ib_gid *gid,
const struct ib_gid_attr *val,
unsigned long mask,
u8 *port, u16 *index)
{
struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
struct ib_gid_table *table;
u8 p;
int local_index;
for (p = 0; p < ib_dev->phys_port_cnt; p++) {
table = ports_table[p];
local_index = find_gid(table, gid, val, false, mask);
if (local_index >= 0) {
if (index)
*index = local_index;
if (port)
*port = p + rdma_start_port(ib_dev);
return 0;
}
}
return -ENOENT;
}
static int ib_cache_gid_find(struct ib_device *ib_dev,
const union ib_gid *gid,
struct net_device *ndev, u8 *port,
u16 *index)
{
unsigned long mask = GID_ATTR_FIND_MASK_GID;
struct ib_gid_attr gid_attr_val = {.ndev = ndev};
if (ndev)
mask |= GID_ATTR_FIND_MASK_NETDEV;
return _ib_cache_gid_table_find(ib_dev, gid, &gid_attr_val,
mask, port, index);
}
int ib_cache_gid_find_by_port(struct ib_device *ib_dev,
const union ib_gid *gid,
u8 port, struct net_device *ndev,
u16 *index)
{
int local_index;
struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
struct ib_gid_table *table;
unsigned long mask = GID_ATTR_FIND_MASK_GID;
struct ib_gid_attr val = {.ndev = ndev};
if (port < rdma_start_port(ib_dev) ||
port > rdma_end_port(ib_dev))
return -ENOENT;
table = ports_table[port - rdma_start_port(ib_dev)];
if (ndev)
mask |= GID_ATTR_FIND_MASK_NETDEV;
local_index = find_gid(table, gid, &val, false, mask);
if (local_index >= 0) {
if (index)
*index = local_index;
return 0;
}
return -ENOENT;
}
static struct ib_gid_table *alloc_gid_table(int sz)
{
unsigned int i;
struct ib_gid_table *table =
kzalloc(sizeof(struct ib_gid_table), GFP_KERNEL);
if (!table)
return NULL;
table->data_vec = kcalloc(sz, sizeof(*table->data_vec), GFP_KERNEL);
if (!table->data_vec)
goto err_free_table;
mutex_init(&table->lock);
table->sz = sz;
for (i = 0; i < sz; i++)
rwlock_init(&table->data_vec[i].lock);
return table;
err_free_table:
kfree(table);
return NULL;
}
static void release_gid_table(struct ib_gid_table *table)
{
if (table) {
kfree(table->data_vec);
kfree(table);
}
}
static void cleanup_gid_table_port(struct ib_device *ib_dev, u8 port,
struct ib_gid_table *table)
{
int i;
if (!table)
return;
for (i = 0; i < table->sz; ++i) {
if (memcmp(&table->data_vec[i].gid, &zgid,
sizeof(table->data_vec[i].gid)))
del_gid(ib_dev, port, table, i,
table->data_vec[i].props &
GID_ATTR_FIND_MASK_DEFAULT);
}
}
void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
struct net_device *ndev,
enum ib_cache_gid_default_mode mode)
{
struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
union ib_gid gid;
struct ib_gid_attr gid_attr;
struct ib_gid_table *table;
int ix;
union ib_gid current_gid;
struct ib_gid_attr current_gid_attr = {};
table = ports_table[port - rdma_start_port(ib_dev)];
make_default_gid(ndev, &gid);
memset(&gid_attr, 0, sizeof(gid_attr));
gid_attr.ndev = ndev;
ix = find_gid(table, NULL, NULL, true, GID_ATTR_FIND_MASK_DEFAULT);
/* Coudn't find default GID location */
WARN_ON(ix < 0);
mutex_lock(&table->lock);
if (!__ib_cache_gid_get(ib_dev, port, ix,
&current_gid, &current_gid_attr) &&
mode == IB_CACHE_GID_DEFAULT_MODE_SET &&
!memcmp(&gid, &current_gid, sizeof(gid)) &&
!memcmp(&gid_attr, &current_gid_attr, sizeof(gid_attr)))
goto unlock;
if ((memcmp(&current_gid, &zgid, sizeof(current_gid)) ||
memcmp(&current_gid_attr, &zattr,
sizeof(current_gid_attr))) &&
del_gid(ib_dev, port, table, ix, true)) {
pr_warn("ib_cache_gid: can't delete index %d for default gid %pI6\n",
ix, gid.raw);
goto unlock;
}
if (mode == IB_CACHE_GID_DEFAULT_MODE_SET)
if (add_gid(ib_dev, port, table, ix, &gid, &gid_attr, true))
pr_warn("ib_cache_gid: unable to add default gid %pI6\n",
gid.raw);
unlock:
if (current_gid_attr.ndev)
dev_put(current_gid_attr.ndev);
mutex_unlock(&table->lock);
}
static int gid_table_reserve_default(struct ib_device *ib_dev, u8 port,
struct ib_gid_table *table)
{
if (rdma_protocol_roce(ib_dev, port)) {
struct ib_gid_table_entry *entry = &table->data_vec[0];
entry->props |= GID_TABLE_ENTRY_DEFAULT;
}
return 0;
}
static int _gid_table_setup_one(struct ib_device *ib_dev)
{
u8 port;
struct ib_gid_table **table;
int err = 0;
table = kcalloc(ib_dev->phys_port_cnt, sizeof(*table), GFP_KERNEL);
if (!table) {
pr_warn("failed to allocate ib gid cache for %s\n",
ib_dev->name);
return -ENOMEM;
}
for (port = 0; port < ib_dev->phys_port_cnt; port++) {
u8 rdma_port = port + rdma_start_port(ib_dev);
table[port] =
alloc_gid_table(
ib_dev->port_immutable[rdma_port].gid_tbl_len);
if (!table[port]) {
err = -ENOMEM;
goto rollback_table_setup;
}
err = gid_table_reserve_default(ib_dev,
port + rdma_start_port(ib_dev),
table[port]);
if (err)
goto rollback_table_setup;
}
ib_dev->cache.gid_cache = table;
return 0;
rollback_table_setup:
for (port = 0; port < ib_dev->phys_port_cnt; port++) {
cleanup_gid_table_port(ib_dev, port + rdma_start_port(ib_dev),
table[port]);
release_gid_table(table[port]);
}
kfree(table);
return err;
}
static void gid_table_release_one(struct ib_device *ib_dev)
{
struct ib_gid_table **table = ib_dev->cache.gid_cache;
u8 port;
if (!table)
return;
for (port = 0; port < ib_dev->phys_port_cnt; port++)
release_gid_table(table[port]);
kfree(table);
ib_dev->cache.gid_cache = NULL;
}
static void gid_table_cleanup_one(struct ib_device *ib_dev)
{
struct ib_gid_table **table = ib_dev->cache.gid_cache;
u8 port;
if (!table)
return;
for (port = 0; port < ib_dev->phys_port_cnt; port++)
cleanup_gid_table_port(ib_dev, port + rdma_start_port(ib_dev),
table[port]);
}
static int gid_table_setup_one(struct ib_device *ib_dev)
{
int err;
err = _gid_table_setup_one(ib_dev);
if (err)
return err;
err = roce_rescan_device(ib_dev);
if (err) {
gid_table_cleanup_one(ib_dev);
gid_table_release_one(ib_dev);
}
return err;
}
int ib_get_cached_gid(struct ib_device *device,
u8 port_num,
int index,
union ib_gid *gid)
{
struct ib_gid_cache *cache;
unsigned long flags;
int ret = 0;
if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device))
return -EINVAL;
read_lock_irqsave(&device->cache.lock, flags);
cache = device->cache.gid_cache[port_num - rdma_start_port(device)];
if (index < 0 || index >= cache->table_len)
ret = -EINVAL;
else
*gid = cache->table[index];
read_unlock_irqrestore(&device->cache.lock, flags);
return ret;
return __ib_cache_gid_get(device, port_num, index, gid, NULL);
}
EXPORT_SYMBOL(ib_get_cached_gid);
int ib_find_cached_gid(struct ib_device *device,
int ib_find_cached_gid(struct ib_device *device,
const union ib_gid *gid,
u8 *port_num,
u16 *index)
u8 *port_num,
u16 *index)
{
struct ib_gid_cache *cache;
unsigned long flags;
int p, i;
int ret = -ENOENT;
*port_num = -1;
if (index)
*index = -1;
read_lock_irqsave(&device->cache.lock, flags);
for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
cache = device->cache.gid_cache[p];
for (i = 0; i < cache->table_len; ++i) {
if (!memcmp(gid, &cache->table[i], sizeof *gid)) {
*port_num = p + rdma_start_port(device);
if (index)
*index = i;
ret = 0;
goto found;
}
}
}
found:
read_unlock_irqrestore(&device->cache.lock, flags);
return ret;
return ib_cache_gid_find(device, gid, NULL, port_num, index);
}
EXPORT_SYMBOL(ib_find_cached_gid);
@ -243,9 +790,21 @@ static void ib_cache_update(struct ib_device *device,
{
struct ib_port_attr *tprops = NULL;
struct ib_pkey_cache *pkey_cache = NULL, *old_pkey_cache;
struct ib_gid_cache *gid_cache = NULL, *old_gid_cache;
struct ib_gid_cache {
int table_len;
union ib_gid table[0];
} *gid_cache = NULL;
int i;
int ret;
struct ib_gid_table *table;
struct ib_gid_table **ports_table = device->cache.gid_cache;
bool use_roce_gid_table =
rdma_cap_roce_gid_table(device, port);
if (port < rdma_start_port(device) || port > rdma_end_port(device))
return;
table = ports_table[port - rdma_start_port(device)];
tprops = kmalloc(sizeof *tprops, GFP_KERNEL);
if (!tprops)
@ -265,12 +824,14 @@ static void ib_cache_update(struct ib_device *device,
pkey_cache->table_len = tprops->pkey_tbl_len;
gid_cache = kmalloc(sizeof *gid_cache + tprops->gid_tbl_len *
sizeof *gid_cache->table, GFP_KERNEL);
if (!gid_cache)
goto err;
if (!use_roce_gid_table) {
gid_cache = kmalloc(sizeof(*gid_cache) + tprops->gid_tbl_len *
sizeof(*gid_cache->table), GFP_KERNEL);
if (!gid_cache)
goto err;
gid_cache->table_len = tprops->gid_tbl_len;
gid_cache->table_len = tprops->gid_tbl_len;
}
for (i = 0; i < pkey_cache->table_len; ++i) {
ret = ib_query_pkey(device, port, i, pkey_cache->table + i);
@ -281,29 +842,36 @@ static void ib_cache_update(struct ib_device *device,
}
}
for (i = 0; i < gid_cache->table_len; ++i) {
ret = ib_query_gid(device, port, i, gid_cache->table + i);
if (ret) {
printk(KERN_WARNING "ib_query_gid failed (%d) for %s (index %d)\n",
ret, device->name, i);
goto err;
if (!use_roce_gid_table) {
for (i = 0; i < gid_cache->table_len; ++i) {
ret = ib_query_gid(device, port, i,
gid_cache->table + i);
if (ret) {
printk(KERN_WARNING "ib_query_gid failed (%d) for %s (index %d)\n",
ret, device->name, i);
goto err;
}
}
}
write_lock_irq(&device->cache.lock);
old_pkey_cache = device->cache.pkey_cache[port - rdma_start_port(device)];
old_gid_cache = device->cache.gid_cache [port - rdma_start_port(device)];
device->cache.pkey_cache[port - rdma_start_port(device)] = pkey_cache;
device->cache.gid_cache [port - rdma_start_port(device)] = gid_cache;
if (!use_roce_gid_table) {
for (i = 0; i < gid_cache->table_len; i++) {
modify_gid(device, port, table, i, gid_cache->table + i,
&zattr, false);
}
}
device->cache.lmc_cache[port - rdma_start_port(device)] = tprops->lmc;
write_unlock_irq(&device->cache.lock);
kfree(gid_cache);
kfree(old_pkey_cache);
kfree(old_gid_cache);
kfree(tprops);
return;
@ -344,85 +912,88 @@ static void ib_cache_event(struct ib_event_handler *handler,
}
}
static void ib_cache_setup_one(struct ib_device *device)
int ib_cache_setup_one(struct ib_device *device)
{
int p;
int err;
rwlock_init(&device->cache.lock);
device->cache.pkey_cache =
kmalloc(sizeof *device->cache.pkey_cache *
kzalloc(sizeof *device->cache.pkey_cache *
(rdma_end_port(device) - rdma_start_port(device) + 1), GFP_KERNEL);
device->cache.gid_cache =
kmalloc(sizeof *device->cache.gid_cache *
(rdma_end_port(device) - rdma_start_port(device) + 1), GFP_KERNEL);
device->cache.lmc_cache = kmalloc(sizeof *device->cache.lmc_cache *
(rdma_end_port(device) -
rdma_start_port(device) + 1),
GFP_KERNEL);
if (!device->cache.pkey_cache || !device->cache.gid_cache ||
if (!device->cache.pkey_cache ||
!device->cache.lmc_cache) {
printk(KERN_WARNING "Couldn't allocate cache "
"for %s\n", device->name);
goto err;
return -ENOMEM;
}
for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
device->cache.pkey_cache[p] = NULL;
device->cache.gid_cache [p] = NULL;
err = gid_table_setup_one(device);
if (err)
/* Allocated memory will be cleaned in the release function */
return err;
for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
ib_cache_update(device, p + rdma_start_port(device));
}
INIT_IB_EVENT_HANDLER(&device->cache.event_handler,
device, ib_cache_event);
if (ib_register_event_handler(&device->cache.event_handler))
goto err_cache;
err = ib_register_event_handler(&device->cache.event_handler);
if (err)
goto err;
return;
err_cache:
for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
kfree(device->cache.pkey_cache[p]);
kfree(device->cache.gid_cache[p]);
}
return 0;
err:
kfree(device->cache.pkey_cache);
kfree(device->cache.gid_cache);
kfree(device->cache.lmc_cache);
gid_table_cleanup_one(device);
return err;
}
static void ib_cache_cleanup_one(struct ib_device *device)
void ib_cache_release_one(struct ib_device *device)
{
int p;
ib_unregister_event_handler(&device->cache.event_handler);
flush_workqueue(ib_wq);
for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
kfree(device->cache.pkey_cache[p]);
kfree(device->cache.gid_cache[p]);
}
/*
* The release function frees all the cache elements.
* This function should be called as part of freeing
* all the device's resources when the cache could no
* longer be accessed.
*/
if (device->cache.pkey_cache)
for (p = 0;
p <= rdma_end_port(device) - rdma_start_port(device); ++p)
kfree(device->cache.pkey_cache[p]);
gid_table_release_one(device);
kfree(device->cache.pkey_cache);
kfree(device->cache.gid_cache);
kfree(device->cache.lmc_cache);
}
static struct ib_client cache_client = {
.name = "cache",
.add = ib_cache_setup_one,
.remove = ib_cache_cleanup_one
};
int __init ib_cache_setup(void)
void ib_cache_cleanup_one(struct ib_device *device)
{
return ib_register_client(&cache_client);
/* The cleanup function unregisters the event handler,
* waits for all in-progress workqueue elements and cleans
* up the GID cache. This function should be called after
* the device was removed from the devices list and all
* clients were removed, so the cache exists but is
* non-functional and shouldn't be updated anymore.
*/
ib_unregister_event_handler(&device->cache.event_handler);
flush_workqueue(ib_wq);
gid_table_cleanup_one(device);
}
void __init ib_cache_setup(void)
{
roce_gid_mgmt_init();
}
void __exit ib_cache_cleanup(void)
{
ib_unregister_client(&cache_client);
roce_gid_mgmt_cleanup();
}

View File

@ -58,7 +58,7 @@ MODULE_DESCRIPTION("InfiniBand CM");
MODULE_LICENSE("Dual BSD/GPL");
static void cm_add_one(struct ib_device *device);
static void cm_remove_one(struct ib_device *device);
static void cm_remove_one(struct ib_device *device, void *client_data);
static struct ib_client cm_client = {
.name = "cm",
@ -213,13 +213,15 @@ struct cm_id_private {
spinlock_t lock; /* Do not acquire inside cm.lock */
struct completion comp;
atomic_t refcount;
/* Number of clients sharing this ib_cm_id. Only valid for listeners.
* Protected by the cm.lock spinlock. */
int listen_sharecount;
struct ib_mad_send_buf *msg;
struct cm_timewait_info *timewait_info;
/* todo: use alternate port on send failure */
struct cm_av av;
struct cm_av alt_av;
struct ib_cm_compare_data *compare_data;
void *private_data;
__be64 tid;
@ -440,40 +442,6 @@ static struct cm_id_private * cm_acquire_id(__be32 local_id, __be32 remote_id)
return cm_id_priv;
}
static void cm_mask_copy(u32 *dst, const u32 *src, const u32 *mask)
{
int i;
for (i = 0; i < IB_CM_COMPARE_SIZE; i++)
dst[i] = src[i] & mask[i];
}
static int cm_compare_data(struct ib_cm_compare_data *src_data,
struct ib_cm_compare_data *dst_data)
{
u32 src[IB_CM_COMPARE_SIZE];
u32 dst[IB_CM_COMPARE_SIZE];
if (!src_data || !dst_data)
return 0;
cm_mask_copy(src, src_data->data, dst_data->mask);
cm_mask_copy(dst, dst_data->data, src_data->mask);
return memcmp(src, dst, sizeof(src));
}
static int cm_compare_private_data(u32 *private_data,
struct ib_cm_compare_data *dst_data)
{
u32 src[IB_CM_COMPARE_SIZE];
if (!dst_data)
return 0;
cm_mask_copy(src, private_data, dst_data->mask);
return memcmp(src, dst_data->data, sizeof(src));
}
/*
* Trivial helpers to strip endian annotation and compare; the
* endianness doesn't actually matter since we just need a stable
@ -506,18 +474,14 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
struct cm_id_private *cur_cm_id_priv;
__be64 service_id = cm_id_priv->id.service_id;
__be64 service_mask = cm_id_priv->id.service_mask;
int data_cmp;
while (*link) {
parent = *link;
cur_cm_id_priv = rb_entry(parent, struct cm_id_private,
service_node);
data_cmp = cm_compare_data(cm_id_priv->compare_data,
cur_cm_id_priv->compare_data);
if ((cur_cm_id_priv->id.service_mask & service_id) ==
(service_mask & cur_cm_id_priv->id.service_id) &&
(cm_id_priv->id.device == cur_cm_id_priv->id.device) &&
!data_cmp)
(cm_id_priv->id.device == cur_cm_id_priv->id.device))
return cur_cm_id_priv;
if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
@ -528,8 +492,6 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
link = &(*link)->rb_left;
else if (be64_gt(service_id, cur_cm_id_priv->id.service_id))
link = &(*link)->rb_right;
else if (data_cmp < 0)
link = &(*link)->rb_left;
else
link = &(*link)->rb_right;
}
@ -539,20 +501,16 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
}
static struct cm_id_private * cm_find_listen(struct ib_device *device,
__be64 service_id,
u32 *private_data)
__be64 service_id)
{
struct rb_node *node = cm.listen_service_table.rb_node;
struct cm_id_private *cm_id_priv;
int data_cmp;
while (node) {
cm_id_priv = rb_entry(node, struct cm_id_private, service_node);
data_cmp = cm_compare_private_data(private_data,
cm_id_priv->compare_data);
if ((cm_id_priv->id.service_mask & service_id) ==
cm_id_priv->id.service_id &&
(cm_id_priv->id.device == device) && !data_cmp)
(cm_id_priv->id.device == device))
return cm_id_priv;
if (device < cm_id_priv->id.device)
@ -563,8 +521,6 @@ static struct cm_id_private * cm_find_listen(struct ib_device *device,
node = node->rb_left;
else if (be64_gt(service_id, cm_id_priv->id.service_id))
node = node->rb_right;
else if (data_cmp < 0)
node = node->rb_left;
else
node = node->rb_right;
}
@ -859,9 +815,15 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
spin_lock_irq(&cm_id_priv->lock);
switch (cm_id->state) {
case IB_CM_LISTEN:
cm_id->state = IB_CM_IDLE;
spin_unlock_irq(&cm_id_priv->lock);
spin_lock_irq(&cm.lock);
if (--cm_id_priv->listen_sharecount > 0) {
/* The id is still shared. */
cm_deref_id(cm_id_priv);
spin_unlock_irq(&cm.lock);
return;
}
rb_erase(&cm_id_priv->service_node, &cm.listen_service_table);
spin_unlock_irq(&cm.lock);
break;
@ -930,7 +892,6 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
wait_for_completion(&cm_id_priv->comp);
while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
cm_free_work(work);
kfree(cm_id_priv->compare_data);
kfree(cm_id_priv->private_data);
kfree(cm_id_priv);
}
@ -941,11 +902,23 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id)
}
EXPORT_SYMBOL(ib_destroy_cm_id);
int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
struct ib_cm_compare_data *compare_data)
/**
* __ib_cm_listen - Initiates listening on the specified service ID for
* connection and service ID resolution requests.
* @cm_id: Connection identifier associated with the listen request.
* @service_id: Service identifier matched against incoming connection
* and service ID resolution requests. The service ID should be specified
* network-byte order. If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
* assign a service ID to the caller.
* @service_mask: Mask applied to service ID used to listen across a
* range of service IDs. If set to 0, the service ID is matched
* exactly. This parameter is ignored if %service_id is set to
* IB_CM_ASSIGN_SERVICE_ID.
*/
static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
__be64 service_mask)
{
struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
unsigned long flags;
int ret = 0;
service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
@ -958,20 +931,9 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
if (cm_id->state != IB_CM_IDLE)
return -EINVAL;
if (compare_data) {
cm_id_priv->compare_data = kzalloc(sizeof *compare_data,
GFP_KERNEL);
if (!cm_id_priv->compare_data)
return -ENOMEM;
cm_mask_copy(cm_id_priv->compare_data->data,
compare_data->data, compare_data->mask);
memcpy(cm_id_priv->compare_data->mask, compare_data->mask,
sizeof(compare_data->mask));
}
cm_id->state = IB_CM_LISTEN;
++cm_id_priv->listen_sharecount;
spin_lock_irqsave(&cm.lock, flags);
if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
cm_id->service_mask = ~cpu_to_be64(0);
@ -980,18 +942,95 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
cm_id->service_mask = service_mask;
}
cur_cm_id_priv = cm_insert_listen(cm_id_priv);
spin_unlock_irqrestore(&cm.lock, flags);
if (cur_cm_id_priv) {
cm_id->state = IB_CM_IDLE;
kfree(cm_id_priv->compare_data);
cm_id_priv->compare_data = NULL;
--cm_id_priv->listen_sharecount;
ret = -EBUSY;
}
return ret;
}
int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask)
{
unsigned long flags;
int ret;
spin_lock_irqsave(&cm.lock, flags);
ret = __ib_cm_listen(cm_id, service_id, service_mask);
spin_unlock_irqrestore(&cm.lock, flags);
return ret;
}
EXPORT_SYMBOL(ib_cm_listen);
/**
* Create a new listening ib_cm_id and listen on the given service ID.
*
* If there's an existing ID listening on that same device and service ID,
* return it.
*
* @device: Device associated with the cm_id. All related communication will
* be associated with the specified device.
* @cm_handler: Callback invoked to notify the user of CM events.
* @service_id: Service identifier matched against incoming connection
* and service ID resolution requests. The service ID should be specified
* network-byte order. If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
* assign a service ID to the caller.
*
* Callers should call ib_destroy_cm_id when done with the listener ID.
*/
struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
ib_cm_handler cm_handler,
__be64 service_id)
{
struct cm_id_private *cm_id_priv;
struct ib_cm_id *cm_id;
unsigned long flags;
int err = 0;
/* Create an ID in advance, since the creation may sleep */
cm_id = ib_create_cm_id(device, cm_handler, NULL);
if (IS_ERR(cm_id))
return cm_id;
spin_lock_irqsave(&cm.lock, flags);
if (service_id == IB_CM_ASSIGN_SERVICE_ID)
goto new_id;
/* Find an existing ID */
cm_id_priv = cm_find_listen(device, service_id);
if (cm_id_priv) {
if (cm_id->cm_handler != cm_handler || cm_id->context) {
/* Sharing an ib_cm_id with different handlers is not
* supported */
spin_unlock_irqrestore(&cm.lock, flags);
return ERR_PTR(-EINVAL);
}
atomic_inc(&cm_id_priv->refcount);
++cm_id_priv->listen_sharecount;
spin_unlock_irqrestore(&cm.lock, flags);
ib_destroy_cm_id(cm_id);
cm_id = &cm_id_priv->id;
return cm_id;
}
new_id:
/* Use newly created ID */
err = __ib_cm_listen(cm_id, service_id, 0);
spin_unlock_irqrestore(&cm.lock, flags);
if (err) {
ib_destroy_cm_id(cm_id);
return ERR_PTR(err);
}
return cm_id;
}
EXPORT_SYMBOL(ib_cm_insert_listen);
static __be64 cm_form_tid(struct cm_id_private *cm_id_priv,
enum cm_msg_sequence msg_seq)
{
@ -1268,6 +1307,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
primary_path->packet_life_time =
cm_req_get_primary_local_ack_timeout(req_msg);
primary_path->packet_life_time -= (primary_path->packet_life_time > 0);
primary_path->service_id = req_msg->service_id;
if (req_msg->alt_local_lid) {
memset(alt_path, 0, sizeof *alt_path);
@ -1289,9 +1329,28 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
alt_path->packet_life_time =
cm_req_get_alt_local_ack_timeout(req_msg);
alt_path->packet_life_time -= (alt_path->packet_life_time > 0);
alt_path->service_id = req_msg->service_id;
}
}
static u16 cm_get_bth_pkey(struct cm_work *work)
{
struct ib_device *ib_dev = work->port->cm_dev->ib_device;
u8 port_num = work->port->port_num;
u16 pkey_index = work->mad_recv_wc->wc->pkey_index;
u16 pkey;
int ret;
ret = ib_get_cached_pkey(ib_dev, port_num, pkey_index, &pkey);
if (ret) {
dev_warn_ratelimited(&ib_dev->dev, "ib_cm: Couldn't retrieve pkey for incoming request (port %d, pkey index %d). %d\n",
port_num, pkey_index, ret);
return 0;
}
return pkey;
}
static void cm_format_req_event(struct cm_work *work,
struct cm_id_private *cm_id_priv,
struct ib_cm_id *listen_id)
@ -1302,6 +1361,7 @@ static void cm_format_req_event(struct cm_work *work,
req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
param = &work->cm_event.param.req_rcvd;
param->listen_id = listen_id;
param->bth_pkey = cm_get_bth_pkey(work);
param->port = cm_id_priv->av.port->port_num;
param->primary_path = &work->path[0];
if (req_msg->alt_local_lid)
@ -1484,8 +1544,7 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
/* Find matching listen request. */
listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
req_msg->service_id,
req_msg->private_data);
req_msg->service_id);
if (!listen_cm_id_priv) {
cm_cleanup_timewait(cm_id_priv->timewait_info);
spin_unlock_irq(&cm.lock);
@ -2992,6 +3051,8 @@ static void cm_format_sidr_req_event(struct cm_work *work,
param = &work->cm_event.param.sidr_req_rcvd;
param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
param->listen_id = listen_id;
param->service_id = sidr_req_msg->service_id;
param->bth_pkey = cm_get_bth_pkey(work);
param->port = work->port->port_num;
work->cm_event.private_data = &sidr_req_msg->private_data;
}
@ -3031,8 +3092,7 @@ static int cm_sidr_req_handler(struct cm_work *work)
}
cm_id_priv->id.state = IB_CM_SIDR_REQ_RCVD;
cur_cm_id_priv = cm_find_listen(cm_id->device,
sidr_req_msg->service_id,
sidr_req_msg->private_data);
sidr_req_msg->service_id);
if (!cur_cm_id_priv) {
spin_unlock_irq(&cm.lock);
cm_reject_sidr_req(cm_id_priv, IB_SIDR_UNSUPPORTED);
@ -3886,9 +3946,9 @@ static void cm_add_one(struct ib_device *ib_device)
kfree(cm_dev);
}
static void cm_remove_one(struct ib_device *ib_device)
static void cm_remove_one(struct ib_device *ib_device, void *client_data)
{
struct cm_device *cm_dev;
struct cm_device *cm_dev = client_data;
struct cm_port *port;
struct ib_port_modify port_modify = {
.clr_port_cap_mask = IB_PORT_CM_SUP
@ -3896,7 +3956,6 @@ static void cm_remove_one(struct ib_device *ib_device)
unsigned long flags;
int i;
cm_dev = ib_get_client_data(ib_device, &cm_client);
if (!cm_dev)
return;

View File

@ -46,6 +46,8 @@
#include <net/tcp.h>
#include <net/ipv6.h>
#include <net/ip_fib.h>
#include <net/ip6_route.h>
#include <rdma/rdma_cm.h>
#include <rdma/rdma_cm_ib.h>
@ -94,7 +96,7 @@ const char *rdma_event_msg(enum rdma_cm_event_type event)
EXPORT_SYMBOL(rdma_event_msg);
static void cma_add_one(struct ib_device *device);
static void cma_remove_one(struct ib_device *device);
static void cma_remove_one(struct ib_device *device, void *client_data);
static struct ib_client cma_client = {
.name = "cma",
@ -113,6 +115,22 @@ static DEFINE_IDR(udp_ps);
static DEFINE_IDR(ipoib_ps);
static DEFINE_IDR(ib_ps);
static struct idr *cma_idr(enum rdma_port_space ps)
{
switch (ps) {
case RDMA_PS_TCP:
return &tcp_ps;
case RDMA_PS_UDP:
return &udp_ps;
case RDMA_PS_IPOIB:
return &ipoib_ps;
case RDMA_PS_IB:
return &ib_ps;
default:
return NULL;
}
}
struct cma_device {
struct list_head list;
struct ib_device *device;
@ -122,11 +140,33 @@ struct cma_device {
};
struct rdma_bind_list {
struct idr *ps;
enum rdma_port_space ps;
struct hlist_head owners;
unsigned short port;
};
static int cma_ps_alloc(enum rdma_port_space ps,
struct rdma_bind_list *bind_list, int snum)
{
struct idr *idr = cma_idr(ps);
return idr_alloc(idr, bind_list, snum, snum + 1, GFP_KERNEL);
}
static struct rdma_bind_list *cma_ps_find(enum rdma_port_space ps, int snum)
{
struct idr *idr = cma_idr(ps);
return idr_find(idr, snum);
}
static void cma_ps_remove(enum rdma_port_space ps, int snum)
{
struct idr *idr = cma_idr(ps);
idr_remove(idr, snum);
}
enum {
CMA_OPTION_AFONLY,
};
@ -225,6 +265,15 @@ struct cma_hdr {
#define CMA_VERSION 0x00
struct cma_req_info {
struct ib_device *device;
int port;
union ib_gid local_gid;
__be64 service_id;
u16 pkey;
bool has_gid:1;
};
static int cma_comp(struct rdma_id_private *id_priv, enum rdma_cm_state comp)
{
unsigned long flags;
@ -262,7 +311,7 @@ static enum rdma_cm_state cma_exch(struct rdma_id_private *id_priv,
return old;
}
static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
static inline u8 cma_get_ip_ver(const struct cma_hdr *hdr)
{
return hdr->ip_version >> 4;
}
@ -870,107 +919,397 @@ static inline int cma_any_port(struct sockaddr *addr)
return !cma_port(addr);
}
static void cma_save_ib_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
static void cma_save_ib_info(struct sockaddr *src_addr,
struct sockaddr *dst_addr,
struct rdma_cm_id *listen_id,
struct ib_sa_path_rec *path)
{
struct sockaddr_ib *listen_ib, *ib;
listen_ib = (struct sockaddr_ib *) &listen_id->route.addr.src_addr;
ib = (struct sockaddr_ib *) &id->route.addr.src_addr;
ib->sib_family = listen_ib->sib_family;
if (path) {
ib->sib_pkey = path->pkey;
ib->sib_flowinfo = path->flow_label;
memcpy(&ib->sib_addr, &path->sgid, 16);
} else {
ib->sib_pkey = listen_ib->sib_pkey;
ib->sib_flowinfo = listen_ib->sib_flowinfo;
ib->sib_addr = listen_ib->sib_addr;
if (src_addr) {
ib = (struct sockaddr_ib *)src_addr;
ib->sib_family = AF_IB;
if (path) {
ib->sib_pkey = path->pkey;
ib->sib_flowinfo = path->flow_label;
memcpy(&ib->sib_addr, &path->sgid, 16);
ib->sib_sid = path->service_id;
ib->sib_scope_id = 0;
} else {
ib->sib_pkey = listen_ib->sib_pkey;
ib->sib_flowinfo = listen_ib->sib_flowinfo;
ib->sib_addr = listen_ib->sib_addr;
ib->sib_sid = listen_ib->sib_sid;
ib->sib_scope_id = listen_ib->sib_scope_id;
}
ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
}
ib->sib_sid = listen_ib->sib_sid;
ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
ib->sib_scope_id = listen_ib->sib_scope_id;
if (path) {
ib = (struct sockaddr_ib *) &id->route.addr.dst_addr;
ib->sib_family = listen_ib->sib_family;
ib->sib_pkey = path->pkey;
ib->sib_flowinfo = path->flow_label;
memcpy(&ib->sib_addr, &path->dgid, 16);
if (dst_addr) {
ib = (struct sockaddr_ib *)dst_addr;
ib->sib_family = AF_IB;
if (path) {
ib->sib_pkey = path->pkey;
ib->sib_flowinfo = path->flow_label;
memcpy(&ib->sib_addr, &path->dgid, 16);
}
}
}
static __be16 ss_get_port(const struct sockaddr_storage *ss)
{
if (ss->ss_family == AF_INET)
return ((struct sockaddr_in *)ss)->sin_port;
else if (ss->ss_family == AF_INET6)
return ((struct sockaddr_in6 *)ss)->sin6_port;
BUG();
}
static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
struct cma_hdr *hdr)
static void cma_save_ip4_info(struct sockaddr *src_addr,
struct sockaddr *dst_addr,
struct cma_hdr *hdr,
__be16 local_port)
{
struct sockaddr_in *ip4;
ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
ip4->sin_family = AF_INET;
ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
ip4->sin_port = ss_get_port(&listen_id->route.addr.src_addr);
if (src_addr) {
ip4 = (struct sockaddr_in *)src_addr;
ip4->sin_family = AF_INET;
ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
ip4->sin_port = local_port;
}
ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
ip4->sin_family = AF_INET;
ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
ip4->sin_port = hdr->port;
if (dst_addr) {
ip4 = (struct sockaddr_in *)dst_addr;
ip4->sin_family = AF_INET;
ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
ip4->sin_port = hdr->port;
}
}
static void cma_save_ip6_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
struct cma_hdr *hdr)
static void cma_save_ip6_info(struct sockaddr *src_addr,
struct sockaddr *dst_addr,
struct cma_hdr *hdr,
__be16 local_port)
{
struct sockaddr_in6 *ip6;
ip6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
ip6->sin6_family = AF_INET6;
ip6->sin6_addr = hdr->dst_addr.ip6;
ip6->sin6_port = ss_get_port(&listen_id->route.addr.src_addr);
if (src_addr) {
ip6 = (struct sockaddr_in6 *)src_addr;
ip6->sin6_family = AF_INET6;
ip6->sin6_addr = hdr->dst_addr.ip6;
ip6->sin6_port = local_port;
}
ip6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
ip6->sin6_family = AF_INET6;
ip6->sin6_addr = hdr->src_addr.ip6;
ip6->sin6_port = hdr->port;
if (dst_addr) {
ip6 = (struct sockaddr_in6 *)dst_addr;
ip6->sin6_family = AF_INET6;
ip6->sin6_addr = hdr->src_addr.ip6;
ip6->sin6_port = hdr->port;
}
}
static int cma_save_net_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
struct ib_cm_event *ib_event)
static u16 cma_port_from_service_id(__be64 service_id)
{
return (u16)be64_to_cpu(service_id);
}
static int cma_save_ip_info(struct sockaddr *src_addr,
struct sockaddr *dst_addr,
struct ib_cm_event *ib_event,
__be64 service_id)
{
struct cma_hdr *hdr;
if (listen_id->route.addr.src_addr.ss_family == AF_IB) {
if (ib_event->event == IB_CM_REQ_RECEIVED)
cma_save_ib_info(id, listen_id, ib_event->param.req_rcvd.primary_path);
else if (ib_event->event == IB_CM_SIDR_REQ_RECEIVED)
cma_save_ib_info(id, listen_id, NULL);
return 0;
}
__be16 port;
hdr = ib_event->private_data;
if (hdr->cma_version != CMA_VERSION)
return -EINVAL;
port = htons(cma_port_from_service_id(service_id));
switch (cma_get_ip_ver(hdr)) {
case 4:
cma_save_ip4_info(id, listen_id, hdr);
cma_save_ip4_info(src_addr, dst_addr, hdr, port);
break;
case 6:
cma_save_ip6_info(id, listen_id, hdr);
cma_save_ip6_info(src_addr, dst_addr, hdr, port);
break;
default:
return -EAFNOSUPPORT;
}
return 0;
}
static int cma_save_net_info(struct sockaddr *src_addr,
struct sockaddr *dst_addr,
struct rdma_cm_id *listen_id,
struct ib_cm_event *ib_event,
sa_family_t sa_family, __be64 service_id)
{
if (sa_family == AF_IB) {
if (ib_event->event == IB_CM_REQ_RECEIVED)
cma_save_ib_info(src_addr, dst_addr, listen_id,
ib_event->param.req_rcvd.primary_path);
else if (ib_event->event == IB_CM_SIDR_REQ_RECEIVED)
cma_save_ib_info(src_addr, dst_addr, listen_id, NULL);
return 0;
}
return cma_save_ip_info(src_addr, dst_addr, ib_event, service_id);
}
static int cma_save_req_info(const struct ib_cm_event *ib_event,
struct cma_req_info *req)
{
const struct ib_cm_req_event_param *req_param =
&ib_event->param.req_rcvd;
const struct ib_cm_sidr_req_event_param *sidr_param =
&ib_event->param.sidr_req_rcvd;
switch (ib_event->event) {
case IB_CM_REQ_RECEIVED:
req->device = req_param->listen_id->device;
req->port = req_param->port;
memcpy(&req->local_gid, &req_param->primary_path->sgid,
sizeof(req->local_gid));
req->has_gid = true;
req->service_id = req_param->primary_path->service_id;
req->pkey = req_param->bth_pkey;
break;
case IB_CM_SIDR_REQ_RECEIVED:
req->device = sidr_param->listen_id->device;
req->port = sidr_param->port;
req->has_gid = false;
req->service_id = sidr_param->service_id;
req->pkey = sidr_param->bth_pkey;
break;
default:
return -EINVAL;
}
return 0;
}
static bool validate_ipv4_net_dev(struct net_device *net_dev,
const struct sockaddr_in *dst_addr,
const struct sockaddr_in *src_addr)
{
__be32 daddr = dst_addr->sin_addr.s_addr,
saddr = src_addr->sin_addr.s_addr;
struct fib_result res;
struct flowi4 fl4;
int err;
bool ret;
if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) ||
ipv4_is_lbcast(daddr) || ipv4_is_zeronet(saddr) ||
ipv4_is_zeronet(daddr) || ipv4_is_loopback(daddr) ||
ipv4_is_loopback(saddr))
return false;
memset(&fl4, 0, sizeof(fl4));
fl4.flowi4_iif = net_dev->ifindex;
fl4.daddr = daddr;
fl4.saddr = saddr;
rcu_read_lock();
err = fib_lookup(dev_net(net_dev), &fl4, &res, 0);
if (err)
return false;
ret = FIB_RES_DEV(res) == net_dev;
rcu_read_unlock();
return ret;
}
static bool validate_ipv6_net_dev(struct net_device *net_dev,
const struct sockaddr_in6 *dst_addr,
const struct sockaddr_in6 *src_addr)
{
#if IS_ENABLED(CONFIG_IPV6)
const int strict = ipv6_addr_type(&dst_addr->sin6_addr) &
IPV6_ADDR_LINKLOCAL;
struct rt6_info *rt = rt6_lookup(dev_net(net_dev), &dst_addr->sin6_addr,
&src_addr->sin6_addr, net_dev->ifindex,
strict);
bool ret;
if (!rt)
return false;
ret = rt->rt6i_idev->dev == net_dev;
ip6_rt_put(rt);
return ret;
#else
return false;
#endif
}
static bool validate_net_dev(struct net_device *net_dev,
const struct sockaddr *daddr,
const struct sockaddr *saddr)
{
const struct sockaddr_in *daddr4 = (const struct sockaddr_in *)daddr;
const struct sockaddr_in *saddr4 = (const struct sockaddr_in *)saddr;
const struct sockaddr_in6 *daddr6 = (const struct sockaddr_in6 *)daddr;
const struct sockaddr_in6 *saddr6 = (const struct sockaddr_in6 *)saddr;
switch (daddr->sa_family) {
case AF_INET:
return saddr->sa_family == AF_INET &&
validate_ipv4_net_dev(net_dev, daddr4, saddr4);
case AF_INET6:
return saddr->sa_family == AF_INET6 &&
validate_ipv6_net_dev(net_dev, daddr6, saddr6);
default:
return false;
}
}
static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
const struct cma_req_info *req)
{
struct sockaddr_storage listen_addr_storage, src_addr_storage;
struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage,
*src_addr = (struct sockaddr *)&src_addr_storage;
struct net_device *net_dev;
const union ib_gid *gid = req->has_gid ? &req->local_gid : NULL;
int err;
err = cma_save_ip_info(listen_addr, src_addr, ib_event,
req->service_id);
if (err)
return ERR_PTR(err);
net_dev = ib_get_net_dev_by_params(req->device, req->port, req->pkey,
gid, listen_addr);
if (!net_dev)
return ERR_PTR(-ENODEV);
if (!validate_net_dev(net_dev, listen_addr, src_addr)) {
dev_put(net_dev);
return ERR_PTR(-EHOSTUNREACH);
}
return net_dev;
}
static enum rdma_port_space rdma_ps_from_service_id(__be64 service_id)
{
return (be64_to_cpu(service_id) >> 16) & 0xffff;
}
static bool cma_match_private_data(struct rdma_id_private *id_priv,
const struct cma_hdr *hdr)
{
struct sockaddr *addr = cma_src_addr(id_priv);
__be32 ip4_addr;
struct in6_addr ip6_addr;
if (cma_any_addr(addr) && !id_priv->afonly)
return true;
switch (addr->sa_family) {
case AF_INET:
ip4_addr = ((struct sockaddr_in *)addr)->sin_addr.s_addr;
if (cma_get_ip_ver(hdr) != 4)
return false;
if (!cma_any_addr(addr) &&
hdr->dst_addr.ip4.addr != ip4_addr)
return false;
break;
case AF_INET6:
ip6_addr = ((struct sockaddr_in6 *)addr)->sin6_addr;
if (cma_get_ip_ver(hdr) != 6)
return false;
if (!cma_any_addr(addr) &&
memcmp(&hdr->dst_addr.ip6, &ip6_addr, sizeof(ip6_addr)))
return false;
break;
case AF_IB:
return true;
default:
return false;
}
return true;
}
static bool cma_match_net_dev(const struct rdma_id_private *id_priv,
const struct net_device *net_dev)
{
const struct rdma_addr *addr = &id_priv->id.route.addr;
if (!net_dev)
/* This request is an AF_IB request */
return addr->src_addr.ss_family == AF_IB;
return !addr->dev_addr.bound_dev_if ||
(net_eq(dev_net(net_dev), &init_net) &&
addr->dev_addr.bound_dev_if == net_dev->ifindex);
}
static struct rdma_id_private *cma_find_listener(
const struct rdma_bind_list *bind_list,
const struct ib_cm_id *cm_id,
const struct ib_cm_event *ib_event,
const struct cma_req_info *req,
const struct net_device *net_dev)
{
struct rdma_id_private *id_priv, *id_priv_dev;
if (!bind_list)
return ERR_PTR(-EINVAL);
hlist_for_each_entry(id_priv, &bind_list->owners, node) {
if (cma_match_private_data(id_priv, ib_event->private_data)) {
if (id_priv->id.device == cm_id->device &&
cma_match_net_dev(id_priv, net_dev))
return id_priv;
list_for_each_entry(id_priv_dev,
&id_priv->listen_list,
listen_list) {
if (id_priv_dev->id.device == cm_id->device &&
cma_match_net_dev(id_priv_dev, net_dev))
return id_priv_dev;
}
}
}
return ERR_PTR(-EINVAL);
}
static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
struct ib_cm_event *ib_event,
struct net_device **net_dev)
{
struct cma_req_info req;
struct rdma_bind_list *bind_list;
struct rdma_id_private *id_priv;
int err;
err = cma_save_req_info(ib_event, &req);
if (err)
return ERR_PTR(err);
*net_dev = cma_get_net_dev(ib_event, &req);
if (IS_ERR(*net_dev)) {
if (PTR_ERR(*net_dev) == -EAFNOSUPPORT) {
/* Assuming the protocol is AF_IB */
*net_dev = NULL;
} else {
return ERR_CAST(*net_dev);
}
}
bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
cma_port_from_service_id(req.service_id));
id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev);
if (IS_ERR(id_priv)) {
dev_put(*net_dev);
*net_dev = NULL;
}
return id_priv;
}
static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
{
return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cma_hdr);
@ -1038,7 +1377,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
mutex_lock(&lock);
hlist_del(&id_priv->node);
if (hlist_empty(&bind_list->owners)) {
idr_remove(bind_list->ps, bind_list->port);
cma_ps_remove(bind_list->ps, bind_list->port);
kfree(bind_list);
}
mutex_unlock(&lock);
@ -1216,11 +1555,15 @@ static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
}
static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
struct ib_cm_event *ib_event)
struct ib_cm_event *ib_event,
struct net_device *net_dev)
{
struct rdma_id_private *id_priv;
struct rdma_cm_id *id;
struct rdma_route *rt;
const sa_family_t ss_family = listen_id->route.addr.src_addr.ss_family;
const __be64 service_id =
ib_event->param.req_rcvd.primary_path->service_id;
int ret;
id = rdma_create_id(listen_id->event_handler, listen_id->context,
@ -1229,7 +1572,9 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
return NULL;
id_priv = container_of(id, struct rdma_id_private, id);
if (cma_save_net_info(id, listen_id, ib_event))
if (cma_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
(struct sockaddr *)&id->route.addr.dst_addr,
listen_id, ib_event, ss_family, service_id))
goto err;
rt = &id->route;
@ -1243,14 +1588,16 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
if (rt->num_paths == 2)
rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path;
if (cma_any_addr(cma_src_addr(id_priv))) {
rt->addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
rdma_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid);
ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey));
} else {
ret = cma_translate_addr(cma_src_addr(id_priv), &rt->addr.dev_addr);
if (net_dev) {
ret = rdma_copy_addr(&rt->addr.dev_addr, net_dev, NULL);
if (ret)
goto err;
} else {
/* An AF_IB connection */
WARN_ON_ONCE(ss_family != AF_IB);
cma_translate_ib((struct sockaddr_ib *)cma_src_addr(id_priv),
&rt->addr.dev_addr);
}
rdma_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid);
@ -1263,10 +1610,12 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
}
static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
struct ib_cm_event *ib_event)
struct ib_cm_event *ib_event,
struct net_device *net_dev)
{
struct rdma_id_private *id_priv;
struct rdma_cm_id *id;
const sa_family_t ss_family = listen_id->route.addr.src_addr.ss_family;
int ret;
id = rdma_create_id(listen_id->event_handler, listen_id->context,
@ -1275,13 +1624,24 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
return NULL;
id_priv = container_of(id, struct rdma_id_private, id);
if (cma_save_net_info(id, listen_id, ib_event))
if (cma_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
(struct sockaddr *)&id->route.addr.dst_addr,
listen_id, ib_event, ss_family,
ib_event->param.sidr_req_rcvd.service_id))
goto err;
if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
ret = cma_translate_addr(cma_src_addr(id_priv), &id->route.addr.dev_addr);
if (net_dev) {
ret = rdma_copy_addr(&id->route.addr.dev_addr, net_dev, NULL);
if (ret)
goto err;
} else {
/* An AF_IB connection */
WARN_ON_ONCE(ss_family != AF_IB);
if (!cma_any_addr(cma_src_addr(id_priv)))
cma_translate_ib((struct sockaddr_ib *)
cma_src_addr(id_priv),
&id->route.addr.dev_addr);
}
id_priv->state = RDMA_CM_CONNECT;
@ -1319,25 +1679,33 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
{
struct rdma_id_private *listen_id, *conn_id;
struct rdma_cm_event event;
struct net_device *net_dev;
int offset, ret;
listen_id = cm_id->context;
if (!cma_check_req_qp_type(&listen_id->id, ib_event))
return -EINVAL;
listen_id = cma_id_from_event(cm_id, ib_event, &net_dev);
if (IS_ERR(listen_id))
return PTR_ERR(listen_id);
if (cma_disable_callback(listen_id, RDMA_CM_LISTEN))
return -ECONNABORTED;
if (!cma_check_req_qp_type(&listen_id->id, ib_event)) {
ret = -EINVAL;
goto net_dev_put;
}
if (cma_disable_callback(listen_id, RDMA_CM_LISTEN)) {
ret = -ECONNABORTED;
goto net_dev_put;
}
memset(&event, 0, sizeof event);
offset = cma_user_data_offset(listen_id);
event.event = RDMA_CM_EVENT_CONNECT_REQUEST;
if (ib_event->event == IB_CM_SIDR_REQ_RECEIVED) {
conn_id = cma_new_udp_id(&listen_id->id, ib_event);
conn_id = cma_new_udp_id(&listen_id->id, ib_event, net_dev);
event.param.ud.private_data = ib_event->private_data + offset;
event.param.ud.private_data_len =
IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE - offset;
} else {
conn_id = cma_new_conn_id(&listen_id->id, ib_event);
conn_id = cma_new_conn_id(&listen_id->id, ib_event, net_dev);
cma_set_req_event_data(&event, &ib_event->param.req_rcvd,
ib_event->private_data, offset);
}
@ -1375,6 +1743,8 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
mutex_unlock(&conn_id->handler_mutex);
mutex_unlock(&listen_id->handler_mutex);
cma_deref_id(conn_id);
if (net_dev)
dev_put(net_dev);
return 0;
err3:
@ -1388,6 +1758,11 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
mutex_unlock(&listen_id->handler_mutex);
if (conn_id)
rdma_destroy_id(&conn_id->id);
net_dev_put:
if (net_dev)
dev_put(net_dev);
return ret;
}
@ -1400,42 +1775,6 @@ __be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
}
EXPORT_SYMBOL(rdma_get_service_id);
static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
struct ib_cm_compare_data *compare)
{
struct cma_hdr *cma_data, *cma_mask;
__be32 ip4_addr;
struct in6_addr ip6_addr;
memset(compare, 0, sizeof *compare);
cma_data = (void *) compare->data;
cma_mask = (void *) compare->mask;
switch (addr->sa_family) {
case AF_INET:
ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
cma_set_ip_ver(cma_data, 4);
cma_set_ip_ver(cma_mask, 0xF);
if (!cma_any_addr(addr)) {
cma_data->dst_addr.ip4.addr = ip4_addr;
cma_mask->dst_addr.ip4.addr = htonl(~0);
}
break;
case AF_INET6:
ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr;
cma_set_ip_ver(cma_data, 6);
cma_set_ip_ver(cma_mask, 0xF);
if (!cma_any_addr(addr)) {
cma_data->dst_addr.ip6 = ip6_addr;
memset(&cma_mask->dst_addr.ip6, 0xFF,
sizeof cma_mask->dst_addr.ip6);
}
break;
default:
break;
}
}
static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
{
struct rdma_id_private *id_priv = iw_id->context;
@ -1589,33 +1928,18 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
static int cma_ib_listen(struct rdma_id_private *id_priv)
{
struct ib_cm_compare_data compare_data;
struct sockaddr *addr;
struct ib_cm_id *id;
__be64 svc_id;
int ret;
id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv);
if (IS_ERR(id))
return PTR_ERR(id);
id_priv->cm_id.ib = id;
addr = cma_src_addr(id_priv);
svc_id = rdma_get_service_id(&id_priv->id, addr);
if (cma_any_addr(addr) && !id_priv->afonly)
ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL);
else {
cma_set_compare_data(id_priv->id.ps, addr, &compare_data);
ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, &compare_data);
}
id = ib_cm_insert_listen(id_priv->id.device, cma_req_handler, svc_id);
if (IS_ERR(id))
return PTR_ERR(id);
id_priv->cm_id.ib = id;
if (ret) {
ib_destroy_cm_id(id_priv->cm_id.ib);
id_priv->cm_id.ib = NULL;
}
return ret;
return 0;
}
static int cma_iw_listen(struct rdma_id_private *id_priv, int backlog)
@ -2203,8 +2527,11 @@ static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
src_addr = (struct sockaddr *) &id->route.addr.src_addr;
src_addr->sa_family = dst_addr->sa_family;
if (dst_addr->sa_family == AF_INET6) {
((struct sockaddr_in6 *) src_addr)->sin6_scope_id =
((struct sockaddr_in6 *) dst_addr)->sin6_scope_id;
struct sockaddr_in6 *src_addr6 = (struct sockaddr_in6 *) src_addr;
struct sockaddr_in6 *dst_addr6 = (struct sockaddr_in6 *) dst_addr;
src_addr6->sin6_scope_id = dst_addr6->sin6_scope_id;
if (ipv6_addr_type(&dst_addr6->sin6_addr) & IPV6_ADDR_LINKLOCAL)
id->route.addr.dev_addr.bound_dev_if = dst_addr6->sin6_scope_id;
} else if (dst_addr->sa_family == AF_IB) {
((struct sockaddr_ib *) src_addr)->sib_pkey =
((struct sockaddr_ib *) dst_addr)->sib_pkey;
@ -2325,8 +2652,8 @@ static void cma_bind_port(struct rdma_bind_list *bind_list,
hlist_add_head(&id_priv->node, &bind_list->owners);
}
static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
unsigned short snum)
static int cma_alloc_port(enum rdma_port_space ps,
struct rdma_id_private *id_priv, unsigned short snum)
{
struct rdma_bind_list *bind_list;
int ret;
@ -2335,7 +2662,7 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
if (!bind_list)
return -ENOMEM;
ret = idr_alloc(ps, bind_list, snum, snum + 1, GFP_KERNEL);
ret = cma_ps_alloc(ps, bind_list, snum);
if (ret < 0)
goto err;
@ -2348,7 +2675,8 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
return ret == -ENOSPC ? -EADDRNOTAVAIL : ret;
}
static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
static int cma_alloc_any_port(enum rdma_port_space ps,
struct rdma_id_private *id_priv)
{
static unsigned int last_used_port;
int low, high, remaining;
@ -2359,7 +2687,7 @@ static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
rover = prandom_u32() % remaining + low;
retry:
if (last_used_port != rover &&
!idr_find(ps, (unsigned short) rover)) {
!cma_ps_find(ps, (unsigned short)rover)) {
int ret = cma_alloc_port(ps, id_priv, rover);
/*
* Remember previously used port number in order to avoid
@ -2414,7 +2742,8 @@ static int cma_check_port(struct rdma_bind_list *bind_list,
return 0;
}
static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
static int cma_use_port(enum rdma_port_space ps,
struct rdma_id_private *id_priv)
{
struct rdma_bind_list *bind_list;
unsigned short snum;
@ -2424,7 +2753,7 @@ static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
return -EACCES;
bind_list = idr_find(ps, snum);
bind_list = cma_ps_find(ps, snum);
if (!bind_list) {
ret = cma_alloc_port(ps, id_priv, snum);
} else {
@ -2447,25 +2776,24 @@ static int cma_bind_listen(struct rdma_id_private *id_priv)
return ret;
}
static struct idr *cma_select_inet_ps(struct rdma_id_private *id_priv)
static enum rdma_port_space cma_select_inet_ps(
struct rdma_id_private *id_priv)
{
switch (id_priv->id.ps) {
case RDMA_PS_TCP:
return &tcp_ps;
case RDMA_PS_UDP:
return &udp_ps;
case RDMA_PS_IPOIB:
return &ipoib_ps;
case RDMA_PS_IB:
return &ib_ps;
return id_priv->id.ps;
default:
return NULL;
return 0;
}
}
static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
static enum rdma_port_space cma_select_ib_ps(struct rdma_id_private *id_priv)
{
struct idr *ps = NULL;
enum rdma_port_space ps = 0;
struct sockaddr_ib *sib;
u64 sid_ps, mask, sid;
@ -2475,15 +2803,15 @@ static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
if ((id_priv->id.ps == RDMA_PS_IB) && (sid == (RDMA_IB_IP_PS_IB & mask))) {
sid_ps = RDMA_IB_IP_PS_IB;
ps = &ib_ps;
ps = RDMA_PS_IB;
} else if (((id_priv->id.ps == RDMA_PS_IB) || (id_priv->id.ps == RDMA_PS_TCP)) &&
(sid == (RDMA_IB_IP_PS_TCP & mask))) {
sid_ps = RDMA_IB_IP_PS_TCP;
ps = &tcp_ps;
ps = RDMA_PS_TCP;
} else if (((id_priv->id.ps == RDMA_PS_IB) || (id_priv->id.ps == RDMA_PS_UDP)) &&
(sid == (RDMA_IB_IP_PS_UDP & mask))) {
sid_ps = RDMA_IB_IP_PS_UDP;
ps = &udp_ps;
ps = RDMA_PS_UDP;
}
if (ps) {
@ -2496,7 +2824,7 @@ static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
static int cma_get_port(struct rdma_id_private *id_priv)
{
struct idr *ps;
enum rdma_port_space ps;
int ret;
if (cma_family(id_priv) != AF_IB)
@ -3551,11 +3879,10 @@ static void cma_process_remove(struct cma_device *cma_dev)
wait_for_completion(&cma_dev->comp);
}
static void cma_remove_one(struct ib_device *device)
static void cma_remove_one(struct ib_device *device, void *client_data)
{
struct cma_device *cma_dev;
struct cma_device *cma_dev = client_data;
cma_dev = ib_get_client_data(device, &cma_client);
if (!cma_dev)
return;

View File

@ -43,12 +43,58 @@ int ib_device_register_sysfs(struct ib_device *device,
u8, struct kobject *));
void ib_device_unregister_sysfs(struct ib_device *device);
int ib_sysfs_setup(void);
void ib_sysfs_cleanup(void);
int ib_cache_setup(void);
void ib_cache_setup(void);
void ib_cache_cleanup(void);
int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
struct net_device *idev, void *cookie);
typedef int (*roce_netdev_filter)(struct ib_device *device, u8 port,
struct net_device *idev, void *cookie);
void ib_enum_roce_netdev(struct ib_device *ib_dev,
roce_netdev_filter filter,
void *filter_cookie,
roce_netdev_callback cb,
void *cookie);
void ib_enum_all_roce_netdevs(roce_netdev_filter filter,
void *filter_cookie,
roce_netdev_callback cb,
void *cookie);
int ib_cache_gid_find_by_port(struct ib_device *ib_dev,
const union ib_gid *gid,
u8 port, struct net_device *ndev,
u16 *index);
enum ib_cache_gid_default_mode {
IB_CACHE_GID_DEFAULT_MODE_SET,
IB_CACHE_GID_DEFAULT_MODE_DELETE
};
void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
struct net_device *ndev,
enum ib_cache_gid_default_mode mode);
int ib_cache_gid_add(struct ib_device *ib_dev, u8 port,
union ib_gid *gid, struct ib_gid_attr *attr);
int ib_cache_gid_del(struct ib_device *ib_dev, u8 port,
union ib_gid *gid, struct ib_gid_attr *attr);
int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
struct net_device *ndev);
int roce_gid_mgmt_init(void);
void roce_gid_mgmt_cleanup(void);
int roce_rescan_device(struct ib_device *ib_dev);
int ib_cache_setup_one(struct ib_device *device);
void ib_cache_cleanup_one(struct ib_device *device);
void ib_cache_release_one(struct ib_device *device);
#endif /* _CORE_PRIV_H */

View File

@ -38,7 +38,10 @@
#include <linux/slab.h>
#include <linux/init.h>
#include <linux/mutex.h>
#include <linux/netdevice.h>
#include <rdma/rdma_netlink.h>
#include <rdma/ib_addr.h>
#include <rdma/ib_cache.h>
#include "core_priv.h"
@ -50,22 +53,34 @@ struct ib_client_data {
struct list_head list;
struct ib_client *client;
void * data;
/* The device or client is going down. Do not call client or device
* callbacks other than remove(). */
bool going_down;
};
struct workqueue_struct *ib_wq;
EXPORT_SYMBOL_GPL(ib_wq);
/* The device_list and client_list contain devices and clients after their
* registration has completed, and the devices and clients are removed
* during unregistration. */
static LIST_HEAD(device_list);
static LIST_HEAD(client_list);
/*
* device_mutex protects access to both device_list and client_list.
* There's no real point to using multiple locks or something fancier
* like an rwsem: we always access both lists, and we're always
* modifying one list or the other list. In any case this is not a
* hot path so there's no point in trying to optimize.
* device_mutex and lists_rwsem protect access to both device_list and
* client_list. device_mutex protects writer access by device and client
* registration / de-registration. lists_rwsem protects reader access to
* these lists. Iterators of these lists must lock it for read, while updates
* to the lists must be done with a write lock. A special case is when the
* device_mutex is locked. In this case locking the lists for read access is
* not necessary as the device_mutex implies it.
*
* lists_rwsem also protects access to the client data list.
*/
static DEFINE_MUTEX(device_mutex);
static DECLARE_RWSEM(lists_rwsem);
static int ib_device_check_mandatory(struct ib_device *device)
{
@ -152,6 +167,36 @@ static int alloc_name(char *name)
return 0;
}
static void ib_device_release(struct device *device)
{
struct ib_device *dev = container_of(device, struct ib_device, dev);
ib_cache_release_one(dev);
kfree(dev->port_immutable);
kfree(dev);
}
static int ib_device_uevent(struct device *device,
struct kobj_uevent_env *env)
{
struct ib_device *dev = container_of(device, struct ib_device, dev);
if (add_uevent_var(env, "NAME=%s", dev->name))
return -ENOMEM;
/*
* It would be nice to pass the node GUID with the event...
*/
return 0;
}
static struct class ib_class = {
.name = "infiniband",
.dev_release = ib_device_release,
.dev_uevent = ib_device_uevent,
};
/**
* ib_alloc_device - allocate an IB device struct
* @size:size of structure to allocate
@ -164,9 +209,27 @@ static int alloc_name(char *name)
*/
struct ib_device *ib_alloc_device(size_t size)
{
BUG_ON(size < sizeof (struct ib_device));
struct ib_device *device;
return kzalloc(size, GFP_KERNEL);
if (WARN_ON(size < sizeof(struct ib_device)))
return NULL;
device = kzalloc(size, GFP_KERNEL);
if (!device)
return NULL;
device->dev.class = &ib_class;
device_initialize(&device->dev);
dev_set_drvdata(&device->dev, device);
INIT_LIST_HEAD(&device->event_handler_list);
spin_lock_init(&device->event_handler_lock);
spin_lock_init(&device->client_data_lock);
INIT_LIST_HEAD(&device->client_data_list);
INIT_LIST_HEAD(&device->port_list);
return device;
}
EXPORT_SYMBOL(ib_alloc_device);
@ -178,13 +241,8 @@ EXPORT_SYMBOL(ib_alloc_device);
*/
void ib_dealloc_device(struct ib_device *device)
{
if (device->reg_state == IB_DEV_UNINITIALIZED) {
kfree(device);
return;
}
BUG_ON(device->reg_state != IB_DEV_UNREGISTERED);
WARN_ON(device->reg_state != IB_DEV_UNREGISTERED &&
device->reg_state != IB_DEV_UNINITIALIZED);
kobject_put(&device->dev.kobj);
}
EXPORT_SYMBOL(ib_dealloc_device);
@ -203,10 +261,13 @@ static int add_client_context(struct ib_device *device, struct ib_client *client
context->client = client;
context->data = NULL;
context->going_down = false;
down_write(&lists_rwsem);
spin_lock_irqsave(&device->client_data_lock, flags);
list_add(&context->list, &device->client_data_list);
spin_unlock_irqrestore(&device->client_data_lock, flags);
up_write(&lists_rwsem);
return 0;
}
@ -219,7 +280,7 @@ static int verify_immutable(const struct ib_device *dev, u8 port)
static int read_port_immutable(struct ib_device *device)
{
int ret = -ENOMEM;
int ret;
u8 start_port = rdma_start_port(device);
u8 end_port = rdma_end_port(device);
u8 port;
@ -235,26 +296,18 @@ static int read_port_immutable(struct ib_device *device)
* (end_port + 1),
GFP_KERNEL);
if (!device->port_immutable)
goto err;
return -ENOMEM;
for (port = start_port; port <= end_port; ++port) {
ret = device->get_port_immutable(device, port,
&device->port_immutable[port]);
if (ret)
goto err;
return ret;
if (verify_immutable(device, port)) {
ret = -EINVAL;
goto err;
}
if (verify_immutable(device, port))
return -EINVAL;
}
ret = 0;
goto out;
err:
kfree(device->port_immutable);
out:
return ret;
return 0;
}
/**
@ -271,6 +324,7 @@ int ib_register_device(struct ib_device *device,
u8, struct kobject *))
{
int ret;
struct ib_client *client;
mutex_lock(&device_mutex);
@ -285,11 +339,6 @@ int ib_register_device(struct ib_device *device,
goto out;
}
INIT_LIST_HEAD(&device->event_handler_list);
INIT_LIST_HEAD(&device->client_data_list);
spin_lock_init(&device->event_handler_lock);
spin_lock_init(&device->client_data_lock);
ret = read_port_immutable(device);
if (ret) {
printk(KERN_WARNING "Couldn't create per port immutable data %s\n",
@ -297,27 +346,30 @@ int ib_register_device(struct ib_device *device,
goto out;
}
ret = ib_cache_setup_one(device);
if (ret) {
printk(KERN_WARNING "Couldn't set up InfiniBand P_Key/GID cache\n");
goto out;
}
ret = ib_device_register_sysfs(device, port_callback);
if (ret) {
printk(KERN_WARNING "Couldn't register device %s with driver model\n",
device->name);
kfree(device->port_immutable);
ib_cache_cleanup_one(device);
goto out;
}
list_add_tail(&device->core_list, &device_list);
device->reg_state = IB_DEV_REGISTERED;
{
struct ib_client *client;
list_for_each_entry(client, &client_list, list)
if (client->add && !add_client_context(device, client))
client->add(device);
list_for_each_entry(client, &client_list, list)
if (client->add && !add_client_context(device, client))
client->add(device);
}
out:
down_write(&lists_rwsem);
list_add_tail(&device->core_list, &device_list);
up_write(&lists_rwsem);
out:
mutex_unlock(&device_mutex);
return ret;
}
@ -331,26 +383,37 @@ EXPORT_SYMBOL(ib_register_device);
*/
void ib_unregister_device(struct ib_device *device)
{
struct ib_client *client;
struct ib_client_data *context, *tmp;
unsigned long flags;
mutex_lock(&device_mutex);
list_for_each_entry_reverse(client, &client_list, list)
if (client->remove)
client->remove(device);
down_write(&lists_rwsem);
list_del(&device->core_list);
spin_lock_irqsave(&device->client_data_lock, flags);
list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
context->going_down = true;
spin_unlock_irqrestore(&device->client_data_lock, flags);
downgrade_write(&lists_rwsem);
list_for_each_entry_safe(context, tmp, &device->client_data_list,
list) {
if (context->client->remove)
context->client->remove(device, context->data);
}
up_read(&lists_rwsem);
mutex_unlock(&device_mutex);
ib_device_unregister_sysfs(device);
ib_cache_cleanup_one(device);
down_write(&lists_rwsem);
spin_lock_irqsave(&device->client_data_lock, flags);
list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
kfree(context);
spin_unlock_irqrestore(&device->client_data_lock, flags);
up_write(&lists_rwsem);
device->reg_state = IB_DEV_UNREGISTERED;
}
@ -375,11 +438,14 @@ int ib_register_client(struct ib_client *client)
mutex_lock(&device_mutex);
list_add_tail(&client->list, &client_list);
list_for_each_entry(device, &device_list, core_list)
if (client->add && !add_client_context(device, client))
client->add(device);
down_write(&lists_rwsem);
list_add_tail(&client->list, &client_list);
up_write(&lists_rwsem);
mutex_unlock(&device_mutex);
return 0;
@ -402,19 +468,41 @@ void ib_unregister_client(struct ib_client *client)
mutex_lock(&device_mutex);
list_for_each_entry(device, &device_list, core_list) {
if (client->remove)
client->remove(device);
down_write(&lists_rwsem);
list_del(&client->list);
up_write(&lists_rwsem);
list_for_each_entry(device, &device_list, core_list) {
struct ib_client_data *found_context = NULL;
down_write(&lists_rwsem);
spin_lock_irqsave(&device->client_data_lock, flags);
list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
if (context->client == client) {
list_del(&context->list);
kfree(context);
context->going_down = true;
found_context = context;
break;
}
spin_unlock_irqrestore(&device->client_data_lock, flags);
up_write(&lists_rwsem);
if (client->remove)
client->remove(device, found_context ?
found_context->data : NULL);
if (!found_context) {
pr_warn("No client context found for %s/%s\n",
device->name, client->name);
continue;
}
down_write(&lists_rwsem);
spin_lock_irqsave(&device->client_data_lock, flags);
list_del(&found_context->list);
kfree(found_context);
spin_unlock_irqrestore(&device->client_data_lock, flags);
up_write(&lists_rwsem);
}
list_del(&client->list);
mutex_unlock(&device_mutex);
}
@ -590,10 +678,79 @@ EXPORT_SYMBOL(ib_query_port);
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid)
{
if (rdma_cap_roce_gid_table(device, port_num))
return ib_get_cached_gid(device, port_num, index, gid);
return device->query_gid(device, port_num, index, gid);
}
EXPORT_SYMBOL(ib_query_gid);
/**
* ib_enum_roce_netdev - enumerate all RoCE ports
* @ib_dev : IB device we want to query
* @filter: Should we call the callback?
* @filter_cookie: Cookie passed to filter
* @cb: Callback to call for each found RoCE ports
* @cookie: Cookie passed back to the callback
*
* Enumerates all of the physical RoCE ports of ib_dev
* which are related to netdevice and calls callback() on each
* device for which filter() function returns non zero.
*/
void ib_enum_roce_netdev(struct ib_device *ib_dev,
roce_netdev_filter filter,
void *filter_cookie,
roce_netdev_callback cb,
void *cookie)
{
u8 port;
for (port = rdma_start_port(ib_dev); port <= rdma_end_port(ib_dev);
port++)
if (rdma_protocol_roce(ib_dev, port)) {
struct net_device *idev = NULL;
if (ib_dev->get_netdev)
idev = ib_dev->get_netdev(ib_dev, port);
if (idev &&
idev->reg_state >= NETREG_UNREGISTERED) {
dev_put(idev);
idev = NULL;
}
if (filter(ib_dev, port, idev, filter_cookie))
cb(ib_dev, port, idev, cookie);
if (idev)
dev_put(idev);
}
}
/**
* ib_enum_all_roce_netdevs - enumerate all RoCE devices
* @filter: Should we call the callback?
* @filter_cookie: Cookie passed to filter
* @cb: Callback to call for each found RoCE ports
* @cookie: Cookie passed back to the callback
*
* Enumerates all RoCE devices' physical ports which are related
* to netdevices and calls callback() on each device for which
* filter() function returns non zero.
*/
void ib_enum_all_roce_netdevs(roce_netdev_filter filter,
void *filter_cookie,
roce_netdev_callback cb,
void *cookie)
{
struct ib_device *dev;
down_read(&lists_rwsem);
list_for_each_entry(dev, &device_list, core_list)
ib_enum_roce_netdev(dev, filter, filter_cookie, cb, cookie);
up_read(&lists_rwsem);
}
/**
* ib_query_pkey - Get P_Key table entry
* @device:Device to query
@ -673,6 +830,14 @@ int ib_find_gid(struct ib_device *device, union ib_gid *gid,
int ret, port, i;
for (port = rdma_start_port(device); port <= rdma_end_port(device); ++port) {
if (rdma_cap_roce_gid_table(device, port)) {
if (!ib_cache_gid_find_by_port(device, gid, port,
NULL, index)) {
*port_num = port;
return 0;
}
}
for (i = 0; i < device->port_immutable[port].gid_tbl_len; ++i) {
ret = ib_query_gid(device, port, i, &tmp_gid);
if (ret)
@ -729,6 +894,51 @@ int ib_find_pkey(struct ib_device *device,
}
EXPORT_SYMBOL(ib_find_pkey);
/**
* ib_get_net_dev_by_params() - Return the appropriate net_dev
* for a received CM request
* @dev: An RDMA device on which the request has been received.
* @port: Port number on the RDMA device.
* @pkey: The Pkey the request came on.
* @gid: A GID that the net_dev uses to communicate.
* @addr: Contains the IP address that the request specified as its
* destination.
*/
struct net_device *ib_get_net_dev_by_params(struct ib_device *dev,
u8 port,
u16 pkey,
const union ib_gid *gid,
const struct sockaddr *addr)
{
struct net_device *net_dev = NULL;
struct ib_client_data *context;
if (!rdma_protocol_ib(dev, port))
return NULL;
down_read(&lists_rwsem);
list_for_each_entry(context, &dev->client_data_list, list) {
struct ib_client *client = context->client;
if (context->going_down)
continue;
if (client->get_net_dev_by_params) {
net_dev = client->get_net_dev_by_params(dev, port, pkey,
gid, addr,
context->data);
if (net_dev)
break;
}
}
up_read(&lists_rwsem);
return net_dev;
}
EXPORT_SYMBOL(ib_get_net_dev_by_params);
static int __init ib_core_init(void)
{
int ret;
@ -737,7 +947,7 @@ static int __init ib_core_init(void)
if (!ib_wq)
return -ENOMEM;
ret = ib_sysfs_setup();
ret = class_register(&ib_class);
if (ret) {
printk(KERN_WARNING "Couldn't create InfiniBand device class\n");
goto err;
@ -749,19 +959,12 @@ static int __init ib_core_init(void)
goto err_sysfs;
}
ret = ib_cache_setup();
if (ret) {
printk(KERN_WARNING "Couldn't set up InfiniBand P_Key/GID cache\n");
goto err_nl;
}
ib_cache_setup();
return 0;
err_nl:
ibnl_cleanup();
err_sysfs:
ib_sysfs_cleanup();
class_unregister(&ib_class);
err:
destroy_workqueue(ib_wq);
@ -772,7 +975,7 @@ static void __exit ib_core_cleanup(void)
{
ib_cache_cleanup();
ibnl_cleanup();
ib_sysfs_cleanup();
class_unregister(&ib_class);
/* Make sure that any pending umem accounting work is done. */
destroy_workqueue(ib_wq);
}

View File

@ -338,13 +338,6 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
goto error1;
}
mad_agent_priv->agent.mr = ib_get_dma_mr(port_priv->qp_info[qpn].qp->pd,
IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(mad_agent_priv->agent.mr)) {
ret = ERR_PTR(-ENOMEM);
goto error2;
}
if (mad_reg_req) {
reg_req = kmemdup(mad_reg_req, sizeof *reg_req, GFP_KERNEL);
if (!reg_req) {
@ -429,8 +422,6 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
spin_unlock_irqrestore(&port_priv->reg_lock, flags);
kfree(reg_req);
error3:
ib_dereg_mr(mad_agent_priv->agent.mr);
error2:
kfree(mad_agent_priv);
error1:
return ret;
@ -590,7 +581,6 @@ static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv)
wait_for_completion(&mad_agent_priv->comp);
kfree(mad_agent_priv->reg_req);
ib_dereg_mr(mad_agent_priv->agent.mr);
kfree(mad_agent_priv);
}
@ -1038,7 +1028,7 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
mad_send_wr->mad_agent_priv = mad_agent_priv;
mad_send_wr->sg_list[0].length = hdr_len;
mad_send_wr->sg_list[0].lkey = mad_agent->mr->lkey;
mad_send_wr->sg_list[0].lkey = mad_agent->qp->pd->local_dma_lkey;
/* OPA MADs don't have to be the full 2048 bytes */
if (opa && base_version == OPA_MGMT_BASE_VERSION &&
@ -1047,7 +1037,7 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
else
mad_send_wr->sg_list[1].length = mad_size - hdr_len;
mad_send_wr->sg_list[1].lkey = mad_agent->mr->lkey;
mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey;
mad_send_wr->send_wr.wr_id = (unsigned long) mad_send_wr;
mad_send_wr->send_wr.sg_list = mad_send_wr->sg_list;
@ -2885,7 +2875,7 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
struct ib_mad_queue *recv_queue = &qp_info->recv_queue;
/* Initialize common scatter list fields */
sg_list.lkey = (*qp_info->port_priv->mr).lkey;
sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
/* Initialize common receive WR fields */
recv_wr.next = NULL;
@ -3201,13 +3191,6 @@ static int ib_mad_port_open(struct ib_device *device,
goto error4;
}
port_priv->mr = ib_get_dma_mr(port_priv->pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(port_priv->mr)) {
dev_err(&device->dev, "Couldn't get ib_mad DMA MR\n");
ret = PTR_ERR(port_priv->mr);
goto error5;
}
if (has_smi) {
ret = create_mad_qp(&port_priv->qp_info[0], IB_QPT_SMI);
if (ret)
@ -3248,8 +3231,6 @@ static int ib_mad_port_open(struct ib_device *device,
error7:
destroy_mad_qp(&port_priv->qp_info[0]);
error6:
ib_dereg_mr(port_priv->mr);
error5:
ib_dealloc_pd(port_priv->pd);
error4:
ib_destroy_cq(port_priv->cq);
@ -3284,7 +3265,6 @@ static int ib_mad_port_close(struct ib_device *device, int port_num)
destroy_workqueue(port_priv->wq);
destroy_mad_qp(&port_priv->qp_info[1]);
destroy_mad_qp(&port_priv->qp_info[0]);
ib_dereg_mr(port_priv->mr);
ib_dealloc_pd(port_priv->pd);
ib_destroy_cq(port_priv->cq);
cleanup_recv_queue(&port_priv->qp_info[1]);
@ -3335,7 +3315,7 @@ static void ib_mad_init_device(struct ib_device *device)
}
}
static void ib_mad_remove_device(struct ib_device *device)
static void ib_mad_remove_device(struct ib_device *device, void *client_data)
{
int i;

View File

@ -199,7 +199,6 @@ struct ib_mad_port_private {
int port_num;
struct ib_cq *cq;
struct ib_pd *pd;
struct ib_mr *mr;
spinlock_t reg_lock;
struct ib_mad_mgmt_version_table version[MAX_MGMT_VERSION];

View File

@ -43,7 +43,7 @@
#include "sa.h"
static void mcast_add_one(struct ib_device *device);
static void mcast_remove_one(struct ib_device *device);
static void mcast_remove_one(struct ib_device *device, void *client_data);
static struct ib_client mcast_client = {
.name = "ib_multicast",
@ -840,13 +840,12 @@ static void mcast_add_one(struct ib_device *device)
ib_register_event_handler(&dev->event_handler);
}
static void mcast_remove_one(struct ib_device *device)
static void mcast_remove_one(struct ib_device *device, void *client_data)
{
struct mcast_device *dev;
struct mcast_device *dev = client_data;
struct mcast_port *port;
int i;
dev = ib_get_client_data(device, &mcast_client);
if (!dev)
return;

View File

@ -49,6 +49,14 @@ static DEFINE_MUTEX(ibnl_mutex);
static struct sock *nls;
static LIST_HEAD(client_list);
int ibnl_chk_listeners(unsigned int group)
{
if (netlink_has_listeners(nls, group) == 0)
return -1;
return 0;
}
EXPORT_SYMBOL(ibnl_chk_listeners);
int ibnl_add_client(int index, int nops,
const struct ibnl_client_cbs cb_table[])
{
@ -151,6 +159,23 @@ static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
!client->cb_table[op].dump)
return -EINVAL;
/*
* For response or local service set_timeout request,
* there is no need to use netlink_dump_start.
*/
if (!(nlh->nlmsg_flags & NLM_F_REQUEST) ||
(index == RDMA_NL_LS &&
op == RDMA_NL_LS_OP_SET_TIMEOUT)) {
struct netlink_callback cb = {
.skb = skb,
.nlh = nlh,
.dump = client->cb_table[op].dump,
.module = client->cb_table[op].module,
};
return cb.dump(skb, &cb);
}
{
struct netlink_dump_control c = {
.dump = client->cb_table[op].dump,
@ -165,9 +190,39 @@ static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
return -EINVAL;
}
static void ibnl_rcv_reply_skb(struct sk_buff *skb)
{
struct nlmsghdr *nlh;
int msglen;
/*
* Process responses until there is no more message or the first
* request. Generally speaking, it is not recommended to mix responses
* with requests.
*/
while (skb->len >= nlmsg_total_size(0)) {
nlh = nlmsg_hdr(skb);
if (nlh->nlmsg_len < NLMSG_HDRLEN || skb->len < nlh->nlmsg_len)
return;
/* Handle response only */
if (nlh->nlmsg_flags & NLM_F_REQUEST)
return;
ibnl_rcv_msg(skb, nlh);
msglen = NLMSG_ALIGN(nlh->nlmsg_len);
if (msglen > skb->len)
msglen = skb->len;
skb_pull(skb, msglen);
}
}
static void ibnl_rcv(struct sk_buff *skb)
{
mutex_lock(&ibnl_mutex);
ibnl_rcv_reply_skb(skb);
netlink_rcv_skb(skb, &ibnl_rcv_msg);
mutex_unlock(&ibnl_mutex);
}

View File

@ -0,0 +1,728 @@
/*
* Copyright (c) 2015, Mellanox Technologies inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "core_priv.h"
#include <linux/in.h>
#include <linux/in6.h>
/* For in6_dev_get/in6_dev_put */
#include <net/addrconf.h>
#include <net/bonding.h>
#include <rdma/ib_cache.h>
#include <rdma/ib_addr.h>
enum gid_op_type {
GID_DEL = 0,
GID_ADD
};
struct update_gid_event_work {
struct work_struct work;
union ib_gid gid;
struct ib_gid_attr gid_attr;
enum gid_op_type gid_op;
};
#define ROCE_NETDEV_CALLBACK_SZ 3
struct netdev_event_work_cmd {
roce_netdev_callback cb;
roce_netdev_filter filter;
struct net_device *ndev;
struct net_device *filter_ndev;
};
struct netdev_event_work {
struct work_struct work;
struct netdev_event_work_cmd cmds[ROCE_NETDEV_CALLBACK_SZ];
};
static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev,
u8 port, union ib_gid *gid,
struct ib_gid_attr *gid_attr)
{
switch (gid_op) {
case GID_ADD:
ib_cache_gid_add(ib_dev, port, gid, gid_attr);
break;
case GID_DEL:
ib_cache_gid_del(ib_dev, port, gid, gid_attr);
break;
}
}
enum bonding_slave_state {
BONDING_SLAVE_STATE_ACTIVE = 1UL << 0,
BONDING_SLAVE_STATE_INACTIVE = 1UL << 1,
/* No primary slave or the device isn't a slave in bonding */
BONDING_SLAVE_STATE_NA = 1UL << 2,
};
static enum bonding_slave_state is_eth_active_slave_of_bonding_rcu(struct net_device *dev,
struct net_device *upper)
{
if (upper && netif_is_bond_master(upper)) {
struct net_device *pdev =
bond_option_active_slave_get_rcu(netdev_priv(upper));
if (pdev)
return dev == pdev ? BONDING_SLAVE_STATE_ACTIVE :
BONDING_SLAVE_STATE_INACTIVE;
}
return BONDING_SLAVE_STATE_NA;
}
static bool is_upper_dev_rcu(struct net_device *dev, struct net_device *upper)
{
struct net_device *_upper = NULL;
struct list_head *iter;
netdev_for_each_all_upper_dev_rcu(dev, _upper, iter)
if (_upper == upper)
break;
return _upper == upper;
}
#define REQUIRED_BOND_STATES (BONDING_SLAVE_STATE_ACTIVE | \
BONDING_SLAVE_STATE_NA)
static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev, void *cookie)
{
struct net_device *event_ndev = (struct net_device *)cookie;
struct net_device *real_dev;
int res;
if (!rdma_ndev)
return 0;
rcu_read_lock();
real_dev = rdma_vlan_dev_real_dev(event_ndev);
if (!real_dev)
real_dev = event_ndev;
res = ((is_upper_dev_rcu(rdma_ndev, event_ndev) &&
(is_eth_active_slave_of_bonding_rcu(rdma_ndev, real_dev) &
REQUIRED_BOND_STATES)) ||
real_dev == rdma_ndev);
rcu_read_unlock();
return res;
}
static int is_eth_port_inactive_slave(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev, void *cookie)
{
struct net_device *master_dev;
int res;
if (!rdma_ndev)
return 0;
rcu_read_lock();
master_dev = netdev_master_upper_dev_get_rcu(rdma_ndev);
res = is_eth_active_slave_of_bonding_rcu(rdma_ndev, master_dev) ==
BONDING_SLAVE_STATE_INACTIVE;
rcu_read_unlock();
return res;
}
static int pass_all_filter(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev, void *cookie)
{
return 1;
}
static int upper_device_filter(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev, void *cookie)
{
struct net_device *event_ndev = (struct net_device *)cookie;
int res;
if (!rdma_ndev)
return 0;
if (rdma_ndev == event_ndev)
return 1;
rcu_read_lock();
res = is_upper_dev_rcu(rdma_ndev, event_ndev);
rcu_read_unlock();
return res;
}
static void update_gid_ip(enum gid_op_type gid_op,
struct ib_device *ib_dev,
u8 port, struct net_device *ndev,
struct sockaddr *addr)
{
union ib_gid gid;
struct ib_gid_attr gid_attr;
rdma_ip2gid(addr, &gid);
memset(&gid_attr, 0, sizeof(gid_attr));
gid_attr.ndev = ndev;
update_gid(gid_op, ib_dev, port, &gid, &gid_attr);
}
static void enum_netdev_default_gids(struct ib_device *ib_dev,
u8 port, struct net_device *event_ndev,
struct net_device *rdma_ndev)
{
rcu_read_lock();
if (!rdma_ndev ||
((rdma_ndev != event_ndev &&
!is_upper_dev_rcu(rdma_ndev, event_ndev)) ||
is_eth_active_slave_of_bonding_rcu(rdma_ndev,
netdev_master_upper_dev_get_rcu(rdma_ndev)) ==
BONDING_SLAVE_STATE_INACTIVE)) {
rcu_read_unlock();
return;
}
rcu_read_unlock();
ib_cache_gid_set_default_gid(ib_dev, port, rdma_ndev,
IB_CACHE_GID_DEFAULT_MODE_SET);
}
static void bond_delete_netdev_default_gids(struct ib_device *ib_dev,
u8 port,
struct net_device *event_ndev,
struct net_device *rdma_ndev)
{
struct net_device *real_dev = rdma_vlan_dev_real_dev(event_ndev);
if (!rdma_ndev)
return;
if (!real_dev)
real_dev = event_ndev;
rcu_read_lock();
if (is_upper_dev_rcu(rdma_ndev, event_ndev) &&
is_eth_active_slave_of_bonding_rcu(rdma_ndev, real_dev) ==
BONDING_SLAVE_STATE_INACTIVE) {
rcu_read_unlock();
ib_cache_gid_set_default_gid(ib_dev, port, rdma_ndev,
IB_CACHE_GID_DEFAULT_MODE_DELETE);
} else {
rcu_read_unlock();
}
}
static void enum_netdev_ipv4_ips(struct ib_device *ib_dev,
u8 port, struct net_device *ndev)
{
struct in_device *in_dev;
if (ndev->reg_state >= NETREG_UNREGISTERING)
return;
in_dev = in_dev_get(ndev);
if (!in_dev)
return;
for_ifa(in_dev) {
struct sockaddr_in ip;
ip.sin_family = AF_INET;
ip.sin_addr.s_addr = ifa->ifa_address;
update_gid_ip(GID_ADD, ib_dev, port, ndev,
(struct sockaddr *)&ip);
}
endfor_ifa(in_dev);
in_dev_put(in_dev);
}
static void enum_netdev_ipv6_ips(struct ib_device *ib_dev,
u8 port, struct net_device *ndev)
{
struct inet6_ifaddr *ifp;
struct inet6_dev *in6_dev;
struct sin6_list {
struct list_head list;
struct sockaddr_in6 sin6;
};
struct sin6_list *sin6_iter;
struct sin6_list *sin6_temp;
struct ib_gid_attr gid_attr = {.ndev = ndev};
LIST_HEAD(sin6_list);
if (ndev->reg_state >= NETREG_UNREGISTERING)
return;
in6_dev = in6_dev_get(ndev);
if (!in6_dev)
return;
read_lock_bh(&in6_dev->lock);
list_for_each_entry(ifp, &in6_dev->addr_list, if_list) {
struct sin6_list *entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
if (!entry) {
pr_warn("roce_gid_mgmt: couldn't allocate entry for IPv6 update\n");
continue;
}
entry->sin6.sin6_family = AF_INET6;
entry->sin6.sin6_addr = ifp->addr;
list_add_tail(&entry->list, &sin6_list);
}
read_unlock_bh(&in6_dev->lock);
in6_dev_put(in6_dev);
list_for_each_entry_safe(sin6_iter, sin6_temp, &sin6_list, list) {
union ib_gid gid;
rdma_ip2gid((struct sockaddr *)&sin6_iter->sin6, &gid);
update_gid(GID_ADD, ib_dev, port, &gid, &gid_attr);
list_del(&sin6_iter->list);
kfree(sin6_iter);
}
}
static void _add_netdev_ips(struct ib_device *ib_dev, u8 port,
struct net_device *ndev)
{
enum_netdev_ipv4_ips(ib_dev, port, ndev);
if (IS_ENABLED(CONFIG_IPV6))
enum_netdev_ipv6_ips(ib_dev, port, ndev);
}
static void add_netdev_ips(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev, void *cookie)
{
struct net_device *event_ndev = (struct net_device *)cookie;
enum_netdev_default_gids(ib_dev, port, event_ndev, rdma_ndev);
_add_netdev_ips(ib_dev, port, event_ndev);
}
static void del_netdev_ips(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev, void *cookie)
{
struct net_device *event_ndev = (struct net_device *)cookie;
ib_cache_gid_del_all_netdev_gids(ib_dev, port, event_ndev);
}
static void enum_all_gids_of_dev_cb(struct ib_device *ib_dev,
u8 port,
struct net_device *rdma_ndev,
void *cookie)
{
struct net *net;
struct net_device *ndev;
/* Lock the rtnl to make sure the netdevs does not move under
* our feet
*/
rtnl_lock();
for_each_net(net)
for_each_netdev(net, ndev)
if (is_eth_port_of_netdev(ib_dev, port, rdma_ndev, ndev))
add_netdev_ips(ib_dev, port, rdma_ndev, ndev);
rtnl_unlock();
}
/* This function will rescan all of the network devices in the system
* and add their gids, as needed, to the relevant RoCE devices. */
int roce_rescan_device(struct ib_device *ib_dev)
{
ib_enum_roce_netdev(ib_dev, pass_all_filter, NULL,
enum_all_gids_of_dev_cb, NULL);
return 0;
}
static void callback_for_addr_gid_device_scan(struct ib_device *device,
u8 port,
struct net_device *rdma_ndev,
void *cookie)
{
struct update_gid_event_work *parsed = cookie;
return update_gid(parsed->gid_op, device,
port, &parsed->gid,
&parsed->gid_attr);
}
static void handle_netdev_upper(struct ib_device *ib_dev, u8 port,
void *cookie,
void (*handle_netdev)(struct ib_device *ib_dev,
u8 port,
struct net_device *ndev))
{
struct net_device *ndev = (struct net_device *)cookie;
struct upper_list {
struct list_head list;
struct net_device *upper;
};
struct net_device *upper;
struct list_head *iter;
struct upper_list *upper_iter;
struct upper_list *upper_temp;
LIST_HEAD(upper_list);
rcu_read_lock();
netdev_for_each_all_upper_dev_rcu(ndev, upper, iter) {
struct upper_list *entry = kmalloc(sizeof(*entry),
GFP_ATOMIC);
if (!entry) {
pr_info("roce_gid_mgmt: couldn't allocate entry to delete ndev\n");
continue;
}
list_add_tail(&entry->list, &upper_list);
dev_hold(upper);
entry->upper = upper;
}
rcu_read_unlock();
handle_netdev(ib_dev, port, ndev);
list_for_each_entry_safe(upper_iter, upper_temp, &upper_list,
list) {
handle_netdev(ib_dev, port, upper_iter->upper);
dev_put(upper_iter->upper);
list_del(&upper_iter->list);
kfree(upper_iter);
}
}
static void _roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
struct net_device *event_ndev)
{
ib_cache_gid_del_all_netdev_gids(ib_dev, port, event_ndev);
}
static void del_netdev_upper_ips(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev, void *cookie)
{
handle_netdev_upper(ib_dev, port, cookie, _roce_del_all_netdev_gids);
}
static void add_netdev_upper_ips(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev, void *cookie)
{
handle_netdev_upper(ib_dev, port, cookie, _add_netdev_ips);
}
static void del_netdev_default_ips_join(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev,
void *cookie)
{
struct net_device *master_ndev;
rcu_read_lock();
master_ndev = netdev_master_upper_dev_get_rcu(rdma_ndev);
if (master_ndev)
dev_hold(master_ndev);
rcu_read_unlock();
if (master_ndev) {
bond_delete_netdev_default_gids(ib_dev, port, master_ndev,
rdma_ndev);
dev_put(master_ndev);
}
}
static void del_netdev_default_ips(struct ib_device *ib_dev, u8 port,
struct net_device *rdma_ndev, void *cookie)
{
struct net_device *event_ndev = (struct net_device *)cookie;
bond_delete_netdev_default_gids(ib_dev, port, event_ndev, rdma_ndev);
}
/* The following functions operate on all IB devices. netdevice_event and
* addr_event execute ib_enum_all_roce_netdevs through a work.
* ib_enum_all_roce_netdevs iterates through all IB devices.
*/
static void netdevice_event_work_handler(struct work_struct *_work)
{
struct netdev_event_work *work =
container_of(_work, struct netdev_event_work, work);
unsigned int i;
for (i = 0; i < ARRAY_SIZE(work->cmds) && work->cmds[i].cb; i++) {
ib_enum_all_roce_netdevs(work->cmds[i].filter,
work->cmds[i].filter_ndev,
work->cmds[i].cb,
work->cmds[i].ndev);
dev_put(work->cmds[i].ndev);
dev_put(work->cmds[i].filter_ndev);
}
kfree(work);
}
static int netdevice_queue_work(struct netdev_event_work_cmd *cmds,
struct net_device *ndev)
{
unsigned int i;
struct netdev_event_work *ndev_work =
kmalloc(sizeof(*ndev_work), GFP_KERNEL);
if (!ndev_work) {
pr_warn("roce_gid_mgmt: can't allocate work for netdevice_event\n");
return NOTIFY_DONE;
}
memcpy(ndev_work->cmds, cmds, sizeof(ndev_work->cmds));
for (i = 0; i < ARRAY_SIZE(ndev_work->cmds) && ndev_work->cmds[i].cb; i++) {
if (!ndev_work->cmds[i].ndev)
ndev_work->cmds[i].ndev = ndev;
if (!ndev_work->cmds[i].filter_ndev)
ndev_work->cmds[i].filter_ndev = ndev;
dev_hold(ndev_work->cmds[i].ndev);
dev_hold(ndev_work->cmds[i].filter_ndev);
}
INIT_WORK(&ndev_work->work, netdevice_event_work_handler);
queue_work(ib_wq, &ndev_work->work);
return NOTIFY_DONE;
}
static const struct netdev_event_work_cmd add_cmd = {
.cb = add_netdev_ips, .filter = is_eth_port_of_netdev};
static const struct netdev_event_work_cmd add_cmd_upper_ips = {
.cb = add_netdev_upper_ips, .filter = is_eth_port_of_netdev};
static void netdevice_event_changeupper(struct netdev_notifier_changeupper_info *changeupper_info,
struct netdev_event_work_cmd *cmds)
{
static const struct netdev_event_work_cmd upper_ips_del_cmd = {
.cb = del_netdev_upper_ips, .filter = upper_device_filter};
static const struct netdev_event_work_cmd bonding_default_del_cmd = {
.cb = del_netdev_default_ips, .filter = is_eth_port_inactive_slave};
if (changeupper_info->linking == false) {
cmds[0] = upper_ips_del_cmd;
cmds[0].ndev = changeupper_info->upper_dev;
cmds[1] = add_cmd;
} else {
cmds[0] = bonding_default_del_cmd;
cmds[0].ndev = changeupper_info->upper_dev;
cmds[1] = add_cmd_upper_ips;
cmds[1].ndev = changeupper_info->upper_dev;
cmds[1].filter_ndev = changeupper_info->upper_dev;
}
}
static int netdevice_event(struct notifier_block *this, unsigned long event,
void *ptr)
{
static const struct netdev_event_work_cmd del_cmd = {
.cb = del_netdev_ips, .filter = pass_all_filter};
static const struct netdev_event_work_cmd bonding_default_del_cmd_join = {
.cb = del_netdev_default_ips_join, .filter = is_eth_port_inactive_slave};
static const struct netdev_event_work_cmd default_del_cmd = {
.cb = del_netdev_default_ips, .filter = pass_all_filter};
static const struct netdev_event_work_cmd bonding_event_ips_del_cmd = {
.cb = del_netdev_upper_ips, .filter = upper_device_filter};
struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
struct netdev_event_work_cmd cmds[ROCE_NETDEV_CALLBACK_SZ] = { {NULL} };
if (ndev->type != ARPHRD_ETHER)
return NOTIFY_DONE;
switch (event) {
case NETDEV_REGISTER:
case NETDEV_UP:
cmds[0] = bonding_default_del_cmd_join;
cmds[1] = add_cmd;
break;
case NETDEV_UNREGISTER:
if (ndev->reg_state < NETREG_UNREGISTERED)
cmds[0] = del_cmd;
else
return NOTIFY_DONE;
break;
case NETDEV_CHANGEADDR:
cmds[0] = default_del_cmd;
cmds[1] = add_cmd;
break;
case NETDEV_CHANGEUPPER:
netdevice_event_changeupper(
container_of(ptr, struct netdev_notifier_changeupper_info, info),
cmds);
break;
case NETDEV_BONDING_FAILOVER:
cmds[0] = bonding_event_ips_del_cmd;
cmds[1] = bonding_default_del_cmd_join;
cmds[2] = add_cmd_upper_ips;
break;
default:
return NOTIFY_DONE;
}
return netdevice_queue_work(cmds, ndev);
}
static void update_gid_event_work_handler(struct work_struct *_work)
{
struct update_gid_event_work *work =
container_of(_work, struct update_gid_event_work, work);
ib_enum_all_roce_netdevs(is_eth_port_of_netdev, work->gid_attr.ndev,
callback_for_addr_gid_device_scan, work);
dev_put(work->gid_attr.ndev);
kfree(work);
}
static int addr_event(struct notifier_block *this, unsigned long event,
struct sockaddr *sa, struct net_device *ndev)
{
struct update_gid_event_work *work;
enum gid_op_type gid_op;
if (ndev->type != ARPHRD_ETHER)
return NOTIFY_DONE;
switch (event) {
case NETDEV_UP:
gid_op = GID_ADD;
break;
case NETDEV_DOWN:
gid_op = GID_DEL;
break;
default:
return NOTIFY_DONE;
}
work = kmalloc(sizeof(*work), GFP_ATOMIC);
if (!work) {
pr_warn("roce_gid_mgmt: Couldn't allocate work for addr_event\n");
return NOTIFY_DONE;
}
INIT_WORK(&work->work, update_gid_event_work_handler);
rdma_ip2gid(sa, &work->gid);
work->gid_op = gid_op;
memset(&work->gid_attr, 0, sizeof(work->gid_attr));
dev_hold(ndev);
work->gid_attr.ndev = ndev;
queue_work(ib_wq, &work->work);
return NOTIFY_DONE;
}
static int inetaddr_event(struct notifier_block *this, unsigned long event,
void *ptr)
{
struct sockaddr_in in;
struct net_device *ndev;
struct in_ifaddr *ifa = ptr;
in.sin_family = AF_INET;
in.sin_addr.s_addr = ifa->ifa_address;
ndev = ifa->ifa_dev->dev;
return addr_event(this, event, (struct sockaddr *)&in, ndev);
}
static int inet6addr_event(struct notifier_block *this, unsigned long event,
void *ptr)
{
struct sockaddr_in6 in6;
struct net_device *ndev;
struct inet6_ifaddr *ifa6 = ptr;
in6.sin6_family = AF_INET6;
in6.sin6_addr = ifa6->addr;
ndev = ifa6->idev->dev;
return addr_event(this, event, (struct sockaddr *)&in6, ndev);
}
static struct notifier_block nb_netdevice = {
.notifier_call = netdevice_event
};
static struct notifier_block nb_inetaddr = {
.notifier_call = inetaddr_event
};
static struct notifier_block nb_inet6addr = {
.notifier_call = inet6addr_event
};
int __init roce_gid_mgmt_init(void)
{
register_inetaddr_notifier(&nb_inetaddr);
if (IS_ENABLED(CONFIG_IPV6))
register_inet6addr_notifier(&nb_inet6addr);
/* We relay on the netdevice notifier to enumerate all
* existing devices in the system. Register to this notifier
* last to make sure we will not miss any IP add/del
* callbacks.
*/
register_netdevice_notifier(&nb_netdevice);
return 0;
}
void __exit roce_gid_mgmt_cleanup(void)
{
if (IS_ENABLED(CONFIG_IPV6))
unregister_inet6addr_notifier(&nb_inet6addr);
unregister_inetaddr_notifier(&nb_inetaddr);
unregister_netdevice_notifier(&nb_netdevice);
/* Ensure all gid deletion tasks complete before we go down,
* to avoid any reference to free'd memory. By the time
* ib-core is removed, all physical devices have been removed,
* so no issue with remaining hardware contexts.
*/
}

View File

@ -45,12 +45,21 @@
#include <uapi/linux/if_ether.h>
#include <rdma/ib_pack.h>
#include <rdma/ib_cache.h>
#include <rdma/rdma_netlink.h>
#include <net/netlink.h>
#include <uapi/rdma/ib_user_sa.h>
#include <rdma/ib_marshall.h>
#include "sa.h"
MODULE_AUTHOR("Roland Dreier");
MODULE_DESCRIPTION("InfiniBand subnet administration query support");
MODULE_LICENSE("Dual BSD/GPL");
#define IB_SA_LOCAL_SVC_TIMEOUT_MIN 100
#define IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT 2000
#define IB_SA_LOCAL_SVC_TIMEOUT_MAX 200000
static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
struct ib_sa_sm_ah {
struct ib_ah *ah;
struct kref ref;
@ -80,8 +89,16 @@ struct ib_sa_query {
struct ib_mad_send_buf *mad_buf;
struct ib_sa_sm_ah *sm_ah;
int id;
u32 flags;
struct list_head list; /* Local svc request list */
u32 seq; /* Local svc request sequence number */
unsigned long timeout; /* Local svc timeout */
u8 path_use; /* How will the pathrecord be used */
};
#define IB_SA_ENABLE_LOCAL_SERVICE 0x00000001
#define IB_SA_CANCEL 0x00000002
struct ib_sa_service_query {
void (*callback)(int, struct ib_sa_service_rec *, void *);
void *context;
@ -106,8 +123,28 @@ struct ib_sa_mcmember_query {
struct ib_sa_query sa_query;
};
static LIST_HEAD(ib_nl_request_list);
static DEFINE_SPINLOCK(ib_nl_request_lock);
static atomic_t ib_nl_sa_request_seq;
static struct workqueue_struct *ib_nl_wq;
static struct delayed_work ib_nl_timed_work;
static const struct nla_policy ib_nl_policy[LS_NLA_TYPE_MAX] = {
[LS_NLA_TYPE_PATH_RECORD] = {.type = NLA_BINARY,
.len = sizeof(struct ib_path_rec_data)},
[LS_NLA_TYPE_TIMEOUT] = {.type = NLA_U32},
[LS_NLA_TYPE_SERVICE_ID] = {.type = NLA_U64},
[LS_NLA_TYPE_DGID] = {.type = NLA_BINARY,
.len = sizeof(struct rdma_nla_ls_gid)},
[LS_NLA_TYPE_SGID] = {.type = NLA_BINARY,
.len = sizeof(struct rdma_nla_ls_gid)},
[LS_NLA_TYPE_TCLASS] = {.type = NLA_U8},
[LS_NLA_TYPE_PKEY] = {.type = NLA_U16},
[LS_NLA_TYPE_QOS_CLASS] = {.type = NLA_U16},
};
static void ib_sa_add_one(struct ib_device *device);
static void ib_sa_remove_one(struct ib_device *device);
static void ib_sa_remove_one(struct ib_device *device, void *client_data);
static struct ib_client sa_client = {
.name = "sa",
@ -381,6 +418,427 @@ static const struct ib_field guidinfo_rec_table[] = {
.size_bits = 512 },
};
static inline void ib_sa_disable_local_svc(struct ib_sa_query *query)
{
query->flags &= ~IB_SA_ENABLE_LOCAL_SERVICE;
}
static inline int ib_sa_query_cancelled(struct ib_sa_query *query)
{
return (query->flags & IB_SA_CANCEL);
}
static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
struct ib_sa_query *query)
{
struct ib_sa_path_rec *sa_rec = query->mad_buf->context[1];
struct ib_sa_mad *mad = query->mad_buf->mad;
ib_sa_comp_mask comp_mask = mad->sa_hdr.comp_mask;
u16 val16;
u64 val64;
struct rdma_ls_resolve_header *header;
query->mad_buf->context[1] = NULL;
/* Construct the family header first */
header = (struct rdma_ls_resolve_header *)
skb_put(skb, NLMSG_ALIGN(sizeof(*header)));
memcpy(header->device_name, query->port->agent->device->name,
LS_DEVICE_NAME_MAX);
header->port_num = query->port->port_num;
if ((comp_mask & IB_SA_PATH_REC_REVERSIBLE) &&
sa_rec->reversible != 0)
query->path_use = LS_RESOLVE_PATH_USE_GMP;
else
query->path_use = LS_RESOLVE_PATH_USE_UNIDIRECTIONAL;
header->path_use = query->path_use;
/* Now build the attributes */
if (comp_mask & IB_SA_PATH_REC_SERVICE_ID) {
val64 = be64_to_cpu(sa_rec->service_id);
nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_SERVICE_ID,
sizeof(val64), &val64);
}
if (comp_mask & IB_SA_PATH_REC_DGID)
nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_DGID,
sizeof(sa_rec->dgid), &sa_rec->dgid);
if (comp_mask & IB_SA_PATH_REC_SGID)
nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_SGID,
sizeof(sa_rec->sgid), &sa_rec->sgid);
if (comp_mask & IB_SA_PATH_REC_TRAFFIC_CLASS)
nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_TCLASS,
sizeof(sa_rec->traffic_class), &sa_rec->traffic_class);
if (comp_mask & IB_SA_PATH_REC_PKEY) {
val16 = be16_to_cpu(sa_rec->pkey);
nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PKEY,
sizeof(val16), &val16);
}
if (comp_mask & IB_SA_PATH_REC_QOS_CLASS) {
val16 = be16_to_cpu(sa_rec->qos_class);
nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_QOS_CLASS,
sizeof(val16), &val16);
}
}
static int ib_nl_get_path_rec_attrs_len(ib_sa_comp_mask comp_mask)
{
int len = 0;
if (comp_mask & IB_SA_PATH_REC_SERVICE_ID)
len += nla_total_size(sizeof(u64));
if (comp_mask & IB_SA_PATH_REC_DGID)
len += nla_total_size(sizeof(struct rdma_nla_ls_gid));
if (comp_mask & IB_SA_PATH_REC_SGID)
len += nla_total_size(sizeof(struct rdma_nla_ls_gid));
if (comp_mask & IB_SA_PATH_REC_TRAFFIC_CLASS)
len += nla_total_size(sizeof(u8));
if (comp_mask & IB_SA_PATH_REC_PKEY)
len += nla_total_size(sizeof(u16));
if (comp_mask & IB_SA_PATH_REC_QOS_CLASS)
len += nla_total_size(sizeof(u16));
/*
* Make sure that at least some of the required comp_mask bits are
* set.
*/
if (WARN_ON(len == 0))
return len;
/* Add the family header */
len += NLMSG_ALIGN(sizeof(struct rdma_ls_resolve_header));
return len;
}
static int ib_nl_send_msg(struct ib_sa_query *query)
{
struct sk_buff *skb = NULL;
struct nlmsghdr *nlh;
void *data;
int ret = 0;
struct ib_sa_mad *mad;
int len;
mad = query->mad_buf->mad;
len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
if (len <= 0)
return -EMSGSIZE;
skb = nlmsg_new(len, GFP_KERNEL);
if (!skb)
return -ENOMEM;
/* Put nlmsg header only for now */
data = ibnl_put_msg(skb, &nlh, query->seq, 0, RDMA_NL_LS,
RDMA_NL_LS_OP_RESOLVE, NLM_F_REQUEST);
if (!data) {
kfree_skb(skb);
return -EMSGSIZE;
}
/* Add attributes */
ib_nl_set_path_rec_attrs(skb, query);
/* Repair the nlmsg header length */
nlmsg_end(skb, nlh);
ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, GFP_KERNEL);
if (!ret)
ret = len;
else
ret = 0;
return ret;
}
static int ib_nl_make_request(struct ib_sa_query *query)
{
unsigned long flags;
unsigned long delay;
int ret;
INIT_LIST_HEAD(&query->list);
query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
spin_lock_irqsave(&ib_nl_request_lock, flags);
ret = ib_nl_send_msg(query);
if (ret <= 0) {
ret = -EIO;
goto request_out;
} else {
ret = 0;
}
delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
query->timeout = delay + jiffies;
list_add_tail(&query->list, &ib_nl_request_list);
/* Start the timeout if this is the only request */
if (ib_nl_request_list.next == &query->list)
queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
request_out:
spin_unlock_irqrestore(&ib_nl_request_lock, flags);
return ret;
}
static int ib_nl_cancel_request(struct ib_sa_query *query)
{
unsigned long flags;
struct ib_sa_query *wait_query;
int found = 0;
spin_lock_irqsave(&ib_nl_request_lock, flags);
list_for_each_entry(wait_query, &ib_nl_request_list, list) {
/* Let the timeout to take care of the callback */
if (query == wait_query) {
query->flags |= IB_SA_CANCEL;
query->timeout = jiffies;
list_move(&query->list, &ib_nl_request_list);
found = 1;
mod_delayed_work(ib_nl_wq, &ib_nl_timed_work, 1);
break;
}
}
spin_unlock_irqrestore(&ib_nl_request_lock, flags);
return found;
}
static void send_handler(struct ib_mad_agent *agent,
struct ib_mad_send_wc *mad_send_wc);
static void ib_nl_process_good_resolve_rsp(struct ib_sa_query *query,
const struct nlmsghdr *nlh)
{
struct ib_mad_send_wc mad_send_wc;
struct ib_sa_mad *mad = NULL;
const struct nlattr *head, *curr;
struct ib_path_rec_data *rec;
int len, rem;
u32 mask = 0;
int status = -EIO;
if (query->callback) {
head = (const struct nlattr *) nlmsg_data(nlh);
len = nlmsg_len(nlh);
switch (query->path_use) {
case LS_RESOLVE_PATH_USE_UNIDIRECTIONAL:
mask = IB_PATH_PRIMARY | IB_PATH_OUTBOUND;
break;
case LS_RESOLVE_PATH_USE_ALL:
case LS_RESOLVE_PATH_USE_GMP:
default:
mask = IB_PATH_PRIMARY | IB_PATH_GMP |
IB_PATH_BIDIRECTIONAL;
break;
}
nla_for_each_attr(curr, head, len, rem) {
if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
rec = nla_data(curr);
/*
* Get the first one. In the future, we may
* need to get up to 6 pathrecords.
*/
if ((rec->flags & mask) == mask) {
mad = query->mad_buf->mad;
mad->mad_hdr.method |=
IB_MGMT_METHOD_RESP;
memcpy(mad->data, rec->path_rec,
sizeof(rec->path_rec));
status = 0;
break;
}
}
}
query->callback(query, status, mad);
}
mad_send_wc.send_buf = query->mad_buf;
mad_send_wc.status = IB_WC_SUCCESS;
send_handler(query->mad_buf->mad_agent, &mad_send_wc);
}
static void ib_nl_request_timeout(struct work_struct *work)
{
unsigned long flags;
struct ib_sa_query *query;
unsigned long delay;
struct ib_mad_send_wc mad_send_wc;
int ret;
spin_lock_irqsave(&ib_nl_request_lock, flags);
while (!list_empty(&ib_nl_request_list)) {
query = list_entry(ib_nl_request_list.next,
struct ib_sa_query, list);
if (time_after(query->timeout, jiffies)) {
delay = query->timeout - jiffies;
if ((long)delay <= 0)
delay = 1;
queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
break;
}
list_del(&query->list);
ib_sa_disable_local_svc(query);
/* Hold the lock to protect against query cancellation */
if (ib_sa_query_cancelled(query))
ret = -1;
else
ret = ib_post_send_mad(query->mad_buf, NULL);
if (ret) {
mad_send_wc.send_buf = query->mad_buf;
mad_send_wc.status = IB_WC_WR_FLUSH_ERR;
spin_unlock_irqrestore(&ib_nl_request_lock, flags);
send_handler(query->port->agent, &mad_send_wc);
spin_lock_irqsave(&ib_nl_request_lock, flags);
}
}
spin_unlock_irqrestore(&ib_nl_request_lock, flags);
}
static int ib_nl_handle_set_timeout(struct sk_buff *skb,
struct netlink_callback *cb)
{
const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
int timeout, delta, abs_delta;
const struct nlattr *attr;
unsigned long flags;
struct ib_sa_query *query;
long delay = 0;
struct nlattr *tb[LS_NLA_TYPE_MAX];
int ret;
if (!netlink_capable(skb, CAP_NET_ADMIN))
return -EPERM;
ret = nla_parse(tb, LS_NLA_TYPE_MAX - 1, nlmsg_data(nlh),
nlmsg_len(nlh), ib_nl_policy);
attr = (const struct nlattr *)tb[LS_NLA_TYPE_TIMEOUT];
if (ret || !attr)
goto settimeout_out;
timeout = *(int *) nla_data(attr);
if (timeout < IB_SA_LOCAL_SVC_TIMEOUT_MIN)
timeout = IB_SA_LOCAL_SVC_TIMEOUT_MIN;
if (timeout > IB_SA_LOCAL_SVC_TIMEOUT_MAX)
timeout = IB_SA_LOCAL_SVC_TIMEOUT_MAX;
delta = timeout - sa_local_svc_timeout_ms;
if (delta < 0)
abs_delta = -delta;
else
abs_delta = delta;
if (delta != 0) {
spin_lock_irqsave(&ib_nl_request_lock, flags);
sa_local_svc_timeout_ms = timeout;
list_for_each_entry(query, &ib_nl_request_list, list) {
if (delta < 0 && abs_delta > query->timeout)
query->timeout = 0;
else
query->timeout += delta;
/* Get the new delay from the first entry */
if (!delay) {
delay = query->timeout - jiffies;
if (delay <= 0)
delay = 1;
}
}
if (delay)
mod_delayed_work(ib_nl_wq, &ib_nl_timed_work,
(unsigned long)delay);
spin_unlock_irqrestore(&ib_nl_request_lock, flags);
}
settimeout_out:
return skb->len;
}
static inline int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh)
{
struct nlattr *tb[LS_NLA_TYPE_MAX];
int ret;
if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR)
return 0;
ret = nla_parse(tb, LS_NLA_TYPE_MAX - 1, nlmsg_data(nlh),
nlmsg_len(nlh), ib_nl_policy);
if (ret)
return 0;
return 1;
}
static int ib_nl_handle_resolve_resp(struct sk_buff *skb,
struct netlink_callback *cb)
{
const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
unsigned long flags;
struct ib_sa_query *query;
struct ib_mad_send_buf *send_buf;
struct ib_mad_send_wc mad_send_wc;
int found = 0;
int ret;
if (!netlink_capable(skb, CAP_NET_ADMIN))
return -EPERM;
spin_lock_irqsave(&ib_nl_request_lock, flags);
list_for_each_entry(query, &ib_nl_request_list, list) {
/*
* If the query is cancelled, let the timeout routine
* take care of it.
*/
if (nlh->nlmsg_seq == query->seq) {
found = !ib_sa_query_cancelled(query);
if (found)
list_del(&query->list);
break;
}
}
if (!found) {
spin_unlock_irqrestore(&ib_nl_request_lock, flags);
goto resp_out;
}
send_buf = query->mad_buf;
if (!ib_nl_is_good_resolve_resp(nlh)) {
/* if the result is a failure, send out the packet via IB */
ib_sa_disable_local_svc(query);
ret = ib_post_send_mad(query->mad_buf, NULL);
spin_unlock_irqrestore(&ib_nl_request_lock, flags);
if (ret) {
mad_send_wc.send_buf = send_buf;
mad_send_wc.status = IB_WC_GENERAL_ERR;
send_handler(query->port->agent, &mad_send_wc);
}
} else {
spin_unlock_irqrestore(&ib_nl_request_lock, flags);
ib_nl_process_good_resolve_rsp(query, nlh);
}
resp_out:
return skb->len;
}
static struct ibnl_client_cbs ib_sa_cb_table[] = {
[RDMA_NL_LS_OP_RESOLVE] = {
.dump = ib_nl_handle_resolve_resp,
.module = THIS_MODULE },
[RDMA_NL_LS_OP_SET_TIMEOUT] = {
.dump = ib_nl_handle_set_timeout,
.module = THIS_MODULE },
};
static void free_sm_ah(struct kref *kref)
{
struct ib_sa_sm_ah *sm_ah = container_of(kref, struct ib_sa_sm_ah, ref);
@ -502,7 +960,13 @@ void ib_sa_cancel_query(int id, struct ib_sa_query *query)
mad_buf = query->mad_buf;
spin_unlock_irqrestore(&idr_lock, flags);
ib_cancel_mad(agent, mad_buf);
/*
* If the query is still on the netlink request list, schedule
* it to be cancelled by the timeout routine. Otherwise, it has been
* sent to the MAD layer and has to be cancelled from there.
*/
if (!ib_nl_cancel_request(query))
ib_cancel_mad(agent, mad_buf);
}
EXPORT_SYMBOL(ib_sa_cancel_query);
@ -639,6 +1103,14 @@ static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
query->mad_buf->context[0] = query;
query->id = id;
if (query->flags & IB_SA_ENABLE_LOCAL_SERVICE) {
if (!ibnl_chk_listeners(RDMA_NL_GROUP_LS)) {
if (!ib_nl_make_request(query))
return id;
}
ib_sa_disable_local_svc(query);
}
ret = ib_post_send_mad(query->mad_buf, NULL);
if (ret) {
spin_lock_irqsave(&idr_lock, flags);
@ -740,7 +1212,7 @@ int ib_sa_path_rec_get(struct ib_sa_client *client,
port = &sa_dev->port[port_num - sa_dev->start_port];
agent = port->agent;
query = kmalloc(sizeof *query, gfp_mask);
query = kzalloc(sizeof(*query), gfp_mask);
if (!query)
return -ENOMEM;
@ -767,6 +1239,9 @@ int ib_sa_path_rec_get(struct ib_sa_client *client,
*sa_query = &query->sa_query;
query->sa_query.flags |= IB_SA_ENABLE_LOCAL_SERVICE;
query->sa_query.mad_buf->context[1] = rec;
ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
if (ret < 0)
goto err2;
@ -862,7 +1337,7 @@ int ib_sa_service_rec_query(struct ib_sa_client *client,
method != IB_SA_METHOD_DELETE)
return -EINVAL;
query = kmalloc(sizeof *query, gfp_mask);
query = kzalloc(sizeof(*query), gfp_mask);
if (!query)
return -ENOMEM;
@ -954,7 +1429,7 @@ int ib_sa_mcmember_rec_query(struct ib_sa_client *client,
port = &sa_dev->port[port_num - sa_dev->start_port];
agent = port->agent;
query = kmalloc(sizeof *query, gfp_mask);
query = kzalloc(sizeof(*query), gfp_mask);
if (!query)
return -ENOMEM;
@ -1051,7 +1526,7 @@ int ib_sa_guid_info_rec_query(struct ib_sa_client *client,
port = &sa_dev->port[port_num - sa_dev->start_port];
agent = port->agent;
query = kmalloc(sizeof *query, gfp_mask);
query = kzalloc(sizeof(*query), gfp_mask);
if (!query)
return -ENOMEM;
@ -1221,9 +1696,9 @@ static void ib_sa_add_one(struct ib_device *device)
return;
}
static void ib_sa_remove_one(struct ib_device *device)
static void ib_sa_remove_one(struct ib_device *device, void *client_data)
{
struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
struct ib_sa_device *sa_dev = client_data;
int i;
if (!sa_dev)
@ -1251,6 +1726,8 @@ static int __init ib_sa_init(void)
get_random_bytes(&tid, sizeof tid);
atomic_set(&ib_nl_sa_request_seq, 0);
ret = ib_register_client(&sa_client);
if (ret) {
printk(KERN_ERR "Couldn't register ib_sa client\n");
@ -1263,7 +1740,25 @@ static int __init ib_sa_init(void)
goto err2;
}
ib_nl_wq = create_singlethread_workqueue("ib_nl_sa_wq");
if (!ib_nl_wq) {
ret = -ENOMEM;
goto err3;
}
if (ibnl_add_client(RDMA_NL_LS, RDMA_NL_LS_NUM_OPS,
ib_sa_cb_table)) {
pr_err("Failed to add netlink callback\n");
ret = -EINVAL;
goto err4;
}
INIT_DELAYED_WORK(&ib_nl_timed_work, ib_nl_request_timeout);
return 0;
err4:
destroy_workqueue(ib_nl_wq);
err3:
mcast_cleanup();
err2:
ib_unregister_client(&sa_client);
err1:
@ -1272,6 +1767,10 @@ static int __init ib_sa_init(void)
static void __exit ib_sa_cleanup(void)
{
ibnl_remove_client(RDMA_NL_LS);
cancel_delayed_work(&ib_nl_timed_work);
flush_workqueue(ib_nl_wq);
destroy_workqueue(ib_nl_wq);
mcast_cleanup();
ib_unregister_client(&sa_client);
idr_destroy(&query_idr);

View File

@ -457,29 +457,6 @@ static struct kobj_type port_type = {
.default_attrs = port_default_attrs
};
static void ib_device_release(struct device *device)
{
struct ib_device *dev = container_of(device, struct ib_device, dev);
kfree(dev->port_immutable);
kfree(dev);
}
static int ib_device_uevent(struct device *device,
struct kobj_uevent_env *env)
{
struct ib_device *dev = container_of(device, struct ib_device, dev);
if (add_uevent_var(env, "NAME=%s", dev->name))
return -ENOMEM;
/*
* It would be nice to pass the node GUID with the event...
*/
return 0;
}
static struct attribute **
alloc_group_attrs(ssize_t (*show)(struct ib_port *,
struct port_attribute *, char *buf),
@ -702,12 +679,6 @@ static struct device_attribute *ib_class_attributes[] = {
&dev_attr_node_desc
};
static struct class ib_class = {
.name = "infiniband",
.dev_release = ib_device_release,
.dev_uevent = ib_device_uevent,
};
/* Show a given an attribute in the statistics group */
static ssize_t show_protocol_stat(const struct device *device,
struct device_attribute *attr, char *buf,
@ -846,14 +817,12 @@ int ib_device_register_sysfs(struct ib_device *device,
int ret;
int i;
class_dev->class = &ib_class;
class_dev->parent = device->dma_device;
dev_set_name(class_dev, "%s", device->name);
dev_set_drvdata(class_dev, device);
device->dev.parent = device->dma_device;
ret = dev_set_name(class_dev, "%s", device->name);
if (ret)
return ret;
INIT_LIST_HEAD(&device->port_list);
ret = device_register(class_dev);
ret = device_add(class_dev);
if (ret)
goto err;
@ -916,13 +885,3 @@ void ib_device_unregister_sysfs(struct ib_device *device)
device_unregister(&device->dev);
}
int ib_sysfs_setup(void)
{
return class_register(&ib_class);
}
void ib_sysfs_cleanup(void)
{
class_unregister(&ib_class);
}

View File

@ -109,7 +109,7 @@ enum {
#define IB_UCM_BASE_DEV MKDEV(IB_UCM_MAJOR, IB_UCM_BASE_MINOR)
static void ib_ucm_add_one(struct ib_device *device);
static void ib_ucm_remove_one(struct ib_device *device);
static void ib_ucm_remove_one(struct ib_device *device, void *client_data);
static struct ib_client ucm_client = {
.name = "ucm",
@ -658,8 +658,7 @@ static ssize_t ib_ucm_listen(struct ib_ucm_file *file,
if (result)
goto out;
result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask,
NULL);
result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask);
out:
ib_ucm_ctx_put(ctx);
return result;
@ -1310,9 +1309,9 @@ static void ib_ucm_add_one(struct ib_device *device)
return;
}
static void ib_ucm_remove_one(struct ib_device *device)
static void ib_ucm_remove_one(struct ib_device *device, void *client_data)
{
struct ib_ucm_device *ucm_dev = ib_get_client_data(device, &ucm_client);
struct ib_ucm_device *ucm_dev = client_data;
if (!ucm_dev)
return;

View File

@ -74,6 +74,7 @@ struct ucma_file {
struct list_head ctx_list;
struct list_head event_list;
wait_queue_head_t poll_wait;
struct workqueue_struct *close_wq;
};
struct ucma_context {
@ -89,6 +90,13 @@ struct ucma_context {
struct list_head list;
struct list_head mc_list;
/* mark that device is in process of destroying the internal HW
* resources, protected by the global mut
*/
int closing;
/* sync between removal event and id destroy, protected by file mut */
int destroying;
struct work_struct close_work;
};
struct ucma_multicast {
@ -107,6 +115,7 @@ struct ucma_event {
struct list_head list;
struct rdma_cm_id *cm_id;
struct rdma_ucm_event_resp resp;
struct work_struct close_work;
};
static DEFINE_MUTEX(mut);
@ -132,8 +141,12 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id)
mutex_lock(&mut);
ctx = _ucma_find_context(id, file);
if (!IS_ERR(ctx))
atomic_inc(&ctx->ref);
if (!IS_ERR(ctx)) {
if (ctx->closing)
ctx = ERR_PTR(-EIO);
else
atomic_inc(&ctx->ref);
}
mutex_unlock(&mut);
return ctx;
}
@ -144,6 +157,28 @@ static void ucma_put_ctx(struct ucma_context *ctx)
complete(&ctx->comp);
}
static void ucma_close_event_id(struct work_struct *work)
{
struct ucma_event *uevent_close = container_of(work, struct ucma_event, close_work);
rdma_destroy_id(uevent_close->cm_id);
kfree(uevent_close);
}
static void ucma_close_id(struct work_struct *work)
{
struct ucma_context *ctx = container_of(work, struct ucma_context, close_work);
/* once all inflight tasks are finished, we close all underlying
* resources. The context is still alive till its explicit destryoing
* by its creator.
*/
ucma_put_ctx(ctx);
wait_for_completion(&ctx->comp);
/* No new events will be generated after destroying the id. */
rdma_destroy_id(ctx->cm_id);
}
static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
{
struct ucma_context *ctx;
@ -152,6 +187,7 @@ static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
if (!ctx)
return NULL;
INIT_WORK(&ctx->close_work, ucma_close_id);
atomic_set(&ctx->ref, 1);
init_completion(&ctx->comp);
INIT_LIST_HEAD(&ctx->mc_list);
@ -242,6 +278,44 @@ static void ucma_set_event_context(struct ucma_context *ctx,
}
}
/* Called with file->mut locked for the relevant context. */
static void ucma_removal_event_handler(struct rdma_cm_id *cm_id)
{
struct ucma_context *ctx = cm_id->context;
struct ucma_event *con_req_eve;
int event_found = 0;
if (ctx->destroying)
return;
/* only if context is pointing to cm_id that it owns it and can be
* queued to be closed, otherwise that cm_id is an inflight one that
* is part of that context event list pending to be detached and
* reattached to its new context as part of ucma_get_event,
* handled separately below.
*/
if (ctx->cm_id == cm_id) {
mutex_lock(&mut);
ctx->closing = 1;
mutex_unlock(&mut);
queue_work(ctx->file->close_wq, &ctx->close_work);
return;
}
list_for_each_entry(con_req_eve, &ctx->file->event_list, list) {
if (con_req_eve->cm_id == cm_id &&
con_req_eve->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) {
list_del(&con_req_eve->list);
INIT_WORK(&con_req_eve->close_work, ucma_close_event_id);
queue_work(ctx->file->close_wq, &con_req_eve->close_work);
event_found = 1;
break;
}
}
if (!event_found)
printk(KERN_ERR "ucma_removal_event_handler: warning: connect request event wasn't found\n");
}
static int ucma_event_handler(struct rdma_cm_id *cm_id,
struct rdma_cm_event *event)
{
@ -276,14 +350,21 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
* We ignore events for new connections until userspace has set
* their context. This can only happen if an error occurs on a
* new connection before the user accepts it. This is okay,
* since the accept will just fail later.
* since the accept will just fail later. However, we do need
* to release the underlying HW resources in case of a device
* removal event.
*/
if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL)
ucma_removal_event_handler(cm_id);
kfree(uevent);
goto out;
}
list_add_tail(&uevent->list, &ctx->file->event_list);
wake_up_interruptible(&ctx->file->poll_wait);
if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL)
ucma_removal_event_handler(cm_id);
out:
mutex_unlock(&ctx->file->mut);
return ret;
@ -442,9 +523,15 @@ static void ucma_cleanup_mc_events(struct ucma_multicast *mc)
}
/*
* We cannot hold file->mut when calling rdma_destroy_id() or we can
* deadlock. We also acquire file->mut in ucma_event_handler(), and
* rdma_destroy_id() will wait until all callbacks have completed.
* ucma_free_ctx is called after the underlying rdma CM-ID is destroyed. At
* this point, no new events will be reported from the hardware. However, we
* still need to cleanup the UCMA context for this ID. Specifically, there
* might be events that have not yet been consumed by the user space software.
* These might include pending connect requests which we have not completed
* processing. We cannot call rdma_destroy_id while holding the lock of the
* context (file->mut), as it might cause a deadlock. We therefore extract all
* relevant events from the context pending events list while holding the
* mutex. After that we release them as needed.
*/
static int ucma_free_ctx(struct ucma_context *ctx)
{
@ -452,8 +539,6 @@ static int ucma_free_ctx(struct ucma_context *ctx)
struct ucma_event *uevent, *tmp;
LIST_HEAD(list);
/* No new events will be generated after destroying the id. */
rdma_destroy_id(ctx->cm_id);
ucma_cleanup_multicast(ctx);
@ -501,10 +586,24 @@ static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
ucma_put_ctx(ctx);
wait_for_completion(&ctx->comp);
resp.events_reported = ucma_free_ctx(ctx);
mutex_lock(&ctx->file->mut);
ctx->destroying = 1;
mutex_unlock(&ctx->file->mut);
flush_workqueue(ctx->file->close_wq);
/* At this point it's guaranteed that there is no inflight
* closing task */
mutex_lock(&mut);
if (!ctx->closing) {
mutex_unlock(&mut);
ucma_put_ctx(ctx);
wait_for_completion(&ctx->comp);
rdma_destroy_id(ctx->cm_id);
} else {
mutex_unlock(&mut);
}
resp.events_reported = ucma_free_ctx(ctx);
if (copy_to_user((void __user *)(unsigned long)cmd.response,
&resp, sizeof(resp)))
ret = -EFAULT;
@ -1321,10 +1420,10 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
mc = ERR_PTR(-ENOENT);
else if (mc->ctx->file != file)
mc = ERR_PTR(-EINVAL);
else {
else if (!atomic_inc_not_zero(&mc->ctx->ref))
mc = ERR_PTR(-ENXIO);
else
idr_remove(&multicast_idr, mc->id);
atomic_inc(&mc->ctx->ref);
}
mutex_unlock(&mut);
if (IS_ERR(mc)) {
@ -1529,6 +1628,7 @@ static int ucma_open(struct inode *inode, struct file *filp)
INIT_LIST_HEAD(&file->ctx_list);
init_waitqueue_head(&file->poll_wait);
mutex_init(&file->mut);
file->close_wq = create_singlethread_workqueue("ucma_close_id");
filp->private_data = file;
file->filp = filp;
@ -1543,16 +1643,34 @@ static int ucma_close(struct inode *inode, struct file *filp)
mutex_lock(&file->mut);
list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
ctx->destroying = 1;
mutex_unlock(&file->mut);
mutex_lock(&mut);
idr_remove(&ctx_idr, ctx->id);
mutex_unlock(&mut);
flush_workqueue(file->close_wq);
/* At that step once ctx was marked as destroying and workqueue
* was flushed we are safe from any inflights handlers that
* might put other closing task.
*/
mutex_lock(&mut);
if (!ctx->closing) {
mutex_unlock(&mut);
/* rdma_destroy_id ensures that no event handlers are
* inflight for that id before releasing it.
*/
rdma_destroy_id(ctx->cm_id);
} else {
mutex_unlock(&mut);
}
ucma_free_ctx(ctx);
mutex_lock(&file->mut);
}
mutex_unlock(&file->mut);
destroy_workqueue(file->close_wq);
kfree(file);
return 0;
}

View File

@ -133,7 +133,7 @@ static DEFINE_SPINLOCK(port_lock);
static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS);
static void ib_umad_add_one(struct ib_device *device);
static void ib_umad_remove_one(struct ib_device *device);
static void ib_umad_remove_one(struct ib_device *device, void *client_data);
static void ib_umad_release_dev(struct kobject *kobj)
{
@ -1322,9 +1322,9 @@ static void ib_umad_add_one(struct ib_device *device)
kobject_put(&umad_dev->kobj);
}
static void ib_umad_remove_one(struct ib_device *device)
static void ib_umad_remove_one(struct ib_device *device, void *client_data)
{
struct ib_umad_device *umad_dev = ib_get_client_data(device, &umad_client);
struct ib_umad_device *umad_dev = client_data;
int i;
if (!umad_dev)

View File

@ -85,15 +85,20 @@
*/
struct ib_uverbs_device {
struct kref ref;
atomic_t refcount;
int num_comp_vectors;
struct completion comp;
struct device *dev;
struct ib_device *ib_dev;
struct ib_device __rcu *ib_dev;
int devnum;
struct cdev cdev;
struct rb_root xrcd_tree;
struct mutex xrcd_tree_mutex;
struct kobject kobj;
struct srcu_struct disassociate_srcu;
struct mutex lists_mutex; /* protect lists */
struct list_head uverbs_file_list;
struct list_head uverbs_events_file_list;
};
struct ib_uverbs_event_file {
@ -105,6 +110,7 @@ struct ib_uverbs_event_file {
wait_queue_head_t poll_wait;
struct fasync_struct *async_queue;
struct list_head event_list;
struct list_head list;
};
struct ib_uverbs_file {
@ -114,6 +120,8 @@ struct ib_uverbs_file {
struct ib_ucontext *ucontext;
struct ib_event_handler event_handler;
struct ib_uverbs_event_file *async_file;
struct list_head list;
int is_closed;
};
struct ib_uverbs_event {
@ -177,7 +185,9 @@ extern struct idr ib_uverbs_rule_idr;
void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
struct ib_device *ib_dev,
int is_async);
void ib_uverbs_free_async_event_file(struct ib_uverbs_file *uverbs_file);
struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
void ib_uverbs_release_ucq(struct ib_uverbs_file *file,
@ -212,6 +222,7 @@ struct ib_uverbs_flow_spec {
#define IB_UVERBS_DECLARE_CMD(name) \
ssize_t ib_uverbs_##name(struct ib_uverbs_file *file, \
struct ib_device *ib_dev, \
const char __user *buf, int in_len, \
int out_len)
@ -253,6 +264,7 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);
#define IB_UVERBS_DECLARE_EX_CMD(name) \
int ib_uverbs_ex_##name(struct ib_uverbs_file *file, \
struct ib_device *ib_dev, \
struct ib_udata *ucore, \
struct ib_udata *uhw)

View File

@ -282,13 +282,13 @@ static void put_xrcd_read(struct ib_uobject *uobj)
}
ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf,
int in_len, int out_len)
{
struct ib_uverbs_get_context cmd;
struct ib_uverbs_get_context_resp resp;
struct ib_udata udata;
struct ib_device *ibdev = file->device->ib_dev;
#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
struct ib_device_attr dev_attr;
#endif
@ -313,13 +313,13 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd, out_len - sizeof resp);
ucontext = ibdev->alloc_ucontext(ibdev, &udata);
ucontext = ib_dev->alloc_ucontext(ib_dev, &udata);
if (IS_ERR(ucontext)) {
ret = PTR_ERR(ucontext);
goto err;
}
ucontext->device = ibdev;
ucontext->device = ib_dev;
INIT_LIST_HEAD(&ucontext->pd_list);
INIT_LIST_HEAD(&ucontext->mr_list);
INIT_LIST_HEAD(&ucontext->mw_list);
@ -340,7 +340,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
ucontext->odp_mrs_count = 0;
INIT_LIST_HEAD(&ucontext->no_private_counters);
ret = ib_query_device(ibdev, &dev_attr);
ret = ib_query_device(ib_dev, &dev_attr);
if (ret)
goto err_free;
if (!(dev_attr.device_cap_flags & IB_DEVICE_ON_DEMAND_PAGING))
@ -355,7 +355,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
goto err_free;
resp.async_fd = ret;
filp = ib_uverbs_alloc_event_file(file, 1);
filp = ib_uverbs_alloc_event_file(file, ib_dev, 1);
if (IS_ERR(filp)) {
ret = PTR_ERR(filp);
goto err_fd;
@ -367,16 +367,6 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
goto err_file;
}
file->async_file = filp->private_data;
INIT_IB_EVENT_HANDLER(&file->event_handler, file->device->ib_dev,
ib_uverbs_event_handler);
ret = ib_register_event_handler(&file->event_handler);
if (ret)
goto err_file;
kref_get(&file->async_file->ref);
kref_get(&file->ref);
file->ucontext = ucontext;
fd_install(resp.async_fd, filp);
@ -386,6 +376,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
return in_len;
err_file:
ib_uverbs_free_async_event_file(file);
fput(filp);
err_fd:
@ -393,7 +384,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
err_free:
put_pid(ucontext->tgid);
ibdev->dealloc_ucontext(ucontext);
ib_dev->dealloc_ucontext(ucontext);
err:
mutex_unlock(&file->mutex);
@ -401,11 +392,12 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
}
static void copy_query_dev_fields(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
struct ib_uverbs_query_device_resp *resp,
struct ib_device_attr *attr)
{
resp->fw_ver = attr->fw_ver;
resp->node_guid = file->device->ib_dev->node_guid;
resp->node_guid = ib_dev->node_guid;
resp->sys_image_guid = attr->sys_image_guid;
resp->max_mr_size = attr->max_mr_size;
resp->page_size_cap = attr->page_size_cap;
@ -443,10 +435,11 @@ static void copy_query_dev_fields(struct ib_uverbs_file *file,
resp->max_srq_sge = attr->max_srq_sge;
resp->max_pkeys = attr->max_pkeys;
resp->local_ca_ack_delay = attr->local_ca_ack_delay;
resp->phys_port_cnt = file->device->ib_dev->phys_port_cnt;
resp->phys_port_cnt = ib_dev->phys_port_cnt;
}
ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf,
int in_len, int out_len)
{
@ -461,12 +454,12 @@ ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
ret = ib_query_device(file->device->ib_dev, &attr);
ret = ib_query_device(ib_dev, &attr);
if (ret)
return ret;
memset(&resp, 0, sizeof resp);
copy_query_dev_fields(file, &resp, &attr);
copy_query_dev_fields(file, ib_dev, &resp, &attr);
if (copy_to_user((void __user *) (unsigned long) cmd.response,
&resp, sizeof resp))
@ -476,6 +469,7 @@ ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf,
int in_len, int out_len)
{
@ -490,7 +484,7 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
ret = ib_query_port(file->device->ib_dev, cmd.port_num, &attr);
ret = ib_query_port(ib_dev, cmd.port_num, &attr);
if (ret)
return ret;
@ -515,7 +509,7 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
resp.active_width = attr.active_width;
resp.active_speed = attr.active_speed;
resp.phys_state = attr.phys_state;
resp.link_layer = rdma_port_get_link_layer(file->device->ib_dev,
resp.link_layer = rdma_port_get_link_layer(ib_dev,
cmd.port_num);
if (copy_to_user((void __user *) (unsigned long) cmd.response,
@ -526,6 +520,7 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf,
int in_len, int out_len)
{
@ -553,15 +548,15 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
down_write(&uobj->mutex);
pd = file->device->ib_dev->alloc_pd(file->device->ib_dev,
file->ucontext, &udata);
pd = ib_dev->alloc_pd(ib_dev, file->ucontext, &udata);
if (IS_ERR(pd)) {
ret = PTR_ERR(pd);
goto err;
}
pd->device = file->device->ib_dev;
pd->device = ib_dev;
pd->uobject = uobj;
pd->local_mr = NULL;
atomic_set(&pd->usecnt, 0);
uobj->object = pd;
@ -600,11 +595,13 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf,
int in_len, int out_len)
{
struct ib_uverbs_dealloc_pd cmd;
struct ib_uobject *uobj;
struct ib_pd *pd;
int ret;
if (copy_from_user(&cmd, buf, sizeof cmd))
@ -613,15 +610,20 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
uobj = idr_write_uobj(&ib_uverbs_pd_idr, cmd.pd_handle, file->ucontext);
if (!uobj)
return -EINVAL;
pd = uobj->object;
ret = ib_dealloc_pd(uobj->object);
if (!ret)
uobj->live = 0;
put_uobj_write(uobj);
if (atomic_read(&pd->usecnt)) {
ret = -EBUSY;
goto err_put;
}
ret = pd->device->dealloc_pd(uobj->object);
WARN_ONCE(ret, "Infiniband HW driver failed dealloc_pd");
if (ret)
return ret;
goto err_put;
uobj->live = 0;
put_uobj_write(uobj);
idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
@ -632,6 +634,10 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
put_uobj(uobj);
return in_len;
err_put:
put_uobj_write(uobj);
return ret;
}
struct xrcd_table_entry {
@ -720,6 +726,7 @@ static void xrcd_table_delete(struct ib_uverbs_device *dev,
}
ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -778,15 +785,14 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
down_write(&obj->uobject.mutex);
if (!xrcd) {
xrcd = file->device->ib_dev->alloc_xrcd(file->device->ib_dev,
file->ucontext, &udata);
xrcd = ib_dev->alloc_xrcd(ib_dev, file->ucontext, &udata);
if (IS_ERR(xrcd)) {
ret = PTR_ERR(xrcd);
goto err;
}
xrcd->inode = inode;
xrcd->device = file->device->ib_dev;
xrcd->device = ib_dev;
atomic_set(&xrcd->usecnt, 0);
mutex_init(&xrcd->tgt_qp_mutex);
INIT_LIST_HEAD(&xrcd->tgt_qp_list);
@ -857,6 +863,7 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_close_xrcd(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -934,6 +941,7 @@ void ib_uverbs_dealloc_xrcd(struct ib_uverbs_device *dev,
}
ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1043,6 +1051,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1136,6 +1145,7 @@ ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1174,8 +1184,9 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
const char __user *buf, int in_len,
int out_len)
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
struct ib_uverbs_alloc_mw cmd;
struct ib_uverbs_alloc_mw_resp resp;
@ -1256,8 +1267,9 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
const char __user *buf, int in_len,
int out_len)
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
struct ib_uverbs_dealloc_mw cmd;
struct ib_mw *mw;
@ -1294,6 +1306,7 @@ ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_create_comp_channel(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1313,7 +1326,7 @@ ssize_t ib_uverbs_create_comp_channel(struct ib_uverbs_file *file,
return ret;
resp.fd = ret;
filp = ib_uverbs_alloc_event_file(file, 0);
filp = ib_uverbs_alloc_event_file(file, ib_dev, 0);
if (IS_ERR(filp)) {
put_unused_fd(resp.fd);
return PTR_ERR(filp);
@ -1331,6 +1344,7 @@ ssize_t ib_uverbs_create_comp_channel(struct ib_uverbs_file *file,
}
static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
struct ib_udata *ucore,
struct ib_udata *uhw,
struct ib_uverbs_ex_create_cq *cmd,
@ -1379,14 +1393,14 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
if (cmd_sz > offsetof(typeof(*cmd), flags) + sizeof(cmd->flags))
attr.flags = cmd->flags;
cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr,
cq = ib_dev->create_cq(ib_dev, &attr,
file->ucontext, uhw);
if (IS_ERR(cq)) {
ret = PTR_ERR(cq);
goto err_file;
}
cq->device = file->device->ib_dev;
cq->device = ib_dev;
cq->uobject = &obj->uobject;
cq->comp_handler = ib_uverbs_comp_handler;
cq->event_handler = ib_uverbs_cq_event_handler;
@ -1447,6 +1461,7 @@ static int ib_uverbs_create_cq_cb(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1475,7 +1490,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
cmd_ex.comp_vector = cmd.comp_vector;
cmd_ex.comp_channel = cmd.comp_channel;
obj = create_cq(file, &ucore, &uhw, &cmd_ex,
obj = create_cq(file, ib_dev, &ucore, &uhw, &cmd_ex,
offsetof(typeof(cmd_ex), comp_channel) +
sizeof(cmd.comp_channel), ib_uverbs_create_cq_cb,
NULL);
@ -1498,6 +1513,7 @@ static int ib_uverbs_ex_create_cq_cb(struct ib_uverbs_file *file,
}
int ib_uverbs_ex_create_cq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
struct ib_udata *ucore,
struct ib_udata *uhw)
{
@ -1523,7 +1539,7 @@ int ib_uverbs_ex_create_cq(struct ib_uverbs_file *file,
sizeof(resp.response_length)))
return -ENOSPC;
obj = create_cq(file, ucore, uhw, &cmd,
obj = create_cq(file, ib_dev, ucore, uhw, &cmd,
min(ucore->inlen, sizeof(cmd)),
ib_uverbs_ex_create_cq_cb, NULL);
@ -1534,6 +1550,7 @@ int ib_uverbs_ex_create_cq(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_resize_cq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1597,6 +1614,7 @@ static int copy_wc_to_user(void __user *dest, struct ib_wc *wc)
}
ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1648,6 +1666,7 @@ ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_req_notify_cq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1670,6 +1689,7 @@ ssize_t ib_uverbs_req_notify_cq(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1722,6 +1742,7 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -1917,6 +1938,7 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len, int out_len)
{
struct ib_uverbs_open_qp cmd;
@ -2011,6 +2033,7 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_query_qp(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -2125,6 +2148,7 @@ static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
}
ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -2221,6 +2245,7 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -2279,6 +2304,7 @@ ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -2346,6 +2372,12 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
next->send_flags = user_wr->send_flags;
if (is_ud) {
if (next->opcode != IB_WR_SEND &&
next->opcode != IB_WR_SEND_WITH_IMM) {
ret = -EINVAL;
goto out_put;
}
next->wr.ud.ah = idr_read_ah(user_wr->wr.ud.ah,
file->ucontext);
if (!next->wr.ud.ah) {
@ -2385,9 +2417,11 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
user_wr->wr.atomic.compare_add;
next->wr.atomic.swap = user_wr->wr.atomic.swap;
next->wr.atomic.rkey = user_wr->wr.atomic.rkey;
case IB_WR_SEND:
break;
default:
break;
ret = -EINVAL;
goto out_put;
}
}
@ -2523,6 +2557,7 @@ static struct ib_recv_wr *ib_uverbs_unmarshall_recv(const char __user *buf,
}
ssize_t ib_uverbs_post_recv(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -2572,6 +2607,7 @@ ssize_t ib_uverbs_post_recv(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_post_srq_recv(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -2621,6 +2657,7 @@ ssize_t ib_uverbs_post_srq_recv(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -2713,6 +2750,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len, int out_len)
{
struct ib_uverbs_destroy_ah cmd;
@ -2749,6 +2787,7 @@ ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_attach_mcast(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -2796,6 +2835,7 @@ ssize_t ib_uverbs_attach_mcast(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_detach_mcast(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -2876,6 +2916,7 @@ static int kern_spec_to_ib_spec(struct ib_uverbs_flow_spec *kern_spec,
}
int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
struct ib_udata *ucore,
struct ib_udata *uhw)
{
@ -3036,6 +3077,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
}
int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
struct ib_udata *ucore,
struct ib_udata *uhw)
{
@ -3078,6 +3120,7 @@ int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file,
}
static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
struct ib_uverbs_create_xsrq *cmd,
struct ib_udata *udata)
{
@ -3211,6 +3254,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_create_srq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -3238,7 +3282,7 @@ ssize_t ib_uverbs_create_srq(struct ib_uverbs_file *file,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd, out_len - sizeof resp);
ret = __uverbs_create_xsrq(file, &xcmd, &udata);
ret = __uverbs_create_xsrq(file, ib_dev, &xcmd, &udata);
if (ret)
return ret;
@ -3246,6 +3290,7 @@ ssize_t ib_uverbs_create_srq(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_create_xsrq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len, int out_len)
{
struct ib_uverbs_create_xsrq cmd;
@ -3263,7 +3308,7 @@ ssize_t ib_uverbs_create_xsrq(struct ib_uverbs_file *file,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd, out_len - sizeof resp);
ret = __uverbs_create_xsrq(file, &cmd, &udata);
ret = __uverbs_create_xsrq(file, ib_dev, &cmd, &udata);
if (ret)
return ret;
@ -3271,6 +3316,7 @@ ssize_t ib_uverbs_create_xsrq(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_modify_srq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -3301,6 +3347,7 @@ ssize_t ib_uverbs_modify_srq(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_query_srq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf,
int in_len, int out_len)
{
@ -3341,6 +3388,7 @@ ssize_t ib_uverbs_query_srq(struct ib_uverbs_file *file,
}
ssize_t ib_uverbs_destroy_srq(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len)
{
@ -3398,16 +3446,15 @@ ssize_t ib_uverbs_destroy_srq(struct ib_uverbs_file *file,
}
int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
struct ib_udata *ucore,
struct ib_udata *uhw)
{
struct ib_uverbs_ex_query_device_resp resp;
struct ib_uverbs_ex_query_device cmd;
struct ib_device_attr attr;
struct ib_device *device;
int err;
device = file->device->ib_dev;
if (ucore->inlen < sizeof(cmd))
return -EINVAL;
@ -3428,11 +3475,11 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
memset(&attr, 0, sizeof(attr));
err = device->query_device(device, &attr, uhw);
err = ib_dev->query_device(ib_dev, &attr, uhw);
if (err)
return err;
copy_query_dev_fields(file, &resp.base, &attr);
copy_query_dev_fields(file, ib_dev, &resp.base, &attr);
resp.comp_mask = 0;
if (ucore->outlen < resp.response_length + sizeof(resp.odp_caps))

View File

@ -79,6 +79,7 @@ static DEFINE_SPINLOCK(map_lock);
static DECLARE_BITMAP(dev_map, IB_UVERBS_MAX_DEVICES);
static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
int out_len) = {
[IB_USER_VERBS_CMD_GET_CONTEXT] = ib_uverbs_get_context,
@ -119,6 +120,7 @@ static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file,
};
static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
struct ib_udata *ucore,
struct ib_udata *uhw) = {
[IB_USER_VERBS_EX_CMD_CREATE_FLOW] = ib_uverbs_ex_create_flow,
@ -128,16 +130,21 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
};
static void ib_uverbs_add_one(struct ib_device *device);
static void ib_uverbs_remove_one(struct ib_device *device);
static void ib_uverbs_remove_one(struct ib_device *device, void *client_data);
static void ib_uverbs_release_dev(struct kref *ref)
static void ib_uverbs_release_dev(struct kobject *kobj)
{
struct ib_uverbs_device *dev =
container_of(ref, struct ib_uverbs_device, ref);
container_of(kobj, struct ib_uverbs_device, kobj);
complete(&dev->comp);
cleanup_srcu_struct(&dev->disassociate_srcu);
kfree(dev);
}
static struct kobj_type ib_uverbs_dev_ktype = {
.release = ib_uverbs_release_dev,
};
static void ib_uverbs_release_event_file(struct kref *ref)
{
struct ib_uverbs_event_file *file =
@ -201,9 +208,6 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
{
struct ib_uobject *uobj, *tmp;
if (!context)
return 0;
context->closing = 1;
list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
@ -303,13 +307,27 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
return context->device->dealloc_ucontext(context);
}
static void ib_uverbs_comp_dev(struct ib_uverbs_device *dev)
{
complete(&dev->comp);
}
static void ib_uverbs_release_file(struct kref *ref)
{
struct ib_uverbs_file *file =
container_of(ref, struct ib_uverbs_file, ref);
struct ib_device *ib_dev;
int srcu_key;
module_put(file->device->ib_dev->owner);
kref_put(&file->device->ref, ib_uverbs_release_dev);
srcu_key = srcu_read_lock(&file->device->disassociate_srcu);
ib_dev = srcu_dereference(file->device->ib_dev,
&file->device->disassociate_srcu);
if (ib_dev && !ib_dev->disassociate_ucontext)
module_put(ib_dev->owner);
srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
if (atomic_dec_and_test(&file->device->refcount))
ib_uverbs_comp_dev(file->device);
kfree(file);
}
@ -331,9 +349,19 @@ static ssize_t ib_uverbs_event_read(struct file *filp, char __user *buf,
return -EAGAIN;
if (wait_event_interruptible(file->poll_wait,
!list_empty(&file->event_list)))
(!list_empty(&file->event_list) ||
/* The barriers built into wait_event_interruptible()
* and wake_up() guarentee this will see the null set
* without using RCU
*/
!file->uverbs_file->device->ib_dev)))
return -ERESTARTSYS;
/* If device was disassociated and no event exists set an error */
if (list_empty(&file->event_list) &&
!file->uverbs_file->device->ib_dev)
return -EIO;
spin_lock_irq(&file->lock);
}
@ -396,8 +424,11 @@ static int ib_uverbs_event_close(struct inode *inode, struct file *filp)
{
struct ib_uverbs_event_file *file = filp->private_data;
struct ib_uverbs_event *entry, *tmp;
int closed_already = 0;
mutex_lock(&file->uverbs_file->device->lists_mutex);
spin_lock_irq(&file->lock);
closed_already = file->is_closed;
file->is_closed = 1;
list_for_each_entry_safe(entry, tmp, &file->event_list, list) {
if (entry->counter)
@ -405,11 +436,15 @@ static int ib_uverbs_event_close(struct inode *inode, struct file *filp)
kfree(entry);
}
spin_unlock_irq(&file->lock);
if (file->is_async) {
ib_unregister_event_handler(&file->uverbs_file->event_handler);
kref_put(&file->uverbs_file->ref, ib_uverbs_release_file);
if (!closed_already) {
list_del(&file->list);
if (file->is_async)
ib_unregister_event_handler(&file->uverbs_file->
event_handler);
}
mutex_unlock(&file->uverbs_file->device->lists_mutex);
kref_put(&file->uverbs_file->ref, ib_uverbs_release_file);
kref_put(&file->ref, ib_uverbs_release_event_file);
return 0;
@ -541,13 +576,21 @@ void ib_uverbs_event_handler(struct ib_event_handler *handler,
NULL, NULL);
}
void ib_uverbs_free_async_event_file(struct ib_uverbs_file *file)
{
kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
file->async_file = NULL;
}
struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
struct ib_device *ib_dev,
int is_async)
{
struct ib_uverbs_event_file *ev_file;
struct file *filp;
int ret;
ev_file = kmalloc(sizeof *ev_file, GFP_KERNEL);
ev_file = kzalloc(sizeof(*ev_file), GFP_KERNEL);
if (!ev_file)
return ERR_PTR(-ENOMEM);
@ -556,15 +599,46 @@ struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
INIT_LIST_HEAD(&ev_file->event_list);
init_waitqueue_head(&ev_file->poll_wait);
ev_file->uverbs_file = uverbs_file;
kref_get(&ev_file->uverbs_file->ref);
ev_file->async_queue = NULL;
ev_file->is_async = is_async;
ev_file->is_closed = 0;
filp = anon_inode_getfile("[infinibandevent]", &uverbs_event_fops,
ev_file, O_RDONLY);
if (IS_ERR(filp))
kfree(ev_file);
goto err_put_refs;
mutex_lock(&uverbs_file->device->lists_mutex);
list_add_tail(&ev_file->list,
&uverbs_file->device->uverbs_events_file_list);
mutex_unlock(&uverbs_file->device->lists_mutex);
if (is_async) {
WARN_ON(uverbs_file->async_file);
uverbs_file->async_file = ev_file;
kref_get(&uverbs_file->async_file->ref);
INIT_IB_EVENT_HANDLER(&uverbs_file->event_handler,
ib_dev,
ib_uverbs_event_handler);
ret = ib_register_event_handler(&uverbs_file->event_handler);
if (ret)
goto err_put_file;
/* At that point async file stuff was fully set */
ev_file->is_async = 1;
}
return filp;
err_put_file:
fput(filp);
kref_put(&uverbs_file->async_file->ref, ib_uverbs_release_event_file);
uverbs_file->async_file = NULL;
return ERR_PTR(ret);
err_put_refs:
kref_put(&ev_file->uverbs_file->ref, ib_uverbs_release_file);
kref_put(&ev_file->ref, ib_uverbs_release_event_file);
return filp;
}
@ -601,8 +675,11 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
size_t count, loff_t *pos)
{
struct ib_uverbs_file *file = filp->private_data;
struct ib_device *ib_dev;
struct ib_uverbs_cmd_hdr hdr;
__u32 flags;
int srcu_key;
ssize_t ret;
if (count < sizeof hdr)
return -EINVAL;
@ -610,6 +687,14 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
if (copy_from_user(&hdr, buf, sizeof hdr))
return -EFAULT;
srcu_key = srcu_read_lock(&file->device->disassociate_srcu);
ib_dev = srcu_dereference(file->device->ib_dev,
&file->device->disassociate_srcu);
if (!ib_dev) {
ret = -EIO;
goto out;
}
flags = (hdr.command &
IB_USER_VERBS_CMD_FLAGS_MASK) >> IB_USER_VERBS_CMD_FLAGS_SHIFT;
@ -617,26 +702,36 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
__u32 command;
if (hdr.command & ~(__u32)(IB_USER_VERBS_CMD_FLAGS_MASK |
IB_USER_VERBS_CMD_COMMAND_MASK))
return -EINVAL;
IB_USER_VERBS_CMD_COMMAND_MASK)) {
ret = -EINVAL;
goto out;
}
command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
if (command >= ARRAY_SIZE(uverbs_cmd_table) ||
!uverbs_cmd_table[command])
return -EINVAL;
!uverbs_cmd_table[command]) {
ret = -EINVAL;
goto out;
}
if (!file->ucontext &&
command != IB_USER_VERBS_CMD_GET_CONTEXT)
return -EINVAL;
command != IB_USER_VERBS_CMD_GET_CONTEXT) {
ret = -EINVAL;
goto out;
}
if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << command)))
return -ENOSYS;
if (!(ib_dev->uverbs_cmd_mask & (1ull << command))) {
ret = -ENOSYS;
goto out;
}
if (hdr.in_words * 4 != count)
return -EINVAL;
if (hdr.in_words * 4 != count) {
ret = -EINVAL;
goto out;
}
return uverbs_cmd_table[command](file,
ret = uverbs_cmd_table[command](file, ib_dev,
buf + sizeof(hdr),
hdr.in_words * 4,
hdr.out_words * 4);
@ -647,51 +742,72 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
struct ib_uverbs_ex_cmd_hdr ex_hdr;
struct ib_udata ucore;
struct ib_udata uhw;
int err;
size_t written_count = count;
if (hdr.command & ~(__u32)(IB_USER_VERBS_CMD_FLAGS_MASK |
IB_USER_VERBS_CMD_COMMAND_MASK))
return -EINVAL;
IB_USER_VERBS_CMD_COMMAND_MASK)) {
ret = -EINVAL;
goto out;
}
command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
if (command >= ARRAY_SIZE(uverbs_ex_cmd_table) ||
!uverbs_ex_cmd_table[command])
return -ENOSYS;
!uverbs_ex_cmd_table[command]) {
ret = -ENOSYS;
goto out;
}
if (!file->ucontext)
return -EINVAL;
if (!file->ucontext) {
ret = -EINVAL;
goto out;
}
if (!(file->device->ib_dev->uverbs_ex_cmd_mask & (1ull << command)))
return -ENOSYS;
if (!(ib_dev->uverbs_ex_cmd_mask & (1ull << command))) {
ret = -ENOSYS;
goto out;
}
if (count < (sizeof(hdr) + sizeof(ex_hdr)))
return -EINVAL;
if (count < (sizeof(hdr) + sizeof(ex_hdr))) {
ret = -EINVAL;
goto out;
}
if (copy_from_user(&ex_hdr, buf + sizeof(hdr), sizeof(ex_hdr)))
return -EFAULT;
if (copy_from_user(&ex_hdr, buf + sizeof(hdr), sizeof(ex_hdr))) {
ret = -EFAULT;
goto out;
}
count -= sizeof(hdr) + sizeof(ex_hdr);
buf += sizeof(hdr) + sizeof(ex_hdr);
if ((hdr.in_words + ex_hdr.provider_in_words) * 8 != count)
return -EINVAL;
if ((hdr.in_words + ex_hdr.provider_in_words) * 8 != count) {
ret = -EINVAL;
goto out;
}
if (ex_hdr.cmd_hdr_reserved)
return -EINVAL;
if (ex_hdr.cmd_hdr_reserved) {
ret = -EINVAL;
goto out;
}
if (ex_hdr.response) {
if (!hdr.out_words && !ex_hdr.provider_out_words)
return -EINVAL;
if (!hdr.out_words && !ex_hdr.provider_out_words) {
ret = -EINVAL;
goto out;
}
if (!access_ok(VERIFY_WRITE,
(void __user *) (unsigned long) ex_hdr.response,
(hdr.out_words + ex_hdr.provider_out_words) * 8))
return -EFAULT;
(hdr.out_words + ex_hdr.provider_out_words) * 8)) {
ret = -EFAULT;
goto out;
}
} else {
if (hdr.out_words || ex_hdr.provider_out_words)
return -EINVAL;
if (hdr.out_words || ex_hdr.provider_out_words) {
ret = -EINVAL;
goto out;
}
}
INIT_UDATA_BUF_OR_NULL(&ucore, buf, (unsigned long) ex_hdr.response,
@ -703,27 +819,43 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
ex_hdr.provider_in_words * 8,
ex_hdr.provider_out_words * 8);
err = uverbs_ex_cmd_table[command](file,
ret = uverbs_ex_cmd_table[command](file,
ib_dev,
&ucore,
&uhw);
if (err)
return err;
return written_count;
if (!ret)
ret = written_count;
} else {
ret = -ENOSYS;
}
return -ENOSYS;
out:
srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
return ret;
}
static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
{
struct ib_uverbs_file *file = filp->private_data;
struct ib_device *ib_dev;
int ret = 0;
int srcu_key;
srcu_key = srcu_read_lock(&file->device->disassociate_srcu);
ib_dev = srcu_dereference(file->device->ib_dev,
&file->device->disassociate_srcu);
if (!ib_dev) {
ret = -EIO;
goto out;
}
if (!file->ucontext)
return -ENODEV;
ret = -ENODEV;
else
return file->device->ib_dev->mmap(file->ucontext, vma);
ret = ib_dev->mmap(file->ucontext, vma);
out:
srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
return ret;
}
/*
@ -740,23 +872,43 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
{
struct ib_uverbs_device *dev;
struct ib_uverbs_file *file;
struct ib_device *ib_dev;
int ret;
int module_dependent;
int srcu_key;
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev);
if (dev)
kref_get(&dev->ref);
else
if (!atomic_inc_not_zero(&dev->refcount))
return -ENXIO;
if (!try_module_get(dev->ib_dev->owner)) {
ret = -ENODEV;
srcu_key = srcu_read_lock(&dev->disassociate_srcu);
mutex_lock(&dev->lists_mutex);
ib_dev = srcu_dereference(dev->ib_dev,
&dev->disassociate_srcu);
if (!ib_dev) {
ret = -EIO;
goto err;
}
file = kmalloc(sizeof *file, GFP_KERNEL);
/* In case IB device supports disassociate ucontext, there is no hard
* dependency between uverbs device and its low level device.
*/
module_dependent = !(ib_dev->disassociate_ucontext);
if (module_dependent) {
if (!try_module_get(ib_dev->owner)) {
ret = -ENODEV;
goto err;
}
}
file = kzalloc(sizeof(*file), GFP_KERNEL);
if (!file) {
ret = -ENOMEM;
goto err_module;
if (module_dependent)
goto err_module;
goto err;
}
file->device = dev;
@ -766,27 +918,47 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
mutex_init(&file->mutex);
filp->private_data = file;
kobject_get(&dev->kobj);
list_add_tail(&file->list, &dev->uverbs_file_list);
mutex_unlock(&dev->lists_mutex);
srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
return nonseekable_open(inode, filp);
err_module:
module_put(dev->ib_dev->owner);
module_put(ib_dev->owner);
err:
kref_put(&dev->ref, ib_uverbs_release_dev);
mutex_unlock(&dev->lists_mutex);
srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
if (atomic_dec_and_test(&dev->refcount))
ib_uverbs_comp_dev(dev);
return ret;
}
static int ib_uverbs_close(struct inode *inode, struct file *filp)
{
struct ib_uverbs_file *file = filp->private_data;
struct ib_uverbs_device *dev = file->device;
struct ib_ucontext *ucontext = NULL;
ib_uverbs_cleanup_ucontext(file, file->ucontext);
mutex_lock(&file->device->lists_mutex);
ucontext = file->ucontext;
file->ucontext = NULL;
if (!file->is_closed) {
list_del(&file->list);
file->is_closed = 1;
}
mutex_unlock(&file->device->lists_mutex);
if (ucontext)
ib_uverbs_cleanup_ucontext(file, ucontext);
if (file->async_file)
kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
kref_put(&file->ref, ib_uverbs_release_file);
kobject_put(&dev->kobj);
return 0;
}
@ -817,12 +989,21 @@ static struct ib_client uverbs_client = {
static ssize_t show_ibdev(struct device *device, struct device_attribute *attr,
char *buf)
{
int ret = -ENODEV;
int srcu_key;
struct ib_uverbs_device *dev = dev_get_drvdata(device);
struct ib_device *ib_dev;
if (!dev)
return -ENODEV;
return sprintf(buf, "%s\n", dev->ib_dev->name);
srcu_key = srcu_read_lock(&dev->disassociate_srcu);
ib_dev = srcu_dereference(dev->ib_dev, &dev->disassociate_srcu);
if (ib_dev)
ret = sprintf(buf, "%s\n", ib_dev->name);
srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
return ret;
}
static DEVICE_ATTR(ibdev, S_IRUGO, show_ibdev, NULL);
@ -830,11 +1011,19 @@ static ssize_t show_dev_abi_version(struct device *device,
struct device_attribute *attr, char *buf)
{
struct ib_uverbs_device *dev = dev_get_drvdata(device);
int ret = -ENODEV;
int srcu_key;
struct ib_device *ib_dev;
if (!dev)
return -ENODEV;
srcu_key = srcu_read_lock(&dev->disassociate_srcu);
ib_dev = srcu_dereference(dev->ib_dev, &dev->disassociate_srcu);
if (ib_dev)
ret = sprintf(buf, "%d\n", ib_dev->uverbs_abi_ver);
srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
return sprintf(buf, "%d\n", dev->ib_dev->uverbs_abi_ver);
return ret;
}
static DEVICE_ATTR(abi_version, S_IRUGO, show_dev_abi_version, NULL);
@ -874,6 +1063,7 @@ static void ib_uverbs_add_one(struct ib_device *device)
int devnum;
dev_t base;
struct ib_uverbs_device *uverbs_dev;
int ret;
if (!device->alloc_ucontext)
return;
@ -882,10 +1072,20 @@ static void ib_uverbs_add_one(struct ib_device *device)
if (!uverbs_dev)
return;
kref_init(&uverbs_dev->ref);
ret = init_srcu_struct(&uverbs_dev->disassociate_srcu);
if (ret) {
kfree(uverbs_dev);
return;
}
atomic_set(&uverbs_dev->refcount, 1);
init_completion(&uverbs_dev->comp);
uverbs_dev->xrcd_tree = RB_ROOT;
mutex_init(&uverbs_dev->xrcd_tree_mutex);
kobject_init(&uverbs_dev->kobj, &ib_uverbs_dev_ktype);
mutex_init(&uverbs_dev->lists_mutex);
INIT_LIST_HEAD(&uverbs_dev->uverbs_file_list);
INIT_LIST_HEAD(&uverbs_dev->uverbs_events_file_list);
spin_lock(&map_lock);
devnum = find_first_zero_bit(dev_map, IB_UVERBS_MAX_DEVICES);
@ -906,12 +1106,13 @@ static void ib_uverbs_add_one(struct ib_device *device)
}
spin_unlock(&map_lock);
uverbs_dev->ib_dev = device;
rcu_assign_pointer(uverbs_dev->ib_dev, device);
uverbs_dev->num_comp_vectors = device->num_comp_vectors;
cdev_init(&uverbs_dev->cdev, NULL);
uverbs_dev->cdev.owner = THIS_MODULE;
uverbs_dev->cdev.ops = device->mmap ? &uverbs_mmap_fops : &uverbs_fops;
uverbs_dev->cdev.kobj.parent = &uverbs_dev->kobj;
kobject_set_name(&uverbs_dev->cdev.kobj, "uverbs%d", uverbs_dev->devnum);
if (cdev_add(&uverbs_dev->cdev, base, 1))
goto err_cdev;
@ -942,15 +1143,79 @@ static void ib_uverbs_add_one(struct ib_device *device)
clear_bit(devnum, overflow_map);
err:
kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
if (atomic_dec_and_test(&uverbs_dev->refcount))
ib_uverbs_comp_dev(uverbs_dev);
wait_for_completion(&uverbs_dev->comp);
kfree(uverbs_dev);
kobject_put(&uverbs_dev->kobj);
return;
}
static void ib_uverbs_remove_one(struct ib_device *device)
static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
struct ib_device *ib_dev)
{
struct ib_uverbs_device *uverbs_dev = ib_get_client_data(device, &uverbs_client);
struct ib_uverbs_file *file;
struct ib_uverbs_event_file *event_file;
struct ib_event event;
/* Pending running commands to terminate */
synchronize_srcu(&uverbs_dev->disassociate_srcu);
event.event = IB_EVENT_DEVICE_FATAL;
event.element.port_num = 0;
event.device = ib_dev;
mutex_lock(&uverbs_dev->lists_mutex);
while (!list_empty(&uverbs_dev->uverbs_file_list)) {
struct ib_ucontext *ucontext;
file = list_first_entry(&uverbs_dev->uverbs_file_list,
struct ib_uverbs_file, list);
file->is_closed = 1;
ucontext = file->ucontext;
list_del(&file->list);
file->ucontext = NULL;
kref_get(&file->ref);
mutex_unlock(&uverbs_dev->lists_mutex);
/* We must release the mutex before going ahead and calling
* disassociate_ucontext. disassociate_ucontext might end up
* indirectly calling uverbs_close, for example due to freeing
* the resources (e.g mmput).
*/
ib_uverbs_event_handler(&file->event_handler, &event);
if (ucontext) {
ib_dev->disassociate_ucontext(ucontext);
ib_uverbs_cleanup_ucontext(file, ucontext);
}
mutex_lock(&uverbs_dev->lists_mutex);
kref_put(&file->ref, ib_uverbs_release_file);
}
while (!list_empty(&uverbs_dev->uverbs_events_file_list)) {
event_file = list_first_entry(&uverbs_dev->
uverbs_events_file_list,
struct ib_uverbs_event_file,
list);
spin_lock_irq(&event_file->lock);
event_file->is_closed = 1;
spin_unlock_irq(&event_file->lock);
list_del(&event_file->list);
if (event_file->is_async) {
ib_unregister_event_handler(&event_file->uverbs_file->
event_handler);
event_file->uverbs_file->event_handler.device = NULL;
}
wake_up_interruptible(&event_file->poll_wait);
kill_fasync(&event_file->async_queue, SIGIO, POLL_IN);
}
mutex_unlock(&uverbs_dev->lists_mutex);
}
static void ib_uverbs_remove_one(struct ib_device *device, void *client_data)
{
struct ib_uverbs_device *uverbs_dev = client_data;
int wait_clients = 1;
if (!uverbs_dev)
return;
@ -964,9 +1229,28 @@ static void ib_uverbs_remove_one(struct ib_device *device)
else
clear_bit(uverbs_dev->devnum - IB_UVERBS_MAX_DEVICES, overflow_map);
kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
wait_for_completion(&uverbs_dev->comp);
kfree(uverbs_dev);
if (device->disassociate_ucontext) {
/* We disassociate HW resources and immediately return.
* Userspace will see a EIO errno for all future access.
* Upon returning, ib_device may be freed internally and is not
* valid any more.
* uverbs_device is still available until all clients close
* their files, then the uverbs device ref count will be zero
* and its resources will be freed.
* Note: At this point no more files can be opened since the
* cdev was deleted, however active clients can still issue
* commands and close their open files.
*/
rcu_assign_pointer(uverbs_dev->ib_dev, NULL);
ib_uverbs_free_hw_resources(uverbs_dev, device);
wait_clients = 0;
}
if (atomic_dec_and_test(&uverbs_dev->refcount))
ib_uverbs_comp_dev(uverbs_dev);
if (wait_clients)
wait_for_completion(&uverbs_dev->comp);
kobject_put(&uverbs_dev->kobj);
}
static char *uverbs_devnode(struct device *dev, umode_t *mode)

View File

@ -213,28 +213,79 @@ EXPORT_SYMBOL(rdma_port_get_link_layer);
/* Protection domains */
/**
* ib_alloc_pd - Allocates an unused protection domain.
* @device: The device on which to allocate the protection domain.
*
* A protection domain object provides an association between QPs, shared
* receive queues, address handles, memory regions, and memory windows.
*
* Every PD has a local_dma_lkey which can be used as the lkey value for local
* memory operations.
*/
struct ib_pd *ib_alloc_pd(struct ib_device *device)
{
struct ib_pd *pd;
struct ib_device_attr devattr;
int rc;
rc = ib_query_device(device, &devattr);
if (rc)
return ERR_PTR(rc);
pd = device->alloc_pd(device, NULL, NULL);
if (IS_ERR(pd))
return pd;
if (!IS_ERR(pd)) {
pd->device = device;
pd->uobject = NULL;
atomic_set(&pd->usecnt, 0);
pd->device = device;
pd->uobject = NULL;
pd->local_mr = NULL;
atomic_set(&pd->usecnt, 0);
if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
pd->local_dma_lkey = device->local_dma_lkey;
else {
struct ib_mr *mr;
mr = ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(mr)) {
ib_dealloc_pd(pd);
return (struct ib_pd *)mr;
}
pd->local_mr = mr;
pd->local_dma_lkey = pd->local_mr->lkey;
}
return pd;
}
EXPORT_SYMBOL(ib_alloc_pd);
int ib_dealloc_pd(struct ib_pd *pd)
/**
* ib_dealloc_pd - Deallocates a protection domain.
* @pd: The protection domain to deallocate.
*
* It is an error to call this function while any resources in the pd still
* exist. The caller is responsible to synchronously destroy them and
* guarantee no new allocations will happen.
*/
void ib_dealloc_pd(struct ib_pd *pd)
{
if (atomic_read(&pd->usecnt))
return -EBUSY;
int ret;
return pd->device->dealloc_pd(pd);
if (pd->local_mr) {
ret = ib_dereg_mr(pd->local_mr);
WARN_ON(ret);
pd->local_mr = NULL;
}
/* uverbs manipulates usecnt with proper locking, while the kabi
requires the caller to guarantee we can't race here. */
WARN_ON(atomic_read(&pd->usecnt));
/* Making delalloc_pd a void return is a WIP, no driver should return
an error here. */
ret = pd->device->dealloc_pd(pd);
WARN_ONCE(ret, "Infiniband HW driver failed dealloc_pd");
}
EXPORT_SYMBOL(ib_dealloc_pd);
@ -1168,16 +1219,28 @@ int ib_dereg_mr(struct ib_mr *mr)
}
EXPORT_SYMBOL(ib_dereg_mr);
struct ib_mr *ib_create_mr(struct ib_pd *pd,
struct ib_mr_init_attr *mr_init_attr)
/**
* ib_alloc_mr() - Allocates a memory region
* @pd: protection domain associated with the region
* @mr_type: memory region type
* @max_num_sg: maximum sg entries available for registration.
*
* Notes:
* Memory registeration page/sg lists must not exceed max_num_sg.
* For mr_type IB_MR_TYPE_MEM_REG, the total length cannot exceed
* max_num_sg * used_page_size.
*
*/
struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg)
{
struct ib_mr *mr;
if (!pd->device->create_mr)
if (!pd->device->alloc_mr)
return ERR_PTR(-ENOSYS);
mr = pd->device->create_mr(pd, mr_init_attr);
mr = pd->device->alloc_mr(pd, mr_type, max_num_sg);
if (!IS_ERR(mr)) {
mr->device = pd->device;
mr->pd = pd;
@ -1188,45 +1251,7 @@ struct ib_mr *ib_create_mr(struct ib_pd *pd,
return mr;
}
EXPORT_SYMBOL(ib_create_mr);
int ib_destroy_mr(struct ib_mr *mr)
{
struct ib_pd *pd;
int ret;
if (atomic_read(&mr->usecnt))
return -EBUSY;
pd = mr->pd;
ret = mr->device->destroy_mr(mr);
if (!ret)
atomic_dec(&pd->usecnt);
return ret;
}
EXPORT_SYMBOL(ib_destroy_mr);
struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
{
struct ib_mr *mr;
if (!pd->device->alloc_fast_reg_mr)
return ERR_PTR(-ENOSYS);
mr = pd->device->alloc_fast_reg_mr(pd, max_page_list_len);
if (!IS_ERR(mr)) {
mr->device = pd->device;
mr->pd = pd;
mr->uobject = NULL;
atomic_inc(&pd->usecnt);
atomic_set(&mr->usecnt, 0);
}
return mr;
}
EXPORT_SYMBOL(ib_alloc_fast_reg_mr);
EXPORT_SYMBOL(ib_alloc_mr);
struct ib_fast_reg_page_list *ib_alloc_fast_reg_page_list(struct ib_device *device,
int max_page_list_len)

View File

@ -1,8 +1,6 @@
obj-$(CONFIG_INFINIBAND_MTHCA) += mthca/
obj-$(CONFIG_INFINIBAND_IPATH) += ipath/
obj-$(CONFIG_INFINIBAND_QIB) += qib/
obj-$(CONFIG_INFINIBAND_EHCA) += ehca/
obj-$(CONFIG_INFINIBAND_AMSO1100) += amso1100/
obj-$(CONFIG_INFINIBAND_CXGB3) += cxgb3/
obj-$(CONFIG_INFINIBAND_CXGB4) += cxgb4/
obj-$(CONFIG_MLX4_INFINIBAND) += mlx4/

View File

@ -800,7 +800,9 @@ static int iwch_dealloc_mw(struct ib_mw *mw)
return 0;
}
static struct ib_mr *iwch_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
static struct ib_mr *iwch_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg)
{
struct iwch_dev *rhp;
struct iwch_pd *php;
@ -809,6 +811,10 @@ static struct ib_mr *iwch_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
u32 stag = 0;
int ret = 0;
if (mr_type != IB_MR_TYPE_MEM_REG ||
max_num_sg > T3_MAX_FASTREG_DEPTH)
return ERR_PTR(-EINVAL);
php = to_iwch_pd(pd);
rhp = php->rhp;
mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
@ -816,10 +822,10 @@ static struct ib_mr *iwch_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
goto err;
mhp->rhp = rhp;
ret = iwch_alloc_pbl(mhp, pbl_depth);
ret = iwch_alloc_pbl(mhp, max_num_sg);
if (ret)
goto err1;
mhp->attr.pbl_size = pbl_depth;
mhp->attr.pbl_size = max_num_sg;
ret = cxio_allocate_stag(&rhp->rdev, &stag, php->pdid,
mhp->attr.pbl_size, mhp->attr.pbl_addr);
if (ret)
@ -1443,7 +1449,7 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.alloc_mw = iwch_alloc_mw;
dev->ibdev.bind_mw = iwch_bind_mw;
dev->ibdev.dealloc_mw = iwch_dealloc_mw;
dev->ibdev.alloc_fast_reg_mr = iwch_alloc_fast_reg_mr;
dev->ibdev.alloc_mr = iwch_alloc_mr;
dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl;
dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl;
dev->ibdev.attach_mcast = iwch_multicast_attach;

View File

@ -50,6 +50,7 @@
#include <rdma/ib_addr.h>
#include "iw_cxgb4.h"
#include "clip_tbl.h"
static char *states[] = {
"idle",
@ -115,11 +116,11 @@ module_param(ep_timeout_secs, int, 0644);
MODULE_PARM_DESC(ep_timeout_secs, "CM Endpoint operation timeout "
"in seconds (default=60)");
static int mpa_rev = 1;
static int mpa_rev = 2;
module_param(mpa_rev, int, 0644);
MODULE_PARM_DESC(mpa_rev, "MPA Revision, 0 supports amso1100, "
"1 is RFC0544 spec compliant, 2 is IETF MPA Peer Connect Draft"
" compliant (default=1)");
" compliant (default=2)");
static int markers_enabled;
module_param(markers_enabled, int, 0644);
@ -298,6 +299,16 @@ void _c4iw_free_ep(struct kref *kref)
if (test_bit(QP_REFERENCED, &ep->com.flags))
deref_qp(ep);
if (test_bit(RELEASE_RESOURCES, &ep->com.flags)) {
if (ep->com.remote_addr.ss_family == AF_INET6) {
struct sockaddr_in6 *sin6 =
(struct sockaddr_in6 *)
&ep->com.mapped_local_addr;
cxgb4_clip_release(
ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&sin6->sin6_addr.s6_addr,
1);
}
remove_handle(ep->com.dev, &ep->com.dev->hwtid_idr, ep->hwtid);
cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, ep->hwtid);
dst_release(ep->dst);
@ -442,6 +453,12 @@ static void act_open_req_arp_failure(void *handle, struct sk_buff *skb)
kfree_skb(skb);
connect_reply_upcall(ep, -EHOSTUNREACH);
state_set(&ep->com, DEAD);
if (ep->com.remote_addr.ss_family == AF_INET6) {
struct sockaddr_in6 *sin6 =
(struct sockaddr_in6 *)&ep->com.mapped_local_addr;
cxgb4_clip_release(ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&sin6->sin6_addr.s6_addr, 1);
}
remove_handle(ep->com.dev, &ep->com.dev->atid_idr, ep->atid);
cxgb4_free_atid(ep->com.dev->rdev.lldi.tids, ep->atid);
dst_release(ep->dst);
@ -640,6 +657,7 @@ static int send_connect(struct c4iw_ep *ep)
struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)
&ep->com.mapped_remote_addr;
int win;
int ret;
wrlen = (ep->com.remote_addr.ss_family == AF_INET) ?
roundup(sizev4, 16) :
@ -693,6 +711,11 @@ static int send_connect(struct c4iw_ep *ep)
opt2 |= CONG_CNTRL_V(CONG_ALG_TAHOE);
opt2 |= T5_ISS_F;
}
if (ep->com.remote_addr.ss_family == AF_INET6)
cxgb4_clip_get(ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&la6->sin6_addr.s6_addr, 1);
t4_set_arp_err_handler(skb, ep, act_open_req_arp_failure);
if (is_t4(ep->com.dev->rdev.lldi.adapter_type)) {
@ -790,7 +813,11 @@ static int send_connect(struct c4iw_ep *ep)
}
set_bit(ACT_OPEN_REQ, &ep->com.history);
return c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t);
ret = c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t);
if (ret && ep->com.remote_addr.ss_family == AF_INET6)
cxgb4_clip_release(ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&la6->sin6_addr.s6_addr, 1);
return ret;
}
static void send_mpa_req(struct c4iw_ep *ep, struct sk_buff *skb,
@ -2091,6 +2118,15 @@ static int act_open_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
case CPL_ERR_CONN_EXIST:
if (ep->retry_count++ < ACT_OPEN_RETRY_COUNT) {
set_bit(ACT_RETRY_INUSE, &ep->com.history);
if (ep->com.remote_addr.ss_family == AF_INET6) {
struct sockaddr_in6 *sin6 =
(struct sockaddr_in6 *)
&ep->com.mapped_local_addr;
cxgb4_clip_release(
ep->com.dev->rdev.lldi.ports[0],
(const u32 *)
&sin6->sin6_addr.s6_addr, 1);
}
remove_handle(ep->com.dev, &ep->com.dev->atid_idr,
atid);
cxgb4_free_atid(t, atid);
@ -2118,6 +2154,12 @@ static int act_open_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
connect_reply_upcall(ep, status2errno(status));
state_set(&ep->com, DEAD);
if (ep->com.remote_addr.ss_family == AF_INET6) {
struct sockaddr_in6 *sin6 =
(struct sockaddr_in6 *)&ep->com.mapped_local_addr;
cxgb4_clip_release(ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&sin6->sin6_addr.s6_addr, 1);
}
if (status && act_open_has_tid(status))
cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, GET_TID(rpl));
@ -2302,6 +2344,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
struct dst_entry *dst;
__u8 local_ip[16], peer_ip[16];
__be16 local_port, peer_port;
struct sockaddr_in6 *sin6;
int err;
u16 peer_mss = ntohs(req->tcpopt.mss);
int iptype;
@ -2400,9 +2443,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
sin->sin_port = peer_port;
sin->sin_addr.s_addr = *(__be32 *)peer_ip;
} else {
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)
&child_ep->com.mapped_local_addr;
sin6 = (struct sockaddr_in6 *)&child_ep->com.mapped_local_addr;
sin6->sin6_family = PF_INET6;
sin6->sin6_port = local_port;
memcpy(sin6->sin6_addr.s6_addr, local_ip, 16);
@ -2436,6 +2477,11 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
insert_handle(dev, &dev->hwtid_idr, child_ep, child_ep->hwtid);
accept_cr(child_ep, skb, req);
set_bit(PASS_ACCEPT_REQ, &child_ep->com.history);
if (iptype == 6) {
sin6 = (struct sockaddr_in6 *)&child_ep->com.mapped_local_addr;
cxgb4_clip_get(child_ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&sin6->sin6_addr.s6_addr, 1);
}
goto out;
reject:
reject_cr(dev, hwtid, skb);
@ -2672,6 +2718,15 @@ static int peer_abort(struct c4iw_dev *dev, struct sk_buff *skb)
if (release)
release_ep_resources(ep);
else if (ep->retry_with_mpa_v1) {
if (ep->com.remote_addr.ss_family == AF_INET6) {
struct sockaddr_in6 *sin6 =
(struct sockaddr_in6 *)
&ep->com.mapped_local_addr;
cxgb4_clip_release(
ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&sin6->sin6_addr.s6_addr,
1);
}
remove_handle(ep->com.dev, &ep->com.dev->hwtid_idr, ep->hwtid);
cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, ep->hwtid);
dst_release(ep->dst);
@ -2976,7 +3031,7 @@ static int pick_local_ip6addrs(struct c4iw_dev *dev, struct iw_cm_id *cm_id)
struct sockaddr_in6 *la6 = (struct sockaddr_in6 *)&cm_id->local_addr;
struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)&cm_id->remote_addr;
if (get_lladdr(dev->rdev.lldi.ports[0], &addr, IFA_F_TENTATIVE)) {
if (!get_lladdr(dev->rdev.lldi.ports[0], &addr, IFA_F_TENTATIVE)) {
memcpy(la6->sin6_addr.s6_addr, &addr, 16);
memcpy(ra6->sin6_addr.s6_addr, &addr, 16);
return 0;
@ -3186,6 +3241,9 @@ static int create_server6(struct c4iw_dev *dev, struct c4iw_listen_ep *ep)
pr_err("cxgb4_create_server6/filter failed err %d stid %d laddr %pI6 lport %d\n",
err, ep->stid,
sin6->sin6_addr.s6_addr, ntohs(sin6->sin6_port));
else
cxgb4_clip_get(ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&sin6->sin6_addr.s6_addr, 1);
return err;
}
@ -3334,6 +3392,7 @@ int c4iw_destroy_listen(struct iw_cm_id *cm_id)
ep->com.dev->rdev.lldi.ports[0], ep->stid,
ep->com.dev->rdev.lldi.rxq_ids[0], 0);
} else {
struct sockaddr_in6 *sin6;
c4iw_init_wr_wait(&ep->com.wr_wait);
err = cxgb4_remove_server(
ep->com.dev->rdev.lldi.ports[0], ep->stid,
@ -3342,6 +3401,9 @@ int c4iw_destroy_listen(struct iw_cm_id *cm_id)
goto done;
err = c4iw_wait_for_reply(&ep->com.dev->rdev, &ep->com.wr_wait,
0, 0, __func__);
sin6 = (struct sockaddr_in6 *)&ep->com.mapped_local_addr;
cxgb4_clip_release(ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&sin6->sin6_addr.s6_addr, 1);
}
remove_handle(ep->com.dev, &ep->com.dev->stid_idr, ep->stid);
cxgb4_free_stid(ep->com.dev->rdev.lldi.tids, ep->stid,
@ -3461,6 +3523,12 @@ static void active_ofld_conn_reply(struct c4iw_dev *dev, struct sk_buff *skb,
mutex_unlock(&dev->rdev.stats.lock);
connect_reply_upcall(ep, status2errno(req->retval));
state_set(&ep->com, DEAD);
if (ep->com.remote_addr.ss_family == AF_INET6) {
struct sockaddr_in6 *sin6 =
(struct sockaddr_in6 *)&ep->com.mapped_local_addr;
cxgb4_clip_release(ep->com.dev->rdev.lldi.ports[0],
(const u32 *)&sin6->sin6_addr.s6_addr, 1);
}
remove_handle(dev, &dev->atid_idr, atid);
cxgb4_free_atid(dev->rdev.lldi.tids, atid);
dst_release(ep->dst);

View File

@ -970,7 +970,9 @@ void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *page_list);
struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(
struct ib_device *device,
int page_list_len);
struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth);
struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
int c4iw_dealloc_mw(struct ib_mw *mw);
struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,

View File

@ -853,7 +853,9 @@ int c4iw_dealloc_mw(struct ib_mw *mw)
return 0;
}
struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg)
{
struct c4iw_dev *rhp;
struct c4iw_pd *php;
@ -862,6 +864,10 @@ struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
u32 stag = 0;
int ret = 0;
if (mr_type != IB_MR_TYPE_MEM_REG ||
max_num_sg > t4_max_fr_depth(use_dsgl))
return ERR_PTR(-EINVAL);
php = to_c4iw_pd(pd);
rhp = php->rhp;
mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
@ -871,10 +877,10 @@ struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
}
mhp->rhp = rhp;
ret = alloc_pbl(mhp, pbl_depth);
ret = alloc_pbl(mhp, max_num_sg);
if (ret)
goto err1;
mhp->attr.pbl_size = pbl_depth;
mhp->attr.pbl_size = max_num_sg;
ret = allocate_stag(&rhp->rdev, &stag, php->pdid,
mhp->attr.pbl_size, mhp->attr.pbl_addr);
if (ret)

View File

@ -556,7 +556,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.alloc_mw = c4iw_alloc_mw;
dev->ibdev.bind_mw = c4iw_bind_mw;
dev->ibdev.dealloc_mw = c4iw_dealloc_mw;
dev->ibdev.alloc_fast_reg_mr = c4iw_alloc_fast_reg_mr;
dev->ibdev.alloc_mr = c4iw_alloc_mr;
dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl;
dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl;
dev->ibdev.attach_mcast = c4iw_multicast_attach;

View File

@ -89,7 +89,7 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
if (vlan_tag < 0x1000)
vlan_tag |= (ah_attr->sl & 7) << 13;
ah->av.eth.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
ah->av.eth.gid_index = ah_attr->grh.sgid_index;
ah->av.eth.gid_index = mlx4_ib_gid_index_to_real_index(ibdev, ah_attr->port_num, ah_attr->grh.sgid_index);
ah->av.eth.vlan = cpu_to_be16(vlan_tag);
if (ah_attr->static_rate) {
ah->av.eth.stat_rate = ah_attr->static_rate + MLX4_STAT_RATE_OFFSET;
@ -148,9 +148,13 @@ int mlx4_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
enum rdma_link_layer ll;
memset(ah_attr, 0, sizeof *ah_attr);
ah_attr->sl = be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
ah_attr->port_num = be32_to_cpu(ah->av.ib.port_pd) >> 24;
ll = rdma_port_get_link_layer(ibah->device, ah_attr->port_num);
if (ll == IB_LINK_LAYER_ETHERNET)
ah_attr->sl = be32_to_cpu(ah->av.eth.sl_tclass_flowlabel) >> 29;
else
ah_attr->sl = be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
ah_attr->dlid = ll == IB_LINK_LAYER_INFINIBAND ? be16_to_cpu(ah->av.ib.dlid) : 0;
if (ah->av.ib.stat_rate)
ah_attr->static_rate = ah->av.ib.stat_rate - MLX4_STAT_RATE_OFFSET;

View File

@ -638,7 +638,7 @@ static void mlx4_ib_poll_sw_comp(struct mlx4_ib_cq *cq, int num_entries,
* simulated FLUSH_ERR completions
*/
list_for_each_entry(qp, &cq->send_qp_list, cq_send_list) {
mlx4_ib_qp_sw_comp(qp, num_entries, wc, npolled, 1);
mlx4_ib_qp_sw_comp(qp, num_entries, wc + *npolled, npolled, 1);
if (*npolled >= num_entries)
goto out;
}

View File

@ -580,7 +580,7 @@ int mlx4_ib_send_to_slave(struct mlx4_ib_dev *dev, int slave, u8 port,
list.addr = tun_qp->tx_ring[tun_tx_ix].buf.map;
list.length = sizeof (struct mlx4_rcv_tunnel_mad);
list.lkey = tun_ctx->mr->lkey;
list.lkey = tun_ctx->pd->local_dma_lkey;
wr.wr.ud.ah = ah;
wr.wr.ud.port_num = port;
@ -1133,7 +1133,7 @@ static int mlx4_ib_post_pv_qp_buf(struct mlx4_ib_demux_pv_ctx *ctx,
sg_list.addr = tun_qp->ring[index].map;
sg_list.length = size;
sg_list.lkey = ctx->mr->lkey;
sg_list.lkey = ctx->pd->local_dma_lkey;
recv_wr.next = NULL;
recv_wr.sg_list = &sg_list;
@ -1244,7 +1244,7 @@ int mlx4_ib_send_to_wire(struct mlx4_ib_dev *dev, int slave, u8 port,
list.addr = sqp->tx_ring[wire_tx_ix].buf.map;
list.length = sizeof (struct mlx4_mad_snd_buf);
list.lkey = sqp_ctx->mr->lkey;
list.lkey = sqp_ctx->pd->local_dma_lkey;
wr.wr.ud.ah = ah;
wr.wr.ud.port_num = port;
@ -1827,19 +1827,12 @@ static int create_pv_resources(struct ib_device *ibdev, int slave, int port,
goto err_cq;
}
ctx->mr = ib_get_dma_mr(ctx->pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(ctx->mr)) {
ret = PTR_ERR(ctx->mr);
pr_err("Couldn't get tunnel DMA MR (%d)\n", ret);
goto err_pd;
}
if (ctx->has_smi) {
ret = create_pv_sqp(ctx, IB_QPT_SMI, create_tun);
if (ret) {
pr_err("Couldn't create %s QP0 (%d)\n",
create_tun ? "tunnel for" : "", ret);
goto err_mr;
goto err_pd;
}
}
@ -1876,10 +1869,6 @@ static int create_pv_resources(struct ib_device *ibdev, int slave, int port,
ib_destroy_qp(ctx->qp[0].qp);
ctx->qp[0].qp = NULL;
err_mr:
ib_dereg_mr(ctx->mr);
ctx->mr = NULL;
err_pd:
ib_dealloc_pd(ctx->pd);
ctx->pd = NULL;
@ -1916,8 +1905,6 @@ static void destroy_pv_resources(struct mlx4_ib_dev *dev, int slave, int port,
ib_destroy_qp(ctx->qp[1].qp);
ctx->qp[1].qp = NULL;
mlx4_ib_free_pv_qp_bufs(ctx, IB_QPT_GSI, 1);
ib_dereg_mr(ctx->mr);
ctx->mr = NULL;
ib_dealloc_pd(ctx->pd);
ctx->pd = NULL;
ib_destroy_cq(ctx->cq);
@ -2050,8 +2037,6 @@ static void mlx4_ib_free_sqp_ctx(struct mlx4_ib_demux_pv_ctx *sqp_ctx)
ib_destroy_qp(sqp_ctx->qp[1].qp);
sqp_ctx->qp[1].qp = NULL;
mlx4_ib_free_pv_qp_bufs(sqp_ctx, IB_QPT_GSI, 0);
ib_dereg_mr(sqp_ctx->mr);
sqp_ctx->mr = NULL;
ib_dealloc_pd(sqp_ctx->pd);
sqp_ctx->pd = NULL;
ib_destroy_cq(sqp_ctx->cq);

File diff suppressed because it is too large Load Diff

View File

@ -51,6 +51,10 @@
pr_warn("%s-%d: %16s (port %d): WARNING: " format, __func__, __LINE__,\
(group)->name, group->demux->port, ## arg)
#define mcg_debug_group(group, format, arg...) \
pr_debug("%s-%d: %16s (port %d): WARNING: " format, __func__, __LINE__,\
(group)->name, (group)->demux->port, ## arg)
#define mcg_error_group(group, format, arg...) \
pr_err(" %16s: " format, (group)->name, ## arg)
@ -206,15 +210,16 @@ static int send_mad_to_wire(struct mlx4_ib_demux_ctx *ctx, struct ib_mad *mad)
{
struct mlx4_ib_dev *dev = ctx->dev;
struct ib_ah_attr ah_attr;
unsigned long flags;
spin_lock(&dev->sm_lock);
spin_lock_irqsave(&dev->sm_lock, flags);
if (!dev->sm_ah[ctx->port - 1]) {
/* port is not yet Active, sm_ah not ready */
spin_unlock(&dev->sm_lock);
spin_unlock_irqrestore(&dev->sm_lock, flags);
return -EAGAIN;
}
mlx4_ib_query_ah(dev->sm_ah[ctx->port - 1], &ah_attr);
spin_unlock(&dev->sm_lock);
spin_unlock_irqrestore(&dev->sm_lock, flags);
return mlx4_ib_send_to_wire(dev, mlx4_master_func_num(dev->dev),
ctx->port, IB_QPT_GSI, 0, 1, IB_QP1_QKEY,
&ah_attr, NULL, mad);
@ -961,8 +966,8 @@ int mlx4_ib_mcg_multiplex_handler(struct ib_device *ibdev, int port,
mutex_lock(&group->lock);
if (group->func[slave].num_pend_reqs > MAX_PEND_REQS_PER_FUNC) {
mutex_unlock(&group->lock);
mcg_warn_group(group, "Port %d, Func %d has too many pending requests (%d), dropping\n",
port, slave, MAX_PEND_REQS_PER_FUNC);
mcg_debug_group(group, "Port %d, Func %d has too many pending requests (%d), dropping\n",
port, slave, MAX_PEND_REQS_PER_FUNC);
release_group(group, 0);
kfree(req);
return -ENOMEM;

View File

@ -70,11 +70,24 @@ extern int mlx4_ib_sm_guid_assign;
#define MLX4_IB_UC_STEER_QPN_ALIGN 1
#define MLX4_IB_UC_MAX_NUM_QPS 256
enum hw_bar_type {
HW_BAR_BF,
HW_BAR_DB,
HW_BAR_CLOCK,
HW_BAR_COUNT
};
struct mlx4_ib_vma_private_data {
struct vm_area_struct *vma;
};
struct mlx4_ib_ucontext {
struct ib_ucontext ibucontext;
struct mlx4_uar uar;
struct list_head db_page_list;
struct mutex db_page_mutex;
struct mlx4_ib_vma_private_data hw_bar_info[HW_BAR_COUNT];
};
struct mlx4_ib_pd {
@ -415,7 +428,6 @@ struct mlx4_ib_demux_pv_ctx {
struct ib_device *ib_dev;
struct ib_cq *cq;
struct ib_pd *pd;
struct ib_mr *mr;
struct work_struct work;
struct workqueue_struct *wq;
struct mlx4_ib_demux_pv_qp qp[2];
@ -457,15 +469,26 @@ struct mlx4_ib_sriov {
struct idr pv_id_table;
};
struct gid_cache_context {
int real_index;
int refcount;
};
struct gid_entry {
union ib_gid gid;
struct gid_cache_context *ctx;
};
struct mlx4_port_gid_table {
struct gid_entry gids[MLX4_MAX_PORT_GIDS];
};
struct mlx4_ib_iboe {
spinlock_t lock;
struct net_device *netdevs[MLX4_MAX_PORTS];
struct net_device *masters[MLX4_MAX_PORTS];
atomic64_t mac[MLX4_MAX_PORTS];
struct notifier_block nb;
struct notifier_block nb_inet;
struct notifier_block nb_inet6;
union ib_gid gid_table[MLX4_MAX_PORTS][128];
struct mlx4_port_gid_table gids[MLX4_MAX_PORTS];
};
struct pkey_mgt {
@ -680,8 +703,9 @@ struct ib_mw *mlx4_ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
int mlx4_ib_bind_mw(struct ib_qp *qp, struct ib_mw *mw,
struct ib_mw_bind *mw_bind);
int mlx4_ib_dealloc_mw(struct ib_mw *mw);
struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len);
struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
int page_list_len);
void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
@ -838,5 +862,7 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
u64 start, u64 length, u64 virt_addr,
int mr_access_flags, struct ib_pd *pd,
struct ib_udata *udata);
int mlx4_ib_gid_index_to_real_index(struct mlx4_ib_dev *ibdev,
u8 port_num, int index);
#endif /* MLX4_IB_H */

View File

@ -350,19 +350,24 @@ int mlx4_ib_dealloc_mw(struct ib_mw *ibmw)
return 0;
}
struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len)
struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg)
{
struct mlx4_ib_dev *dev = to_mdev(pd->device);
struct mlx4_ib_mr *mr;
int err;
if (mr_type != IB_MR_TYPE_MEM_REG ||
max_num_sg > MLX4_MAX_FAST_REG_PAGES)
return ERR_PTR(-EINVAL);
mr = kmalloc(sizeof *mr, GFP_KERNEL);
if (!mr)
return ERR_PTR(-ENOMEM);
err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, 0, 0, 0,
max_page_list_len, 0, &mr->mmr);
max_num_sg, 0, &mr->mmr);
if (err)
goto err_free;

View File

@ -1292,14 +1292,18 @@ static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
path->static_rate = 0;
if (ah->ah_flags & IB_AH_GRH) {
if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) {
int real_sgid_index = mlx4_ib_gid_index_to_real_index(dev,
port,
ah->grh.sgid_index);
if (real_sgid_index >= dev->dev->caps.gid_table_len[port]) {
pr_err("sgid_index (%u) too large. max is %d\n",
ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1);
real_sgid_index, dev->dev->caps.gid_table_len[port] - 1);
return -1;
}
path->grh_mylmc |= 1 << 7;
path->mgid_index = ah->grh.sgid_index;
path->mgid_index = real_sgid_index;
path->hop_limit = ah->grh.hop_limit;
path->tclass_flowlabel =
cpu_to_be32((ah->grh.traffic_class << 20) |

View File

@ -640,6 +640,8 @@ static int add_port(struct mlx4_ib_dev *dev, int port_num, int slave)
struct mlx4_port *p;
int i;
int ret;
int is_eth = rdma_port_get_link_layer(&dev->ib_dev, port_num) ==
IB_LINK_LAYER_ETHERNET;
p = kzalloc(sizeof *p, GFP_KERNEL);
if (!p)
@ -657,7 +659,8 @@ static int add_port(struct mlx4_ib_dev *dev, int port_num, int slave)
p->pkey_group.name = "pkey_idx";
p->pkey_group.attrs =
alloc_group_attrs(show_port_pkey, store_port_pkey,
alloc_group_attrs(show_port_pkey,
is_eth ? NULL : store_port_pkey,
dev->dev->caps.pkey_table_len[port_num]);
if (!p->pkey_group.attrs) {
ret = -ENOMEM;

View File

@ -33,6 +33,7 @@
#include <linux/kref.h>
#include <rdma/ib_umem.h>
#include <rdma/ib_user_verbs.h>
#include <rdma/ib_cache.h>
#include "mlx5_ib.h"
#include "user.h"
@ -227,7 +228,14 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
wc->dlid_path_bits = cqe->ml_path;
g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
wc->wc_flags |= g ? IB_WC_GRH : 0;
wc->pkey_index = be32_to_cpu(cqe->imm_inval_pkey) & 0xffff;
if (unlikely(is_qp1(qp->ibqp.qp_type))) {
u16 pkey = be32_to_cpu(cqe->imm_inval_pkey) & 0xffff;
ib_find_cached_pkey(&dev->ib_dev, qp->port, pkey,
&wc->pkey_index);
} else {
wc->pkey_index = 0;
}
}
static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)

View File

@ -212,6 +212,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
int err = -ENOMEM;
int max_rq_sg;
int max_sq_sg;
u64 min_page_size = 1ull << MLX5_CAP_GEN(mdev, log_pg_sz);
if (uhw->inlen || uhw->outlen)
return -EINVAL;
@ -264,7 +265,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
props->hw_ver = mdev->pdev->revision;
props->max_mr_size = ~0ull;
props->page_size_cap = 1ull << MLX5_CAP_GEN(mdev, log_pg_sz);
props->page_size_cap = ~(min_page_size - 1);
props->max_qp = 1 << MLX5_CAP_GEN(mdev, log_max_qp);
props->max_qp_wr = 1 << MLX5_CAP_GEN(mdev, log_max_qp_sz);
max_rq_sg = MLX5_CAP_GEN(mdev, max_wqe_sz_rq) /
@ -273,6 +274,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
sizeof(struct mlx5_wqe_ctrl_seg)) /
sizeof(struct mlx5_wqe_data_seg);
props->max_sge = min(max_rq_sg, max_sq_sg);
props->max_sge_rd = props->max_sge;
props->max_cq = 1 << MLX5_CAP_GEN(mdev, log_max_cq);
props->max_cqe = (1 << MLX5_CAP_GEN(mdev, log_max_eq_sz)) - 1;
props->max_mr = 1 << MLX5_CAP_GEN(mdev, log_max_mkey);
@ -1121,7 +1123,6 @@ static void destroy_umrc_res(struct mlx5_ib_dev *dev)
mlx5_ib_destroy_qp(dev->umrc.qp);
ib_destroy_cq(dev->umrc.cq);
ib_dereg_mr(dev->umrc.mr);
ib_dealloc_pd(dev->umrc.pd);
}
@ -1136,7 +1137,6 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
struct ib_pd *pd;
struct ib_cq *cq;
struct ib_qp *qp;
struct ib_mr *mr;
struct ib_cq_init_attr cq_attr = {};
int ret;
@ -1154,13 +1154,6 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
goto error_0;
}
mr = ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(mr)) {
mlx5_ib_dbg(dev, "Couldn't create DMA MR for sync UMR QP\n");
ret = PTR_ERR(mr);
goto error_1;
}
cq_attr.cqe = 128;
cq = ib_create_cq(&dev->ib_dev, mlx5_umr_cq_handler, NULL, NULL,
&cq_attr);
@ -1218,7 +1211,6 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
dev->umrc.qp = qp;
dev->umrc.cq = cq;
dev->umrc.mr = mr;
dev->umrc.pd = pd;
sema_init(&dev->umrc.sem, MAX_UMR_WR);
@ -1240,9 +1232,6 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
ib_destroy_cq(cq);
error_2:
ib_dereg_mr(mr);
error_1:
ib_dealloc_pd(pd);
error_0:
@ -1256,10 +1245,18 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
struct ib_srq_init_attr attr;
struct mlx5_ib_dev *dev;
struct ib_cq_init_attr cq_attr = {.cqe = 1};
u32 rsvd_lkey;
int ret = 0;
dev = container_of(devr, struct mlx5_ib_dev, devr);
ret = mlx5_core_query_special_context(dev->mdev, &rsvd_lkey);
if (ret) {
pr_err("Failed to query special context %d\n", ret);
return ret;
}
dev->ib_dev.local_dma_lkey = rsvd_lkey;
devr->p0 = mlx5_ib_alloc_pd(&dev->ib_dev, NULL, NULL);
if (IS_ERR(devr->p0)) {
ret = PTR_ERR(devr->p0);
@ -1421,7 +1418,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
strlcpy(dev->ib_dev.name, "mlx5_%d", IB_DEVICE_NAME_MAX);
dev->ib_dev.owner = THIS_MODULE;
dev->ib_dev.node_type = RDMA_NODE_IB_CA;
dev->ib_dev.local_dma_lkey = 0 /* not supported for now */;
dev->num_ports = MLX5_CAP_GEN(mdev, num_ports);
dev->ib_dev.phys_port_cnt = dev->num_ports;
dev->ib_dev.num_comp_vectors =
@ -1490,12 +1486,10 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
dev->ib_dev.get_dma_mr = mlx5_ib_get_dma_mr;
dev->ib_dev.reg_user_mr = mlx5_ib_reg_user_mr;
dev->ib_dev.dereg_mr = mlx5_ib_dereg_mr;
dev->ib_dev.destroy_mr = mlx5_ib_destroy_mr;
dev->ib_dev.attach_mcast = mlx5_ib_mcg_attach;
dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach;
dev->ib_dev.process_mad = mlx5_ib_process_mad;
dev->ib_dev.create_mr = mlx5_ib_create_mr;
dev->ib_dev.alloc_fast_reg_mr = mlx5_ib_alloc_fast_reg_mr;
dev->ib_dev.alloc_mr = mlx5_ib_alloc_mr;
dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list;
dev->ib_dev.check_mr_status = mlx5_ib_check_mr_status;

View File

@ -349,7 +349,6 @@ struct umr_common {
struct ib_pd *pd;
struct ib_cq *cq;
struct ib_qp *qp;
struct ib_mr *mr;
/* control access to UMR QP
*/
struct semaphore sem;
@ -573,11 +572,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index,
int npages, int zap);
int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
int mlx5_ib_destroy_mr(struct ib_mr *ibmr);
struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
struct ib_mr_init_attr *mr_init_attr);
struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len);
struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
int page_list_len);
void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
@ -683,6 +680,11 @@ static inline u8 convert_access(int acc)
MLX5_PERM_LOCAL_READ;
}
static inline int is_qp1(enum ib_qp_type qp_type)
{
return qp_type == IB_QPT_GSI;
}
#define MLX5_MAX_UMR_SHIFT 16
#define MLX5_MAX_UMR_PAGES (1 << MLX5_MAX_UMR_SHIFT)

View File

@ -441,9 +441,6 @@ static struct mlx5_ib_mr *alloc_cached_mr(struct mlx5_ib_dev *dev, int order)
spin_unlock_irq(&ent->lock);
queue_work(cache->wq, &ent->work);
if (mr)
break;
}
if (!mr)
@ -690,12 +687,11 @@ static void prep_umr_reg_wqe(struct ib_pd *pd, struct ib_send_wr *wr,
int access_flags)
{
struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct ib_mr *mr = dev->umrc.mr;
struct mlx5_umr_wr *umrwr = (struct mlx5_umr_wr *)&wr->wr.fast_reg;
sg->addr = dma;
sg->length = ALIGN(sizeof(u64) * n, 64);
sg->lkey = mr->lkey;
sg->lkey = dev->umrc.pd->local_dma_lkey;
wr->next = NULL;
wr->send_flags = 0;
@ -926,7 +922,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index, int npages,
sg.addr = dma;
sg.length = ALIGN(npages * sizeof(u64),
MLX5_UMR_MTT_ALIGNMENT);
sg.lkey = dev->umrc.mr->lkey;
sg.lkey = dev->umrc.pd->local_dma_lkey;
wr.send_flags = MLX5_IB_SEND_UMR_FAIL_IF_FREE |
MLX5_IB_SEND_UMR_UPDATE_MTT;
@ -1118,19 +1114,7 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
return &mr->ibmr;
error:
/*
* Destroy the umem *before* destroying the MR, to ensure we
* will not have any in-flight notifiers when destroying the
* MR.
*
* As the MR is completely invalid to begin with, and this
* error path is only taken if we can't push the mr entry into
* the pagefault tree, this is safe.
*/
ib_umem_release(umem);
/* Kill the MR, and return an error code. */
clean_mr(mr);
return ERR_PTR(err);
}
@ -1173,6 +1157,19 @@ static int clean_mr(struct mlx5_ib_mr *mr)
int umred = mr->umred;
int err;
if (mr->sig) {
if (mlx5_core_destroy_psv(dev->mdev,
mr->sig->psv_memory.psv_idx))
mlx5_ib_warn(dev, "failed to destroy mem psv %d\n",
mr->sig->psv_memory.psv_idx);
if (mlx5_core_destroy_psv(dev->mdev,
mr->sig->psv_wire.psv_idx))
mlx5_ib_warn(dev, "failed to destroy wire psv %d\n",
mr->sig->psv_wire.psv_idx);
kfree(mr->sig);
mr->sig = NULL;
}
if (!umred) {
err = destroy_mkey(dev, mr);
if (err) {
@ -1234,14 +1231,15 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
return 0;
}
struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
struct ib_mr_init_attr *mr_init_attr)
struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg)
{
struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct mlx5_create_mkey_mbox_in *in;
struct mlx5_ib_mr *mr;
int access_mode, err;
int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
int ndescs = roundup(max_num_sg, 4);
mr = kzalloc(sizeof(*mr), GFP_KERNEL);
if (!mr)
@ -1257,9 +1255,11 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
in->seg.xlt_oct_size = cpu_to_be32(ndescs);
in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
access_mode = MLX5_ACCESS_MODE_MTT;
if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
if (mr_type == IB_MR_TYPE_MEM_REG) {
access_mode = MLX5_ACCESS_MODE_MTT;
in->seg.log2_page_size = PAGE_SHIFT;
} else if (mr_type == IB_MR_TYPE_SIGNATURE) {
u32 psv_index[2];
in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
@ -1285,6 +1285,10 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
mr->sig->sig_err_exists = false;
/* Next UMR, Arm SIGERR */
++mr->sig->sigerr_count;
} else {
mlx5_ib_warn(dev, "Invalid mr type %d\n", mr_type);
err = -EINVAL;
goto err_free_in;
}
in->seg.flags = MLX5_PERM_UMR_EN | access_mode;
@ -1320,80 +1324,6 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
return ERR_PTR(err);
}
int mlx5_ib_destroy_mr(struct ib_mr *ibmr)
{
struct mlx5_ib_dev *dev = to_mdev(ibmr->device);
struct mlx5_ib_mr *mr = to_mmr(ibmr);
int err;
if (mr->sig) {
if (mlx5_core_destroy_psv(dev->mdev,
mr->sig->psv_memory.psv_idx))
mlx5_ib_warn(dev, "failed to destroy mem psv %d\n",
mr->sig->psv_memory.psv_idx);
if (mlx5_core_destroy_psv(dev->mdev,
mr->sig->psv_wire.psv_idx))
mlx5_ib_warn(dev, "failed to destroy wire psv %d\n",
mr->sig->psv_wire.psv_idx);
kfree(mr->sig);
}
err = destroy_mkey(dev, mr);
if (err) {
mlx5_ib_warn(dev, "failed to destroy mkey 0x%x (%d)\n",
mr->mmr.key, err);
return err;
}
kfree(mr);
return err;
}
struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len)
{
struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct mlx5_create_mkey_mbox_in *in;
struct mlx5_ib_mr *mr;
int err;
mr = kzalloc(sizeof(*mr), GFP_KERNEL);
if (!mr)
return ERR_PTR(-ENOMEM);
in = kzalloc(sizeof(*in), GFP_KERNEL);
if (!in) {
err = -ENOMEM;
goto err_free;
}
in->seg.status = MLX5_MKEY_STATUS_FREE;
in->seg.xlt_oct_size = cpu_to_be32((max_page_list_len + 1) / 2);
in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
in->seg.flags = MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
/*
* TBD not needed - issue 197292 */
in->seg.log2_page_size = PAGE_SHIFT;
err = mlx5_core_create_mkey(dev->mdev, &mr->mmr, in, sizeof(*in), NULL,
NULL, NULL);
kfree(in);
if (err)
goto err_free;
mr->ibmr.lkey = mr->mmr.key;
mr->ibmr.rkey = mr->mmr.key;
mr->umem = NULL;
return &mr->ibmr;
err_free:
kfree(mr);
return ERR_PTR(err);
}
struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
int page_list_len)
{

View File

@ -76,11 +76,6 @@ static int is_qp0(enum ib_qp_type qp_type)
return qp_type == IB_QPT_SMI;
}
static int is_qp1(enum ib_qp_type qp_type)
{
return qp_type == IB_QPT_GSI;
}
static int is_sqp(enum ib_qp_type qp_type)
{
return is_qp0(qp_type) || is_qp1(qp_type);

View File

@ -97,6 +97,7 @@ static int mthca_query_device(struct ib_device *ibdev, struct ib_device_attr *pr
props->max_qp = mdev->limits.num_qps - mdev->limits.reserved_qps;
props->max_qp_wr = mdev->limits.max_wqes;
props->max_sge = mdev->limits.max_sg;
props->max_sge_rd = props->max_sge;
props->max_cq = mdev->limits.num_cqs - mdev->limits.reserved_cqs;
props->max_cqe = mdev->limits.max_cqes;
props->max_mr = mdev->limits.num_mpts - mdev->limits.reserved_mrws;

View File

@ -375,9 +375,11 @@ static int alloc_fast_reg_mr(struct nes_device *nesdev, struct nes_pd *nespd,
}
/*
* nes_alloc_fast_reg_mr
* nes_alloc_mr
*/
static struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_page_list_len)
static struct ib_mr *nes_alloc_mr(struct ib_pd *ibpd,
enum ib_mr_type mr_type,
u32 max_num_sg)
{
struct nes_pd *nespd = to_nespd(ibpd);
struct nes_vnic *nesvnic = to_nesvnic(ibpd->device);
@ -393,11 +395,18 @@ static struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_page_list
u32 stag;
int ret;
struct ib_mr *ibmr;
if (mr_type != IB_MR_TYPE_MEM_REG)
return ERR_PTR(-EINVAL);
if (max_num_sg > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64)))
return ERR_PTR(-E2BIG);
/*
* Note: Set to always use a fixed length single page entry PBL. This is to allow
* for the fast_reg_mr operation to always know the size of the PBL.
*/
if (max_page_list_len > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64)))
if (max_num_sg > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64)))
return ERR_PTR(-E2BIG);
get_random_bytes(&next_stag_index, sizeof(next_stag_index));
@ -424,7 +433,7 @@ static struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_page_list
nes_debug(NES_DBG_MR, "Allocating STag 0x%08X index = 0x%08X\n",
stag, stag_index);
ret = alloc_fast_reg_mr(nesdev, nespd, stag, max_page_list_len);
ret = alloc_fast_reg_mr(nesdev, nespd, stag, max_num_sg);
if (ret == 0) {
nesmr->ibmr.rkey = stag;
@ -3929,7 +3938,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
nesibdev->ibdev.dealloc_mw = nes_dealloc_mw;
nesibdev->ibdev.bind_mw = nes_bind_mw;
nesibdev->ibdev.alloc_fast_reg_mr = nes_alloc_fast_reg_mr;
nesibdev->ibdev.alloc_mr = nes_alloc_mr;
nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list;
nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list;

View File

@ -246,7 +246,6 @@ struct ocrdma_dev {
u16 base_eqid;
u16 max_eq;
union ib_gid *sgid_tbl;
/* provided synchronization to sgid table for
* updating gid entries triggered by notifier.
*/

View File

@ -67,8 +67,6 @@ static LIST_HEAD(ocrdma_dev_list);
static DEFINE_SPINLOCK(ocrdma_devlist_lock);
static DEFINE_IDR(ocrdma_dev_id);
static union ib_gid ocrdma_zero_sgid;
void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
{
u8 mac_addr[6];
@ -83,135 +81,6 @@ void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
guid[6] = mac_addr[4];
guid[7] = mac_addr[5];
}
static bool ocrdma_add_sgid(struct ocrdma_dev *dev, union ib_gid *new_sgid)
{
int i;
unsigned long flags;
memset(&ocrdma_zero_sgid, 0, sizeof(union ib_gid));
spin_lock_irqsave(&dev->sgid_lock, flags);
for (i = 0; i < OCRDMA_MAX_SGID; i++) {
if (!memcmp(&dev->sgid_tbl[i], &ocrdma_zero_sgid,
sizeof(union ib_gid))) {
/* found free entry */
memcpy(&dev->sgid_tbl[i], new_sgid,
sizeof(union ib_gid));
spin_unlock_irqrestore(&dev->sgid_lock, flags);
return true;
} else if (!memcmp(&dev->sgid_tbl[i], new_sgid,
sizeof(union ib_gid))) {
/* entry already present, no addition is required. */
spin_unlock_irqrestore(&dev->sgid_lock, flags);
return false;
}
}
spin_unlock_irqrestore(&dev->sgid_lock, flags);
return false;
}
static bool ocrdma_del_sgid(struct ocrdma_dev *dev, union ib_gid *sgid)
{
int found = false;
int i;
unsigned long flags;
spin_lock_irqsave(&dev->sgid_lock, flags);
/* first is default sgid, which cannot be deleted. */
for (i = 1; i < OCRDMA_MAX_SGID; i++) {
if (!memcmp(&dev->sgid_tbl[i], sgid, sizeof(union ib_gid))) {
/* found matching entry */
memset(&dev->sgid_tbl[i], 0, sizeof(union ib_gid));
found = true;
break;
}
}
spin_unlock_irqrestore(&dev->sgid_lock, flags);
return found;
}
static int ocrdma_addr_event(unsigned long event, struct net_device *netdev,
union ib_gid *gid)
{
struct ib_event gid_event;
struct ocrdma_dev *dev;
bool found = false;
bool updated = false;
bool is_vlan = false;
is_vlan = netdev->priv_flags & IFF_802_1Q_VLAN;
if (is_vlan)
netdev = rdma_vlan_dev_real_dev(netdev);
rcu_read_lock();
list_for_each_entry_rcu(dev, &ocrdma_dev_list, entry) {
if (dev->nic_info.netdev == netdev) {
found = true;
break;
}
}
rcu_read_unlock();
if (!found)
return NOTIFY_DONE;
mutex_lock(&dev->dev_lock);
switch (event) {
case NETDEV_UP:
updated = ocrdma_add_sgid(dev, gid);
break;
case NETDEV_DOWN:
updated = ocrdma_del_sgid(dev, gid);
break;
default:
break;
}
if (updated) {
/* GID table updated, notify the consumers about it */
gid_event.device = &dev->ibdev;
gid_event.element.port_num = 1;
gid_event.event = IB_EVENT_GID_CHANGE;
ib_dispatch_event(&gid_event);
}
mutex_unlock(&dev->dev_lock);
return NOTIFY_OK;
}
static int ocrdma_inetaddr_event(struct notifier_block *notifier,
unsigned long event, void *ptr)
{
struct in_ifaddr *ifa = ptr;
union ib_gid gid;
struct net_device *netdev = ifa->ifa_dev->dev;
ipv6_addr_set_v4mapped(ifa->ifa_address, (struct in6_addr *)&gid);
return ocrdma_addr_event(event, netdev, &gid);
}
static struct notifier_block ocrdma_inetaddr_notifier = {
.notifier_call = ocrdma_inetaddr_event
};
#if IS_ENABLED(CONFIG_IPV6)
static int ocrdma_inet6addr_event(struct notifier_block *notifier,
unsigned long event, void *ptr)
{
struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
union ib_gid *gid = (union ib_gid *)&ifa->addr;
struct net_device *netdev = ifa->idev->dev;
return ocrdma_addr_event(event, netdev, gid);
}
static struct notifier_block ocrdma_inet6addr_notifier = {
.notifier_call = ocrdma_inet6addr_event
};
#endif /* IPV6 and VLAN */
static enum rdma_link_layer ocrdma_link_layer(struct ib_device *device,
u8 port_num)
{
@ -280,6 +149,9 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
dev->ibdev.query_port = ocrdma_query_port;
dev->ibdev.modify_port = ocrdma_modify_port;
dev->ibdev.query_gid = ocrdma_query_gid;
dev->ibdev.get_netdev = ocrdma_get_netdev;
dev->ibdev.add_gid = ocrdma_add_gid;
dev->ibdev.del_gid = ocrdma_del_gid;
dev->ibdev.get_link_layer = ocrdma_link_layer;
dev->ibdev.alloc_pd = ocrdma_alloc_pd;
dev->ibdev.dealloc_pd = ocrdma_dealloc_pd;
@ -309,7 +181,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
dev->ibdev.dereg_mr = ocrdma_dereg_mr;
dev->ibdev.reg_user_mr = ocrdma_reg_user_mr;
dev->ibdev.alloc_fast_reg_mr = ocrdma_alloc_frmr;
dev->ibdev.alloc_mr = ocrdma_alloc_mr;
dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list;
dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list;
@ -342,12 +214,6 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
static int ocrdma_alloc_resources(struct ocrdma_dev *dev)
{
mutex_init(&dev->dev_lock);
dev->sgid_tbl = kzalloc(sizeof(union ib_gid) *
OCRDMA_MAX_SGID, GFP_KERNEL);
if (!dev->sgid_tbl)
goto alloc_err;
spin_lock_init(&dev->sgid_lock);
dev->cq_tbl = kzalloc(sizeof(struct ocrdma_cq *) *
OCRDMA_MAX_CQ, GFP_KERNEL);
if (!dev->cq_tbl)
@ -379,7 +245,6 @@ static void ocrdma_free_resources(struct ocrdma_dev *dev)
kfree(dev->stag_arr);
kfree(dev->qp_tbl);
kfree(dev->cq_tbl);
kfree(dev->sgid_tbl);
}
/* OCRDMA sysfs interface */
@ -425,68 +290,6 @@ static void ocrdma_remove_sysfiles(struct ocrdma_dev *dev)
device_remove_file(&dev->ibdev.dev, ocrdma_attributes[i]);
}
static void ocrdma_add_default_sgid(struct ocrdma_dev *dev)
{
/* GID Index 0 - Invariant manufacturer-assigned EUI-64 */
union ib_gid *sgid = &dev->sgid_tbl[0];
sgid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
ocrdma_get_guid(dev, &sgid->raw[8]);
}
static void ocrdma_init_ipv4_gids(struct ocrdma_dev *dev,
struct net_device *net)
{
struct in_device *in_dev;
union ib_gid gid;
in_dev = in_dev_get(net);
if (in_dev) {
for_ifa(in_dev) {
ipv6_addr_set_v4mapped(ifa->ifa_address,
(struct in6_addr *)&gid);
ocrdma_add_sgid(dev, &gid);
}
endfor_ifa(in_dev);
in_dev_put(in_dev);
}
}
static void ocrdma_init_ipv6_gids(struct ocrdma_dev *dev,
struct net_device *net)
{
#if IS_ENABLED(CONFIG_IPV6)
struct inet6_dev *in6_dev;
union ib_gid *pgid;
struct inet6_ifaddr *ifp;
in6_dev = in6_dev_get(net);
if (in6_dev) {
read_lock_bh(&in6_dev->lock);
list_for_each_entry(ifp, &in6_dev->addr_list, if_list) {
pgid = (union ib_gid *)&ifp->addr;
ocrdma_add_sgid(dev, pgid);
}
read_unlock_bh(&in6_dev->lock);
in6_dev_put(in6_dev);
}
#endif
}
static void ocrdma_init_gid_table(struct ocrdma_dev *dev)
{
struct net_device *net_dev;
for_each_netdev(&init_net, net_dev) {
struct net_device *real_dev = rdma_vlan_dev_real_dev(net_dev) ?
rdma_vlan_dev_real_dev(net_dev) : net_dev;
if (real_dev == dev->nic_info.netdev) {
ocrdma_add_default_sgid(dev);
ocrdma_init_ipv4_gids(dev, net_dev);
ocrdma_init_ipv6_gids(dev, net_dev);
}
}
}
static struct ocrdma_dev *ocrdma_add(struct be_dev_info *dev_info)
{
int status = 0, i;
@ -515,7 +318,6 @@ static struct ocrdma_dev *ocrdma_add(struct be_dev_info *dev_info)
goto alloc_err;
ocrdma_init_service_level(dev);
ocrdma_init_gid_table(dev);
status = ocrdma_register_device(dev);
if (status)
goto alloc_err;
@ -662,34 +464,12 @@ static struct ocrdma_driver ocrdma_drv = {
.be_abi_version = OCRDMA_BE_ROCE_ABI_VERSION,
};
static void ocrdma_unregister_inet6addr_notifier(void)
{
#if IS_ENABLED(CONFIG_IPV6)
unregister_inet6addr_notifier(&ocrdma_inet6addr_notifier);
#endif
}
static void ocrdma_unregister_inetaddr_notifier(void)
{
unregister_inetaddr_notifier(&ocrdma_inetaddr_notifier);
}
static int __init ocrdma_init_module(void)
{
int status;
ocrdma_init_debugfs();
status = register_inetaddr_notifier(&ocrdma_inetaddr_notifier);
if (status)
return status;
#if IS_ENABLED(CONFIG_IPV6)
status = register_inet6addr_notifier(&ocrdma_inet6addr_notifier);
if (status)
goto err_notifier6;
#endif
status = be_roce_register_driver(&ocrdma_drv);
if (status)
goto err_be_reg;
@ -697,19 +477,13 @@ static int __init ocrdma_init_module(void)
return 0;
err_be_reg:
#if IS_ENABLED(CONFIG_IPV6)
ocrdma_unregister_inet6addr_notifier();
err_notifier6:
#endif
ocrdma_unregister_inetaddr_notifier();
return status;
}
static void __exit ocrdma_exit_module(void)
{
be_roce_unregister_driver(&ocrdma_drv);
ocrdma_unregister_inet6addr_notifier();
ocrdma_unregister_inetaddr_notifier();
ocrdma_rem_debugfs();
idr_destroy(&ocrdma_dev_id);
}

View File

@ -140,6 +140,8 @@ enum {
OCRDMA_DB_RQ_SHIFT = 24
};
#define OCRDMA_ROUDP_FLAGS_SHIFT 0x03
#define OCRDMA_DB_CQ_RING_ID_MASK 0x3FF /* bits 0 - 9 */
#define OCRDMA_DB_CQ_RING_ID_EXT_MASK 0x0C00 /* bits 10-11 of qid at 12-11 */
/* qid #2 msbits at 12-11 */

View File

@ -46,6 +46,7 @@
#include <rdma/iw_cm.h>
#include <rdma/ib_umem.h>
#include <rdma/ib_addr.h>
#include <rdma/ib_cache.h>
#include "ocrdma.h"
#include "ocrdma_hw.h"
@ -64,6 +65,7 @@ int ocrdma_query_pkey(struct ib_device *ibdev, u8 port, u16 index, u16 *pkey)
int ocrdma_query_gid(struct ib_device *ibdev, u8 port,
int index, union ib_gid *sgid)
{
int ret;
struct ocrdma_dev *dev;
dev = get_ocrdma_dev(ibdev);
@ -71,8 +73,28 @@ int ocrdma_query_gid(struct ib_device *ibdev, u8 port,
if (index >= OCRDMA_MAX_SGID)
return -EINVAL;
memcpy(sgid, &dev->sgid_tbl[index], sizeof(*sgid));
ret = ib_get_cached_gid(ibdev, port, index, sgid);
if (ret == -EAGAIN) {
memcpy(sgid, &zgid, sizeof(*sgid));
return 0;
}
return ret;
}
int ocrdma_add_gid(struct ib_device *device,
u8 port_num,
unsigned int index,
const union ib_gid *gid,
const struct ib_gid_attr *attr,
void **context) {
return 0;
}
int ocrdma_del_gid(struct ib_device *device,
u8 port_num,
unsigned int index,
void **context) {
return 0;
}
@ -125,6 +147,24 @@ int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr,
return 0;
}
struct net_device *ocrdma_get_netdev(struct ib_device *ibdev, u8 port_num)
{
struct ocrdma_dev *dev;
struct net_device *ndev = NULL;
rcu_read_lock();
dev = get_ocrdma_dev(ibdev);
if (dev)
ndev = dev->nic_info.netdev;
if (ndev)
dev_hold(ndev);
rcu_read_unlock();
return ndev;
}
static inline void get_link_speed_and_width(struct ocrdma_dev *dev,
u8 *ib_speed, u8 *ib_width)
{
@ -194,7 +234,8 @@ int ocrdma_query_port(struct ib_device *ibdev,
props->port_cap_flags =
IB_PORT_CM_SUP |
IB_PORT_REINIT_SUP |
IB_PORT_DEVICE_MGMT_SUP | IB_PORT_VENDOR_CLASS_SUP | IB_PORT_IP_BASED_GIDS;
IB_PORT_DEVICE_MGMT_SUP | IB_PORT_VENDOR_CLASS_SUP |
IB_PORT_IP_BASED_GIDS;
props->gid_tbl_len = OCRDMA_MAX_SGID;
props->pkey_tbl_len = 1;
props->bad_pkey_cntr = 0;
@ -2998,21 +3039,26 @@ int ocrdma_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags cq_flags)
return 0;
}
struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *ibpd, int max_page_list_len)
struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd,
enum ib_mr_type mr_type,
u32 max_num_sg)
{
int status;
struct ocrdma_mr *mr;
struct ocrdma_pd *pd = get_ocrdma_pd(ibpd);
struct ocrdma_dev *dev = get_ocrdma_dev(ibpd->device);
if (max_page_list_len > dev->attr.max_pages_per_frmr)
if (mr_type != IB_MR_TYPE_MEM_REG)
return ERR_PTR(-EINVAL);
if (max_num_sg > dev->attr.max_pages_per_frmr)
return ERR_PTR(-EINVAL);
mr = kzalloc(sizeof(*mr), GFP_KERNEL);
if (!mr)
return ERR_PTR(-ENOMEM);
status = ocrdma_get_pbl_info(dev, mr, max_page_list_len);
status = ocrdma_get_pbl_info(dev, mr, max_num_sg);
if (status)
goto pbl_err;
mr->hwmr.fr_mr = 1;

View File

@ -63,6 +63,17 @@ ocrdma_query_protocol(struct ib_device *device, u8 port_num);
void ocrdma_get_guid(struct ocrdma_dev *, u8 *guid);
int ocrdma_query_gid(struct ib_device *, u8 port,
int index, union ib_gid *gid);
struct net_device *ocrdma_get_netdev(struct ib_device *device, u8 port_num);
int ocrdma_add_gid(struct ib_device *device,
u8 port_num,
unsigned int index,
const union ib_gid *gid,
const struct ib_gid_attr *attr,
void **context);
int ocrdma_del_gid(struct ib_device *device,
u8 port_num,
unsigned int index,
void **context);
int ocrdma_query_pkey(struct ib_device *, u8 port, u16 index, u16 *pkey);
struct ib_ucontext *ocrdma_alloc_ucontext(struct ib_device *,
@ -111,7 +122,9 @@ struct ib_mr *ocrdma_reg_kernel_mr(struct ib_pd *,
int num_phys_buf, int acc, u64 *iova_start);
struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length,
u64 virt, int acc, struct ib_udata *);
struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *pd, int max_page_list_len);
struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
*ibdev,
int page_list_len);

View File

@ -86,6 +86,10 @@ int qib_alloc_lkey(struct qib_mregion *mr, int dma_region)
* unrestricted LKEY.
*/
rkt->gen++;
/*
* bits are capped in qib_verbs.c to insure enough bits
* for generation number
*/
mr->lkey = (r << (32 - ib_qib_lkey_table_size)) |
((((1 << (24 - ib_qib_lkey_table_size)) - 1) & rkt->gen)
<< 8);

View File

@ -36,148 +36,17 @@
#include <rdma/ib_pma.h>
#define IB_SMP_UNSUP_VERSION cpu_to_be16(0x0004)
#define IB_SMP_UNSUP_METHOD cpu_to_be16(0x0008)
#define IB_SMP_UNSUP_METH_ATTR cpu_to_be16(0x000C)
#define IB_SMP_INVALID_FIELD cpu_to_be16(0x001C)
#define IB_SMP_UNSUP_VERSION \
cpu_to_be16(IB_MGMT_MAD_STATUS_BAD_VERSION)
struct ib_node_info {
u8 base_version;
u8 class_version;
u8 node_type;
u8 num_ports;
__be64 sys_guid;
__be64 node_guid;
__be64 port_guid;
__be16 partition_cap;
__be16 device_id;
__be32 revision;
u8 local_port_num;
u8 vendor_id[3];
} __packed;
#define IB_SMP_UNSUP_METHOD \
cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD)
struct ib_mad_notice_attr {
u8 generic_type;
u8 prod_type_msb;
__be16 prod_type_lsb;
__be16 trap_num;
__be16 issuer_lid;
__be16 toggle_count;
#define IB_SMP_UNSUP_METH_ATTR \
cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD_ATTRIB)
union {
struct {
u8 details[54];
} raw_data;
struct {
__be16 reserved;
__be16 lid; /* where violation happened */
u8 port_num; /* where violation happened */
} __packed ntc_129_131;
struct {
__be16 reserved;
__be16 lid; /* LID where change occurred */
u8 reserved2;
u8 local_changes; /* low bit - local changes */
__be32 new_cap_mask; /* new capability mask */
u8 reserved3;
u8 change_flags; /* low 3 bits only */
} __packed ntc_144;
struct {
__be16 reserved;
__be16 lid; /* lid where sys guid changed */
__be16 reserved2;
__be64 new_sys_guid;
} __packed ntc_145;
struct {
__be16 reserved;
__be16 lid;
__be16 dr_slid;
u8 method;
u8 reserved2;
__be16 attr_id;
__be32 attr_mod;
__be64 mkey;
u8 reserved3;
u8 dr_trunc_hop;
u8 dr_rtn_path[30];
} __packed ntc_256;
struct {
__be16 reserved;
__be16 lid1;
__be16 lid2;
__be32 key;
__be32 sl_qp1; /* SL: high 4 bits */
__be32 qp2; /* high 8 bits reserved */
union ib_gid gid1;
union ib_gid gid2;
} __packed ntc_257_258;
} details;
};
/*
* Generic trap/notice types
*/
#define IB_NOTICE_TYPE_FATAL 0x80
#define IB_NOTICE_TYPE_URGENT 0x81
#define IB_NOTICE_TYPE_SECURITY 0x82
#define IB_NOTICE_TYPE_SM 0x83
#define IB_NOTICE_TYPE_INFO 0x84
/*
* Generic trap/notice producers
*/
#define IB_NOTICE_PROD_CA cpu_to_be16(1)
#define IB_NOTICE_PROD_SWITCH cpu_to_be16(2)
#define IB_NOTICE_PROD_ROUTER cpu_to_be16(3)
#define IB_NOTICE_PROD_CLASS_MGR cpu_to_be16(4)
/*
* Generic trap/notice numbers
*/
#define IB_NOTICE_TRAP_LLI_THRESH cpu_to_be16(129)
#define IB_NOTICE_TRAP_EBO_THRESH cpu_to_be16(130)
#define IB_NOTICE_TRAP_FLOW_UPDATE cpu_to_be16(131)
#define IB_NOTICE_TRAP_CAP_MASK_CHG cpu_to_be16(144)
#define IB_NOTICE_TRAP_SYS_GUID_CHG cpu_to_be16(145)
#define IB_NOTICE_TRAP_BAD_MKEY cpu_to_be16(256)
#define IB_NOTICE_TRAP_BAD_PKEY cpu_to_be16(257)
#define IB_NOTICE_TRAP_BAD_QKEY cpu_to_be16(258)
/*
* Repress trap/notice flags
*/
#define IB_NOTICE_REPRESS_LLI_THRESH (1 << 0)
#define IB_NOTICE_REPRESS_EBO_THRESH (1 << 1)
#define IB_NOTICE_REPRESS_FLOW_UPDATE (1 << 2)
#define IB_NOTICE_REPRESS_CAP_MASK_CHG (1 << 3)
#define IB_NOTICE_REPRESS_SYS_GUID_CHG (1 << 4)
#define IB_NOTICE_REPRESS_BAD_MKEY (1 << 5)
#define IB_NOTICE_REPRESS_BAD_PKEY (1 << 6)
#define IB_NOTICE_REPRESS_BAD_QKEY (1 << 7)
/*
* Generic trap/notice other local changes flags (trap 144).
*/
#define IB_NOTICE_TRAP_LSE_CHG 0x04 /* Link Speed Enable changed */
#define IB_NOTICE_TRAP_LWE_CHG 0x02 /* Link Width Enable changed */
#define IB_NOTICE_TRAP_NODE_DESC_CHG 0x01
/*
* Generic trap/notice M_Key volation flags in dr_trunc_hop (trap 256).
*/
#define IB_NOTICE_TRAP_DR_NOTICE 0x80
#define IB_NOTICE_TRAP_DR_TRUNC 0x40
struct ib_vl_weight_elem {
u8 vl; /* Only low 4 bits, upper 4 bits reserved */
u8 weight;
};
#define IB_SMP_INVALID_FIELD \
cpu_to_be16(IB_MGMT_MAD_STATUS_INVALID_ATTRIB_VALUE)
#define IB_VLARB_LOWPRI_0_31 1
#define IB_VLARB_LOWPRI_32_63 2

View File

@ -327,11 +327,16 @@ int qib_dereg_mr(struct ib_mr *ibmr)
*
* Return the memory region on success, otherwise return an errno.
*/
struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg)
{
struct qib_mr *mr;
mr = alloc_mr(max_page_list_len, pd);
if (mr_type != IB_MR_TYPE_MEM_REG)
return ERR_PTR(-EINVAL);
mr = alloc_mr(max_num_sg, pd);
if (IS_ERR(mr))
return (struct ib_mr *)mr;

View File

@ -32,6 +32,7 @@
*/
#include <linux/spinlock.h>
#include <rdma/ib_smi.h>
#include "qib.h"
#include "qib_mad.h"

View File

@ -40,6 +40,7 @@
#include <linux/rculist.h>
#include <linux/mm.h>
#include <linux/random.h>
#include <linux/vmalloc.h>
#include "qib.h"
#include "qib_common.h"
@ -1574,6 +1575,7 @@ static int qib_query_device(struct ib_device *ibdev, struct ib_device_attr *prop
props->max_qp = ib_qib_max_qps;
props->max_qp_wr = ib_qib_max_qp_wrs;
props->max_sge = ib_qib_max_sges;
props->max_sge_rd = ib_qib_max_sges;
props->max_cq = ib_qib_max_cqs;
props->max_ah = ib_qib_max_ahs;
props->max_cqe = ib_qib_max_cqes;
@ -2109,10 +2111,16 @@ int qib_register_ib_device(struct qib_devdata *dd)
* the LKEY). The remaining bits act as a generation number or tag.
*/
spin_lock_init(&dev->lk_table.lock);
/* insure generation is at least 4 bits see keys.c */
if (ib_qib_lkey_table_size > MAX_LKEY_TABLE_BITS) {
qib_dev_warn(dd, "lkey bits %u too large, reduced to %u\n",
ib_qib_lkey_table_size, MAX_LKEY_TABLE_BITS);
ib_qib_lkey_table_size = MAX_LKEY_TABLE_BITS;
}
dev->lk_table.max = 1 << ib_qib_lkey_table_size;
lk_tab_size = dev->lk_table.max * sizeof(*dev->lk_table.table);
dev->lk_table.table = (struct qib_mregion __rcu **)
__get_free_pages(GFP_KERNEL, get_order(lk_tab_size));
vmalloc(lk_tab_size);
if (dev->lk_table.table == NULL) {
ret = -ENOMEM;
goto err_lk;
@ -2235,7 +2243,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
ibdev->reg_phys_mr = qib_reg_phys_mr;
ibdev->reg_user_mr = qib_reg_user_mr;
ibdev->dereg_mr = qib_dereg_mr;
ibdev->alloc_fast_reg_mr = qib_alloc_fast_reg_mr;
ibdev->alloc_mr = qib_alloc_mr;
ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list;
ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list;
ibdev->alloc_fmr = qib_alloc_fmr;
@ -2286,7 +2294,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
sizeof(struct qib_pio_header),
dev->pio_hdrs, dev->pio_hdrs_phys);
err_hdrs:
free_pages((unsigned long) dev->lk_table.table, get_order(lk_tab_size));
vfree(dev->lk_table.table);
err_lk:
kfree(dev->qp_table);
err_qpt:
@ -2340,8 +2348,7 @@ void qib_unregister_ib_device(struct qib_devdata *dd)
sizeof(struct qib_pio_header),
dev->pio_hdrs, dev->pio_hdrs_phys);
lk_tab_size = dev->lk_table.max * sizeof(*dev->lk_table.table);
free_pages((unsigned long) dev->lk_table.table,
get_order(lk_tab_size));
vfree(dev->lk_table.table);
kfree(dev->qp_table);
}

View File

@ -647,6 +647,8 @@ struct qib_qpn_table {
struct qpn_map map[QPNMAP_ENTRIES];
};
#define MAX_LKEY_TABLE_BITS 23
struct qib_lkey_table {
spinlock_t lock; /* protect changes in this struct */
u32 next; /* next unused index (speeds search) */
@ -1032,7 +1034,9 @@ struct ib_mr *qib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
int qib_dereg_mr(struct ib_mr *ibmr);
struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len);
struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_entries);
struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list(
struct ib_device *ibdev, int page_list_len);

View File

@ -342,7 +342,6 @@ struct ipoib_dev_priv {
u16 pkey;
u16 pkey_index;
struct ib_pd *pd;
struct ib_mr *mr;
struct ib_cq *recv_cq;
struct ib_cq *send_cq;
struct ib_qp *qp;

View File

@ -332,7 +332,7 @@ static void ipoib_cm_init_rx_wr(struct net_device *dev,
int i;
for (i = 0; i < priv->cm.num_frags; ++i)
sge[i].lkey = priv->mr->lkey;
sge[i].lkey = priv->pd->local_dma_lkey;
sge[0].length = IPOIB_CM_HEAD_SIZE;
for (i = 1; i < priv->cm.num_frags; ++i)
@ -848,7 +848,7 @@ int ipoib_cm_dev_open(struct net_device *dev)
}
ret = ib_cm_listen(priv->cm.id, cpu_to_be64(IPOIB_CM_IETF_ID | priv->qp->qp_num),
0, NULL);
0);
if (ret) {
printk(KERN_WARNING "%s: failed to listen on ID 0x%llx\n", priv->ca->name,
IPOIB_CM_IETF_ID | priv->qp->qp_num);

View File

@ -48,6 +48,9 @@
#include <linux/jhash.h>
#include <net/arp.h>
#include <net/addrconf.h>
#include <linux/inetdevice.h>
#include <rdma/ib_cache.h>
#define DRV_VERSION "1.0.0"
@ -89,13 +92,18 @@ struct workqueue_struct *ipoib_workqueue;
struct ib_sa_client ipoib_sa_client;
static void ipoib_add_one(struct ib_device *device);
static void ipoib_remove_one(struct ib_device *device);
static void ipoib_remove_one(struct ib_device *device, void *client_data);
static void ipoib_neigh_reclaim(struct rcu_head *rp);
static struct net_device *ipoib_get_net_dev_by_params(
struct ib_device *dev, u8 port, u16 pkey,
const union ib_gid *gid, const struct sockaddr *addr,
void *client_data);
static struct ib_client ipoib_client = {
.name = "ipoib",
.add = ipoib_add_one,
.remove = ipoib_remove_one
.remove = ipoib_remove_one,
.get_net_dev_by_params = ipoib_get_net_dev_by_params,
};
int ipoib_open(struct net_device *dev)
@ -222,6 +230,225 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
return 0;
}
/* Called with an RCU read lock taken */
static bool ipoib_is_dev_match_addr_rcu(const struct sockaddr *addr,
struct net_device *dev)
{
struct net *net = dev_net(dev);
struct in_device *in_dev;
struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
struct sockaddr_in6 *addr_in6 = (struct sockaddr_in6 *)addr;
__be32 ret_addr;
switch (addr->sa_family) {
case AF_INET:
in_dev = in_dev_get(dev);
if (!in_dev)
return false;
ret_addr = inet_confirm_addr(net, in_dev, 0,
addr_in->sin_addr.s_addr,
RT_SCOPE_HOST);
in_dev_put(in_dev);
if (ret_addr)
return true;
break;
case AF_INET6:
if (IS_ENABLED(CONFIG_IPV6) &&
ipv6_chk_addr(net, &addr_in6->sin6_addr, dev, 1))
return true;
break;
}
return false;
}
/**
* Find the master net_device on top of the given net_device.
* @dev: base IPoIB net_device
*
* Returns the master net_device with a reference held, or the same net_device
* if no master exists.
*/
static struct net_device *ipoib_get_master_net_dev(struct net_device *dev)
{
struct net_device *master;
rcu_read_lock();
master = netdev_master_upper_dev_get_rcu(dev);
if (master)
dev_hold(master);
rcu_read_unlock();
if (master)
return master;
dev_hold(dev);
return dev;
}
/**
* Find a net_device matching the given address, which is an upper device of
* the given net_device.
* @addr: IP address to look for.
* @dev: base IPoIB net_device
*
* If found, returns the net_device with a reference held. Otherwise return
* NULL.
*/
static struct net_device *ipoib_get_net_dev_match_addr(
const struct sockaddr *addr, struct net_device *dev)
{
struct net_device *upper,
*result = NULL;
struct list_head *iter;
rcu_read_lock();
if (ipoib_is_dev_match_addr_rcu(addr, dev)) {
dev_hold(dev);
result = dev;
goto out;
}
netdev_for_each_all_upper_dev_rcu(dev, upper, iter) {
if (ipoib_is_dev_match_addr_rcu(addr, upper)) {
dev_hold(upper);
result = upper;
break;
}
}
out:
rcu_read_unlock();
return result;
}
/* returns the number of IPoIB netdevs on top a given ipoib device matching a
* pkey_index and address, if one exists.
*
* @found_net_dev: contains a matching net_device if the return value >= 1,
* with a reference held. */
static int ipoib_match_gid_pkey_addr(struct ipoib_dev_priv *priv,
const union ib_gid *gid,
u16 pkey_index,
const struct sockaddr *addr,
int nesting,
struct net_device **found_net_dev)
{
struct ipoib_dev_priv *child_priv;
struct net_device *net_dev = NULL;
int matches = 0;
if (priv->pkey_index == pkey_index &&
(!gid || !memcmp(gid, &priv->local_gid, sizeof(*gid)))) {
if (!addr) {
net_dev = ipoib_get_master_net_dev(priv->dev);
} else {
/* Verify the net_device matches the IP address, as
* IPoIB child devices currently share a GID. */
net_dev = ipoib_get_net_dev_match_addr(addr, priv->dev);
}
if (net_dev) {
if (!*found_net_dev)
*found_net_dev = net_dev;
else
dev_put(net_dev);
++matches;
}
}
/* Check child interfaces */
down_read_nested(&priv->vlan_rwsem, nesting);
list_for_each_entry(child_priv, &priv->child_intfs, list) {
matches += ipoib_match_gid_pkey_addr(child_priv, gid,
pkey_index, addr,
nesting + 1,
found_net_dev);
if (matches > 1)
break;
}
up_read(&priv->vlan_rwsem);
return matches;
}
/* Returns the number of matching net_devs found (between 0 and 2). Also
* return the matching net_device in the @net_dev parameter, holding a
* reference to the net_device, if the number of matches >= 1 */
static int __ipoib_get_net_dev_by_params(struct list_head *dev_list, u8 port,
u16 pkey_index,
const union ib_gid *gid,
const struct sockaddr *addr,
struct net_device **net_dev)
{
struct ipoib_dev_priv *priv;
int matches = 0;
*net_dev = NULL;
list_for_each_entry(priv, dev_list, list) {
if (priv->port != port)
continue;
matches += ipoib_match_gid_pkey_addr(priv, gid, pkey_index,
addr, 0, net_dev);
if (matches > 1)
break;
}
return matches;
}
static struct net_device *ipoib_get_net_dev_by_params(
struct ib_device *dev, u8 port, u16 pkey,
const union ib_gid *gid, const struct sockaddr *addr,
void *client_data)
{
struct net_device *net_dev;
struct list_head *dev_list = client_data;
u16 pkey_index;
int matches;
int ret;
if (!rdma_protocol_ib(dev, port))
return NULL;
ret = ib_find_cached_pkey(dev, port, pkey, &pkey_index);
if (ret)
return NULL;
if (!dev_list)
return NULL;
/* See if we can find a unique device matching the L2 parameters */
matches = __ipoib_get_net_dev_by_params(dev_list, port, pkey_index,
gid, NULL, &net_dev);
switch (matches) {
case 0:
return NULL;
case 1:
return net_dev;
}
dev_put(net_dev);
/* Couldn't find a unique device with L2 parameters only. Use L3
* address to uniquely match the net device */
matches = __ipoib_get_net_dev_by_params(dev_list, port, pkey_index,
gid, addr, &net_dev);
switch (matches) {
case 0:
return NULL;
default:
dev_warn_ratelimited(&dev->dev,
"duplicate IP address detected\n");
/* Fall through */
case 1:
return net_dev;
}
}
int ipoib_set_mode(struct net_device *dev, const char *buf)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
@ -1715,12 +1942,11 @@ static void ipoib_add_one(struct ib_device *device)
ib_set_client_data(device, &ipoib_client, dev_list);
}
static void ipoib_remove_one(struct ib_device *device)
static void ipoib_remove_one(struct ib_device *device, void *client_data)
{
struct ipoib_dev_priv *priv, *tmp;
struct list_head *dev_list;
struct list_head *dev_list = client_data;
dev_list = ib_get_client_data(device, &ipoib_client);
if (!dev_list)
return;

View File

@ -393,8 +393,13 @@ static int ipoib_mcast_join_complete(int status,
goto out_locked;
}
} else {
if (mcast->logcount++ < 20) {
if (status == -ETIMEDOUT || status == -EAGAIN) {
bool silent_fail =
test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
status == -EINVAL;
if (mcast->logcount < 20) {
if (status == -ETIMEDOUT || status == -EAGAIN ||
silent_fail) {
ipoib_dbg_mcast(priv, "%smulticast join failed for %pI6, status %d\n",
test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
mcast->mcmember.mgid.raw, status);
@ -403,6 +408,9 @@ static int ipoib_mcast_join_complete(int status,
test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
mcast->mcmember.mgid.raw, status);
}
if (!silent_fail)
mcast->logcount++;
}
if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
@ -448,8 +456,7 @@ static int ipoib_mcast_join_complete(int status,
return status;
}
static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast,
int create)
static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ib_sa_multicast *multicast;
@ -471,7 +478,14 @@ static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast,
IB_SA_MCMEMBER_REC_PKEY |
IB_SA_MCMEMBER_REC_JOIN_STATE;
if (create) {
if (mcast != priv->broadcast) {
/*
* RFC 4391:
* The MGID MUST use the same P_Key, Q_Key, SL, MTU,
* and HopLimit as those used in the broadcast-GID. The rest
* of attributes SHOULD follow the values used in the
* broadcast-GID as well.
*/
comp_mask |=
IB_SA_MCMEMBER_REC_QKEY |
IB_SA_MCMEMBER_REC_MTU_SELECTOR |
@ -492,6 +506,22 @@ static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast,
rec.sl = priv->broadcast->mcmember.sl;
rec.flow_label = priv->broadcast->mcmember.flow_label;
rec.hop_limit = priv->broadcast->mcmember.hop_limit;
/*
* Historically Linux IPoIB has never properly supported SEND
* ONLY join. It emulated it by not providing all the required
* attributes, which is enough to prevent group creation and
* detect if there are full members or not. A major problem
* with supporting SEND ONLY is detecting when the group is
* auto-destroyed as IPoIB will cache the MLID..
*/
#if 1
if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
comp_mask &= ~IB_SA_MCMEMBER_REC_TRAFFIC_CLASS;
#else
if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
rec.join_state = 4;
#endif
}
multicast = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port,
@ -517,7 +547,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
struct ib_port_attr port_attr;
unsigned long delay_until = 0;
struct ipoib_mcast *mcast = NULL;
int create = 1;
if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags))
return;
@ -566,7 +595,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
if (IS_ERR_OR_NULL(priv->broadcast->mc) &&
!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) {
mcast = priv->broadcast;
create = 0;
if (mcast->backoff > 1 &&
time_before(jiffies, mcast->delay_until)) {
delay_until = mcast->delay_until;
@ -590,12 +618,8 @@ void ipoib_mcast_join_task(struct work_struct *work)
/* Found the next unjoined group */
init_completion(&mcast->done);
set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
create = 0;
else
create = 1;
spin_unlock_irq(&priv->lock);
ipoib_mcast_join(dev, mcast, create);
ipoib_mcast_join(dev, mcast);
spin_lock_irq(&priv->lock);
} else if (!delay_until ||
time_before(mcast->delay_until, delay_until))
@ -618,7 +642,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
}
spin_unlock_irq(&priv->lock);
if (mcast)
ipoib_mcast_join(dev, mcast, create);
ipoib_mcast_join(dev, mcast);
}
int ipoib_mcast_start_thread(struct net_device *dev)

View File

@ -152,12 +152,6 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
return -ENODEV;
}
priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(priv->mr)) {
printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name);
goto out_free_pd;
}
/*
* the various IPoIB tasks assume they will never race against
* themselves, so always use a single thread workqueue
@ -165,7 +159,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
priv->wq = create_singlethread_workqueue("ipoib_wq");
if (!priv->wq) {
printk(KERN_WARNING "ipoib: failed to allocate device WQ\n");
goto out_free_mr;
goto out_free_pd;
}
size = ipoib_recvq_size + 1;
@ -225,13 +219,13 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
priv->dev->dev_addr[3] = (priv->qp->qp_num ) & 0xff;
for (i = 0; i < MAX_SKB_FRAGS + 1; ++i)
priv->tx_sge[i].lkey = priv->mr->lkey;
priv->tx_sge[i].lkey = priv->pd->local_dma_lkey;
priv->tx_wr.opcode = IB_WR_SEND;
priv->tx_wr.sg_list = priv->tx_sge;
priv->tx_wr.send_flags = IB_SEND_SIGNALED;
priv->rx_sge[0].lkey = priv->mr->lkey;
priv->rx_sge[0].lkey = priv->pd->local_dma_lkey;
priv->rx_sge[0].length = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu);
priv->rx_wr.num_sge = 1;
@ -254,9 +248,6 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
destroy_workqueue(priv->wq);
priv->wq = NULL;
out_free_mr:
ib_dereg_mr(priv->mr);
out_free_pd:
ib_dealloc_pd(priv->pd);
@ -289,12 +280,7 @@ void ipoib_transport_dev_cleanup(struct net_device *dev)
priv->wq = NULL;
}
if (ib_dereg_mr(priv->mr))
ipoib_warn(priv, "ib_dereg_mr failed\n");
if (ib_dealloc_pd(priv->pd))
ipoib_warn(priv, "ib_dealloc_pd failed\n");
ib_dealloc_pd(priv->pd);
}
void ipoib_event(struct ib_event_handler *handler,

View File

@ -74,34 +74,37 @@
#include "iscsi_iser.h"
static struct scsi_host_template iscsi_iser_sht;
static struct iscsi_transport iscsi_iser_transport;
static struct scsi_transport_template *iscsi_iser_scsi_transport;
static unsigned int iscsi_max_lun = 512;
module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO);
int iser_debug_level = 0;
bool iser_pi_enable = false;
int iser_pi_guard = 1;
MODULE_DESCRIPTION("iSER (iSCSI Extensions for RDMA) Datamover");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Alex Nezhinsky, Dan Bar Dov, Or Gerlitz");
MODULE_VERSION(DRV_VER);
module_param_named(debug_level, iser_debug_level, int, 0644);
MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)");
module_param_named(pi_enable, iser_pi_enable, bool, 0644);
MODULE_PARM_DESC(pi_enable, "Enable T10-PI offload support (default:disabled)");
module_param_named(pi_guard, iser_pi_guard, int, 0644);
MODULE_PARM_DESC(pi_guard, "T10-PI guard_type [deprecated]");
static struct scsi_host_template iscsi_iser_sht;
static struct iscsi_transport iscsi_iser_transport;
static struct scsi_transport_template *iscsi_iser_scsi_transport;
static struct workqueue_struct *release_wq;
struct iser_global ig;
int iser_debug_level = 0;
module_param_named(debug_level, iser_debug_level, int, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)");
static unsigned int iscsi_max_lun = 512;
module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO);
MODULE_PARM_DESC(max_lun, "Max LUNs to allow per session (default:512");
unsigned int iser_max_sectors = ISER_DEF_MAX_SECTORS;
module_param_named(max_sectors, iser_max_sectors, uint, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(max_sectors, "Max number of sectors in a single scsi command (default:1024");
bool iser_pi_enable = false;
module_param_named(pi_enable, iser_pi_enable, bool, S_IRUGO);
MODULE_PARM_DESC(pi_enable, "Enable T10-PI offload support (default:disabled)");
int iser_pi_guard;
module_param_named(pi_guard, iser_pi_guard, int, S_IRUGO);
MODULE_PARM_DESC(pi_guard, "T10-PI guard_type [deprecated]");
/*
* iscsi_iser_recv() - Process a successfull recv completion
* @conn: iscsi connection
@ -201,10 +204,12 @@ iser_initialize_task_headers(struct iscsi_task *task,
goto out;
}
tx_desc->wr_idx = 0;
tx_desc->mapped = true;
tx_desc->dma_addr = dma_addr;
tx_desc->tx_sg[0].addr = tx_desc->dma_addr;
tx_desc->tx_sg[0].length = ISER_HEADERS_LEN;
tx_desc->tx_sg[0].lkey = device->mr->lkey;
tx_desc->tx_sg[0].lkey = device->pd->local_dma_lkey;
iser_task->iser_conn = iser_conn;
out:
@ -360,16 +365,19 @@ iscsi_iser_task_xmit(struct iscsi_task *task)
static void iscsi_iser_cleanup_task(struct iscsi_task *task)
{
struct iscsi_iser_task *iser_task = task->dd_data;
struct iser_tx_desc *tx_desc = &iser_task->desc;
struct iser_conn *iser_conn = task->conn->dd_data;
struct iser_tx_desc *tx_desc = &iser_task->desc;
struct iser_conn *iser_conn = task->conn->dd_data;
struct iser_device *device = iser_conn->ib_conn.device;
/* DEVICE_REMOVAL event might have already released the device */
if (!device)
return;
ib_dma_unmap_single(device->ib_device,
tx_desc->dma_addr, ISER_HEADERS_LEN, DMA_TO_DEVICE);
if (likely(tx_desc->mapped)) {
ib_dma_unmap_single(device->ib_device, tx_desc->dma_addr,
ISER_HEADERS_LEN, DMA_TO_DEVICE);
tx_desc->mapped = false;
}
/* mgmt tasks do not need special cleanup */
if (!task->sc)
@ -622,6 +630,8 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
if (ep) {
iser_conn = ep->dd_data;
max_cmds = iser_conn->max_cmds;
shost->sg_tablesize = iser_conn->scsi_sg_tablesize;
shost->max_sectors = iser_conn->scsi_max_sectors;
mutex_lock(&iser_conn->state_mutex);
if (iser_conn->state != ISER_CONN_UP) {
@ -640,6 +650,15 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
SHOST_DIX_GUARD_CRC);
}
/*
* Limit the sg_tablesize and max_sectors based on the device
* max fastreg page list length.
*/
shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize,
ib_conn->device->dev_attr.max_fast_reg_page_list_len);
shost->max_sectors = min_t(unsigned int,
1024, (shost->sg_tablesize * PAGE_SIZE) >> 9);
if (iscsi_host_add(shost,
ib_conn->device->ib_device->dma_device)) {
mutex_unlock(&iser_conn->state_mutex);
@ -742,15 +761,9 @@ iscsi_iser_conn_get_stats(struct iscsi_cls_conn *cls_conn, struct iscsi_stats *s
stats->r2t_pdus = conn->r2t_pdus_cnt; /* always 0 */
stats->tmfcmd_pdus = conn->tmfcmd_pdus_cnt;
stats->tmfrsp_pdus = conn->tmfrsp_pdus_cnt;
stats->custom_length = 4;
strcpy(stats->custom[0].desc, "qp_tx_queue_full");
stats->custom[0].value = 0; /* TB iser_conn->qp_tx_queue_full; */
strcpy(stats->custom[1].desc, "fmr_map_not_avail");
stats->custom[1].value = 0; /* TB iser_conn->fmr_map_not_avail */;
strcpy(stats->custom[2].desc, "eh_abort_cnt");
stats->custom[2].value = conn->eh_abort_cnt;
strcpy(stats->custom[3].desc, "fmr_unalign_cnt");
stats->custom[3].value = conn->fmr_unalign_cnt;
stats->custom_length = 1;
strcpy(stats->custom[0].desc, "fmr_unalign_cnt");
stats->custom[0].value = conn->fmr_unalign_cnt;
}
static int iscsi_iser_get_ep_param(struct iscsi_endpoint *ep,
@ -839,10 +852,9 @@ iscsi_iser_ep_connect(struct Scsi_Host *shost, struct sockaddr *dst_addr,
static int
iscsi_iser_ep_poll(struct iscsi_endpoint *ep, int timeout_ms)
{
struct iser_conn *iser_conn;
struct iser_conn *iser_conn = ep->dd_data;
int rc;
iser_conn = ep->dd_data;
rc = wait_for_completion_interruptible_timeout(&iser_conn->up_completion,
msecs_to_jiffies(timeout_ms));
/* if conn establishment failed, return error code to iscsi */
@ -854,7 +866,7 @@ iscsi_iser_ep_poll(struct iscsi_endpoint *ep, int timeout_ms)
mutex_unlock(&iser_conn->state_mutex);
}
iser_info("ib conn %p rc = %d\n", iser_conn, rc);
iser_info("iser conn %p rc = %d\n", iser_conn, rc);
if (rc > 0)
return 1; /* success, this is the equivalent of POLLOUT */
@ -876,11 +888,9 @@ iscsi_iser_ep_poll(struct iscsi_endpoint *ep, int timeout_ms)
static void
iscsi_iser_ep_disconnect(struct iscsi_endpoint *ep)
{
struct iser_conn *iser_conn;
struct iser_conn *iser_conn = ep->dd_data;
iser_conn = ep->dd_data;
iser_info("ep %p iser conn %p state %d\n",
ep, iser_conn, iser_conn->state);
iser_info("ep %p iser conn %p\n", ep, iser_conn);
mutex_lock(&iser_conn->state_mutex);
iser_conn_terminate(iser_conn);
@ -900,6 +910,7 @@ iscsi_iser_ep_disconnect(struct iscsi_endpoint *ep)
mutex_unlock(&iser_conn->state_mutex);
iser_conn_release(iser_conn);
}
iscsi_destroy_endpoint(ep);
}
@ -962,8 +973,8 @@ static struct scsi_host_template iscsi_iser_sht = {
.name = "iSCSI Initiator over iSER",
.queuecommand = iscsi_queuecommand,
.change_queue_depth = scsi_change_queue_depth,
.sg_tablesize = ISCSI_ISER_SG_TABLESIZE,
.max_sectors = 1024,
.sg_tablesize = ISCSI_ISER_DEF_SG_TABLESIZE,
.max_sectors = ISER_DEF_MAX_SECTORS,
.cmd_per_lun = ISER_DEF_CMD_PER_LUN,
.eh_abort_handler = iscsi_eh_abort,
.eh_device_reset_handler= iscsi_eh_device_reset,
@ -1074,7 +1085,7 @@ static void __exit iser_exit(void)
if (!connlist_empty) {
iser_err("Error cleanup stage completed but we still have iser "
"connections, destroying them anyway.\n");
"connections, destroying them anyway\n");
list_for_each_entry_safe(iser_conn, n, &ig.connlist,
conn_list) {
iser_conn_release(iser_conn);

View File

@ -98,8 +98,13 @@
#define SHIFT_4K 12
#define SIZE_4K (1ULL << SHIFT_4K)
#define MASK_4K (~(SIZE_4K-1))
/* support up to 512KB in one RDMA */
#define ISCSI_ISER_SG_TABLESIZE (0x80000 >> SHIFT_4K)
/* Default support is 512KB I/O size */
#define ISER_DEF_MAX_SECTORS 1024
#define ISCSI_ISER_DEF_SG_TABLESIZE ((ISER_DEF_MAX_SECTORS * 512) >> SHIFT_4K)
/* Maximum support is 8MB I/O size */
#define ISCSI_ISER_MAX_SG_TABLESIZE ((16384 * 512) >> SHIFT_4K)
#define ISER_DEF_XMIT_CMDS_DEFAULT 512
#if ISCSI_DEF_XMIT_CMDS_MAX > ISER_DEF_XMIT_CMDS_DEFAULT
#define ISER_DEF_XMIT_CMDS_MAX ISCSI_DEF_XMIT_CMDS_MAX
@ -239,6 +244,7 @@ struct iser_data_buf {
struct iser_device;
struct iscsi_iser_task;
struct iscsi_endpoint;
struct iser_reg_resources;
/**
* struct iser_mem_reg - iSER memory registration info
@ -259,6 +265,14 @@ enum iser_desc_type {
ISCSI_TX_DATAOUT
};
/* Maximum number of work requests per task:
* Data memory region local invalidate + fast registration
* Protection memory region local invalidate + fast registration
* Signature memory region local invalidate + fast registration
* PDU send
*/
#define ISER_MAX_WRS 7
/**
* struct iser_tx_desc - iSER TX descriptor (for send wr_id)
*
@ -270,6 +284,12 @@ enum iser_desc_type {
* sg[1] optionally points to either of immediate data
* unsolicited data-out or control
* @num_sge: number sges used on this TX task
* @mapped: Is the task header mapped
* @wr_idx: Current WR index
* @wrs: Array of WRs per task
* @data_reg: Data buffer registration details
* @prot_reg: Protection buffer registration details
* @sig_attrs: Signature attributes
*/
struct iser_tx_desc {
struct iser_hdr iser_header;
@ -278,6 +298,12 @@ struct iser_tx_desc {
u64 dma_addr;
struct ib_sge tx_sg[2];
int num_sge;
bool mapped;
u8 wr_idx;
struct ib_send_wr wrs[ISER_MAX_WRS];
struct iser_mem_reg data_reg;
struct iser_mem_reg prot_reg;
struct ib_sig_attrs sig_attrs;
};
#define ISER_RX_PAD_SIZE (256 - (ISER_RX_PAYLOAD_SIZE + \
@ -323,6 +349,33 @@ struct iser_comp {
int active_qps;
};
/**
* struct iser_device - Memory registration operations
* per-device registration schemes
*
* @alloc_reg_res: Allocate registration resources
* @free_reg_res: Free registration resources
* @fast_reg_mem: Register memory buffers
* @unreg_mem: Un-register memory buffers
* @reg_desc_get: Get a registration descriptor for pool
* @reg_desc_put: Get a registration descriptor to pool
*/
struct iser_reg_ops {
int (*alloc_reg_res)(struct ib_conn *ib_conn,
unsigned cmds_max,
unsigned int size);
void (*free_reg_res)(struct ib_conn *ib_conn);
int (*reg_mem)(struct iscsi_iser_task *iser_task,
struct iser_data_buf *mem,
struct iser_reg_resources *rsc,
struct iser_mem_reg *reg);
void (*unreg_mem)(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir);
struct iser_fr_desc * (*reg_desc_get)(struct ib_conn *ib_conn);
void (*reg_desc_put)(struct ib_conn *ib_conn,
struct iser_fr_desc *desc);
};
/**
* struct iser_device - iSER device handle
*
@ -336,11 +389,7 @@ struct iser_comp {
* @comps_used: Number of completion contexts used, Min between online
* cpus and device max completion vectors
* @comps: Dinamically allocated array of completion handlers
* Memory registration pool Function pointers (FMR or Fastreg):
* @iser_alloc_rdma_reg_res: Allocation of memory regions pool
* @iser_free_rdma_reg_res: Free of memory regions pool
* @iser_reg_rdma_mem: Memory registration routine
* @iser_unreg_rdma_mem: Memory deregistration routine
* @reg_ops: Registration ops
*/
struct iser_device {
struct ib_device *ib_device;
@ -352,54 +401,73 @@ struct iser_device {
int refcount;
int comps_used;
struct iser_comp *comps;
int (*iser_alloc_rdma_reg_res)(struct ib_conn *ib_conn,
unsigned cmds_max);
void (*iser_free_rdma_reg_res)(struct ib_conn *ib_conn);
int (*iser_reg_rdma_mem)(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir);
void (*iser_unreg_rdma_mem)(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir);
struct iser_reg_ops *reg_ops;
};
#define ISER_CHECK_GUARD 0xc0
#define ISER_CHECK_REFTAG 0x0f
#define ISER_CHECK_APPTAG 0x30
enum iser_reg_indicator {
ISER_DATA_KEY_VALID = 1 << 0,
ISER_PROT_KEY_VALID = 1 << 1,
ISER_SIG_KEY_VALID = 1 << 2,
ISER_FASTREG_PROTECTED = 1 << 3,
/**
* struct iser_reg_resources - Fast registration recources
*
* @mr: memory region
* @fmr_pool: pool of fmrs
* @frpl: fast reg page list used by frwrs
* @page_vec: fast reg page list used by fmr pool
* @mr_valid: is mr valid indicator
*/
struct iser_reg_resources {
union {
struct ib_mr *mr;
struct ib_fmr_pool *fmr_pool;
};
union {
struct ib_fast_reg_page_list *frpl;
struct iser_page_vec *page_vec;
};
u8 mr_valid:1;
};
/**
* struct iser_pi_context - Protection information context
*
* @prot_mr: protection memory region
* @prot_frpl: protection fastreg page list
* @sig_mr: signature feature enabled memory region
* @rsc: protection buffer registration resources
* @sig_mr: signature enable memory region
* @sig_mr_valid: is sig_mr valid indicator
* @sig_protected: is region protected indicator
*/
struct iser_pi_context {
struct ib_mr *prot_mr;
struct ib_fast_reg_page_list *prot_frpl;
struct iser_reg_resources rsc;
struct ib_mr *sig_mr;
u8 sig_mr_valid:1;
u8 sig_protected:1;
};
/**
* struct fast_reg_descriptor - Fast registration descriptor
* struct iser_fr_desc - Fast registration descriptor
*
* @list: entry in connection fastreg pool
* @data_mr: data memory region
* @data_frpl: data fastreg page list
* @rsc: data buffer registration resources
* @pi_ctx: protection information context
* @reg_indicators: fast registration indicators
*/
struct fast_reg_descriptor {
struct iser_fr_desc {
struct list_head list;
struct ib_mr *data_mr;
struct ib_fast_reg_page_list *data_frpl;
struct iser_reg_resources rsc;
struct iser_pi_context *pi_ctx;
u8 reg_indicators;
};
/**
* struct iser_fr_pool: connection fast registration pool
*
* @list: list of fastreg descriptors
* @lock: protects fmr/fastreg pool
* @size: size of the pool
*/
struct iser_fr_pool {
struct list_head list;
spinlock_t lock;
int size;
};
/**
@ -415,15 +483,7 @@ struct fast_reg_descriptor {
* @pi_support: Indicate device T10-PI support
* @beacon: beacon send wr to signal all flush errors were drained
* @flush_comp: completes when all connection completions consumed
* @lock: protects fmr/fastreg pool
* @union.fmr:
* @pool: FMR pool for fast registrations
* @page_vec: page vector to hold mapped commands pages
* used for registration
* @union.fastreg:
* @pool: Fast registration descriptors pool for fast
* registrations
* @pool_size: Size of pool
* @fr_pool: connection fast registration poool
*/
struct ib_conn {
struct rdma_cm_id *cma_id;
@ -436,17 +496,7 @@ struct ib_conn {
bool pi_support;
struct ib_send_wr beacon;
struct completion flush_comp;
spinlock_t lock;
union {
struct {
struct ib_fmr_pool *pool;
struct iser_page_vec *page_vec;
} fmr;
struct {
struct list_head pool;
int pool_size;
} fastreg;
};
struct iser_fr_pool fr_pool;
};
/**
@ -477,6 +527,8 @@ struct ib_conn {
* @rx_desc_head: head of rx_descs cyclic buffer
* @rx_descs: rx buffers array (cyclic buffer)
* @num_rx_descs: number of rx descriptors
* @scsi_sg_tablesize: scsi host sg_tablesize
* @scsi_max_sectors: scsi host max sectors
*/
struct iser_conn {
struct ib_conn ib_conn;
@ -501,6 +553,8 @@ struct iser_conn {
unsigned int rx_desc_head;
struct iser_rx_desc *rx_descs;
u32 num_rx_descs;
unsigned short scsi_sg_tablesize;
unsigned int scsi_max_sectors;
};
/**
@ -556,6 +610,9 @@ extern struct iser_global ig;
extern int iser_debug_level;
extern bool iser_pi_enable;
extern int iser_pi_guard;
extern unsigned int iser_max_sectors;
int iser_assign_reg_ops(struct iser_device *device);
int iser_send_control(struct iscsi_conn *conn,
struct iscsi_task *task);
@ -597,10 +654,10 @@ void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_task *iser_task,
struct iser_data_buf *mem,
enum iser_data_dir cmd_dir);
int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *task,
enum iser_data_dir cmd_dir);
int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *task,
enum iser_data_dir cmd_dir);
int iser_reg_rdma_mem(struct iscsi_iser_task *task,
enum iser_data_dir dir);
void iser_unreg_rdma_mem(struct iscsi_iser_task *task,
enum iser_data_dir dir);
int iser_connect(struct iser_conn *iser_conn,
struct sockaddr *src_addr,
@ -630,15 +687,40 @@ int iser_initialize_task_headers(struct iscsi_task *task,
struct iser_tx_desc *tx_desc);
int iser_alloc_rx_descriptors(struct iser_conn *iser_conn,
struct iscsi_session *session);
int iser_create_fmr_pool(struct ib_conn *ib_conn, unsigned cmds_max);
int iser_alloc_fmr_pool(struct ib_conn *ib_conn,
unsigned cmds_max,
unsigned int size);
void iser_free_fmr_pool(struct ib_conn *ib_conn);
int iser_create_fastreg_pool(struct ib_conn *ib_conn, unsigned cmds_max);
int iser_alloc_fastreg_pool(struct ib_conn *ib_conn,
unsigned cmds_max,
unsigned int size);
void iser_free_fastreg_pool(struct ib_conn *ib_conn);
u8 iser_check_task_pi_status(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir, sector_t *sector);
struct fast_reg_descriptor *
iser_reg_desc_get(struct ib_conn *ib_conn);
struct iser_fr_desc *
iser_reg_desc_get_fr(struct ib_conn *ib_conn);
void
iser_reg_desc_put(struct ib_conn *ib_conn,
struct fast_reg_descriptor *desc);
iser_reg_desc_put_fr(struct ib_conn *ib_conn,
struct iser_fr_desc *desc);
struct iser_fr_desc *
iser_reg_desc_get_fmr(struct ib_conn *ib_conn);
void
iser_reg_desc_put_fmr(struct ib_conn *ib_conn,
struct iser_fr_desc *desc);
static inline struct ib_send_wr *
iser_tx_next_wr(struct iser_tx_desc *tx_desc)
{
struct ib_send_wr *cur_wr = &tx_desc->wrs[tx_desc->wr_idx];
struct ib_send_wr *last_wr;
if (tx_desc->wr_idx) {
last_wr = &tx_desc->wrs[tx_desc->wr_idx - 1];
last_wr->next = cur_wr;
}
tx_desc->wr_idx++;
return cur_wr;
}
#endif

View File

@ -49,7 +49,6 @@ static int iser_prepare_read_cmd(struct iscsi_task *task)
{
struct iscsi_iser_task *iser_task = task->dd_data;
struct iser_device *device = iser_task->iser_conn->ib_conn.device;
struct iser_mem_reg *mem_reg;
int err;
struct iser_hdr *hdr = &iser_task->desc.iser_header;
@ -73,7 +72,7 @@ static int iser_prepare_read_cmd(struct iscsi_task *task)
return err;
}
err = device->iser_reg_rdma_mem(iser_task, ISER_DIR_IN);
err = iser_reg_rdma_mem(iser_task, ISER_DIR_IN);
if (err) {
iser_err("Failed to set up Data-IN RDMA\n");
return err;
@ -103,7 +102,6 @@ iser_prepare_write_cmd(struct iscsi_task *task,
unsigned int edtl)
{
struct iscsi_iser_task *iser_task = task->dd_data;
struct iser_device *device = iser_task->iser_conn->ib_conn.device;
struct iser_mem_reg *mem_reg;
int err;
struct iser_hdr *hdr = &iser_task->desc.iser_header;
@ -128,7 +126,7 @@ iser_prepare_write_cmd(struct iscsi_task *task,
return err;
}
err = device->iser_reg_rdma_mem(iser_task, ISER_DIR_OUT);
err = iser_reg_rdma_mem(iser_task, ISER_DIR_OUT);
if (err != 0) {
iser_err("Failed to register write cmd RDMA mem\n");
return err;
@ -170,13 +168,7 @@ static void iser_create_send_desc(struct iser_conn *iser_conn,
memset(&tx_desc->iser_header, 0, sizeof(struct iser_hdr));
tx_desc->iser_header.flags = ISER_VER;
tx_desc->num_sge = 1;
if (tx_desc->tx_sg[0].lkey != device->mr->lkey) {
tx_desc->tx_sg[0].lkey = device->mr->lkey;
iser_dbg("sdesc %p lkey mismatch, fixing\n", tx_desc);
}
}
static void iser_free_login_buf(struct iser_conn *iser_conn)
@ -266,7 +258,8 @@ int iser_alloc_rx_descriptors(struct iser_conn *iser_conn,
iser_conn->qp_max_recv_dtos_mask = session->cmds_max - 1; /* cmds_max is 2^N */
iser_conn->min_posted_rx = iser_conn->qp_max_recv_dtos >> 2;
if (device->iser_alloc_rdma_reg_res(ib_conn, session->scsi_cmds_max))
if (device->reg_ops->alloc_reg_res(ib_conn, session->scsi_cmds_max,
iser_conn->scsi_sg_tablesize))
goto create_rdma_reg_res_failed;
if (iser_alloc_login_buf(iser_conn))
@ -291,7 +284,7 @@ int iser_alloc_rx_descriptors(struct iser_conn *iser_conn,
rx_sg = &rx_desc->rx_sg;
rx_sg->addr = rx_desc->dma_addr;
rx_sg->length = ISER_RX_PAYLOAD_SIZE;
rx_sg->lkey = device->mr->lkey;
rx_sg->lkey = device->pd->local_dma_lkey;
}
iser_conn->rx_desc_head = 0;
@ -307,7 +300,7 @@ int iser_alloc_rx_descriptors(struct iser_conn *iser_conn,
rx_desc_alloc_fail:
iser_free_login_buf(iser_conn);
alloc_login_buf_fail:
device->iser_free_rdma_reg_res(ib_conn);
device->reg_ops->free_reg_res(ib_conn);
create_rdma_reg_res_failed:
iser_err("failed allocating rx descriptors / data buffers\n");
return -ENOMEM;
@ -320,8 +313,8 @@ void iser_free_rx_descriptors(struct iser_conn *iser_conn)
struct ib_conn *ib_conn = &iser_conn->ib_conn;
struct iser_device *device = ib_conn->device;
if (device->iser_free_rdma_reg_res)
device->iser_free_rdma_reg_res(ib_conn);
if (device->reg_ops->free_reg_res)
device->reg_ops->free_reg_res(ib_conn);
rx_desc = iser_conn->rx_descs;
for (i = 0; i < iser_conn->qp_max_recv_dtos; i++, rx_desc++)
@ -454,7 +447,7 @@ int iser_send_data_out(struct iscsi_conn *conn,
unsigned long buf_offset;
unsigned long data_seg_len;
uint32_t itt;
int err = 0;
int err;
struct ib_sge *tx_dsg;
itt = (__force uint32_t)hdr->itt;
@ -475,7 +468,9 @@ int iser_send_data_out(struct iscsi_conn *conn,
memcpy(&tx_desc->iscsi_header, hdr, sizeof(struct iscsi_hdr));
/* build the tx desc */
iser_initialize_task_headers(task, tx_desc);
err = iser_initialize_task_headers(task, tx_desc);
if (err)
goto send_data_out_error;
mem_reg = &iser_task->rdma_reg[ISER_DIR_OUT];
tx_dsg = &tx_desc->tx_sg[1];
@ -502,7 +497,7 @@ int iser_send_data_out(struct iscsi_conn *conn,
send_data_out_error:
kmem_cache_free(ig.desc_cache, tx_desc);
iser_err("conn %p failed err %d\n",conn, err);
iser_err("conn %p failed err %d\n", conn, err);
return err;
}
@ -543,7 +538,7 @@ int iser_send_control(struct iscsi_conn *conn,
tx_dsg->addr = iser_conn->login_req_dma;
tx_dsg->length = task->data_count;
tx_dsg->lkey = device->mr->lkey;
tx_dsg->lkey = device->pd->local_dma_lkey;
mdesc->num_sge = 2;
}
@ -666,7 +661,6 @@ void iser_task_rdma_init(struct iscsi_iser_task *iser_task)
void iser_task_rdma_finalize(struct iscsi_iser_task *iser_task)
{
struct iser_device *device = iser_task->iser_conn->ib_conn.device;
int is_rdma_data_aligned = 1;
int is_rdma_prot_aligned = 1;
int prot_count = scsi_prot_sg_count(iser_task->sc);
@ -703,7 +697,7 @@ void iser_task_rdma_finalize(struct iscsi_iser_task *iser_task)
}
if (iser_task->dir[ISER_DIR_IN]) {
device->iser_unreg_rdma_mem(iser_task, ISER_DIR_IN);
iser_unreg_rdma_mem(iser_task, ISER_DIR_IN);
if (is_rdma_data_aligned)
iser_dma_unmap_task_data(iser_task,
&iser_task->data[ISER_DIR_IN],
@ -715,7 +709,7 @@ void iser_task_rdma_finalize(struct iscsi_iser_task *iser_task)
}
if (iser_task->dir[ISER_DIR_OUT]) {
device->iser_unreg_rdma_mem(iser_task, ISER_DIR_OUT);
iser_unreg_rdma_mem(iser_task, ISER_DIR_OUT);
if (is_rdma_data_aligned)
iser_dma_unmap_task_data(iser_task,
&iser_task->data[ISER_DIR_OUT],

View File

@ -38,6 +38,55 @@
#include <linux/scatterlist.h>
#include "iscsi_iser.h"
static
int iser_fast_reg_fmr(struct iscsi_iser_task *iser_task,
struct iser_data_buf *mem,
struct iser_reg_resources *rsc,
struct iser_mem_reg *mem_reg);
static
int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
struct iser_data_buf *mem,
struct iser_reg_resources *rsc,
struct iser_mem_reg *mem_reg);
static struct iser_reg_ops fastreg_ops = {
.alloc_reg_res = iser_alloc_fastreg_pool,
.free_reg_res = iser_free_fastreg_pool,
.reg_mem = iser_fast_reg_mr,
.unreg_mem = iser_unreg_mem_fastreg,
.reg_desc_get = iser_reg_desc_get_fr,
.reg_desc_put = iser_reg_desc_put_fr,
};
static struct iser_reg_ops fmr_ops = {
.alloc_reg_res = iser_alloc_fmr_pool,
.free_reg_res = iser_free_fmr_pool,
.reg_mem = iser_fast_reg_fmr,
.unreg_mem = iser_unreg_mem_fmr,
.reg_desc_get = iser_reg_desc_get_fmr,
.reg_desc_put = iser_reg_desc_put_fmr,
};
int iser_assign_reg_ops(struct iser_device *device)
{
struct ib_device_attr *dev_attr = &device->dev_attr;
/* Assign function handles - based on FMR support */
if (device->ib_device->alloc_fmr && device->ib_device->dealloc_fmr &&
device->ib_device->map_phys_fmr && device->ib_device->unmap_fmr) {
iser_info("FMR supported, using FMR for registration\n");
device->reg_ops = &fmr_ops;
} else
if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
iser_info("FastReg supported, using FastReg for registration\n");
device->reg_ops = &fastreg_ops;
} else {
iser_err("IB device does not support FMRs nor FastRegs, can't register memory\n");
return -1;
}
return 0;
}
static void
iser_free_bounce_sg(struct iser_data_buf *data)
@ -146,30 +195,47 @@ iser_copy_to_bounce(struct iser_data_buf *data)
iser_copy_bounce(data, true);
}
struct fast_reg_descriptor *
iser_reg_desc_get(struct ib_conn *ib_conn)
struct iser_fr_desc *
iser_reg_desc_get_fr(struct ib_conn *ib_conn)
{
struct fast_reg_descriptor *desc;
struct iser_fr_pool *fr_pool = &ib_conn->fr_pool;
struct iser_fr_desc *desc;
unsigned long flags;
spin_lock_irqsave(&ib_conn->lock, flags);
desc = list_first_entry(&ib_conn->fastreg.pool,
struct fast_reg_descriptor, list);
spin_lock_irqsave(&fr_pool->lock, flags);
desc = list_first_entry(&fr_pool->list,
struct iser_fr_desc, list);
list_del(&desc->list);
spin_unlock_irqrestore(&ib_conn->lock, flags);
spin_unlock_irqrestore(&fr_pool->lock, flags);
return desc;
}
void
iser_reg_desc_put(struct ib_conn *ib_conn,
struct fast_reg_descriptor *desc)
iser_reg_desc_put_fr(struct ib_conn *ib_conn,
struct iser_fr_desc *desc)
{
struct iser_fr_pool *fr_pool = &ib_conn->fr_pool;
unsigned long flags;
spin_lock_irqsave(&ib_conn->lock, flags);
list_add(&desc->list, &ib_conn->fastreg.pool);
spin_unlock_irqrestore(&ib_conn->lock, flags);
spin_lock_irqsave(&fr_pool->lock, flags);
list_add(&desc->list, &fr_pool->list);
spin_unlock_irqrestore(&fr_pool->lock, flags);
}
struct iser_fr_desc *
iser_reg_desc_get_fmr(struct ib_conn *ib_conn)
{
struct iser_fr_pool *fr_pool = &ib_conn->fr_pool;
return list_first_entry(&fr_pool->list,
struct iser_fr_desc, list);
}
void
iser_reg_desc_put_fmr(struct ib_conn *ib_conn,
struct iser_fr_desc *desc)
{
}
/**
@ -297,7 +363,8 @@ static int iser_sg_to_page_vec(struct iser_data_buf *data,
* consecutive SG elements are actually fragments of the same physcial page.
*/
static int iser_data_buf_aligned_len(struct iser_data_buf *data,
struct ib_device *ibdev)
struct ib_device *ibdev,
unsigned sg_tablesize)
{
struct scatterlist *sg, *sgl, *next_sg = NULL;
u64 start_addr, end_addr;
@ -309,6 +376,14 @@ static int iser_data_buf_aligned_len(struct iser_data_buf *data,
sgl = data->sg;
start_addr = ib_sg_dma_address(ibdev, sgl);
if (unlikely(sgl[0].offset &&
data->data_len >= sg_tablesize * PAGE_SIZE)) {
iser_dbg("can't register length %lx with offset %x "
"fall to bounce buffer\n", data->data_len,
sgl[0].offset);
return 0;
}
for_each_sg(sgl, sg, data->dma_nents, i) {
if (start_check && !IS_4K_ALIGNED(start_addr))
break;
@ -330,8 +405,11 @@ static int iser_data_buf_aligned_len(struct iser_data_buf *data,
break;
}
ret_len = (next_sg) ? i : i+1;
iser_dbg("Found %d aligned entries out of %d in sg:0x%p\n",
ret_len, data->dma_nents, data);
if (unlikely(ret_len != data->dma_nents))
iser_warn("rdma alignment violation (%d/%d aligned)\n",
ret_len, data->dma_nents);
return ret_len;
}
@ -393,7 +471,7 @@ iser_reg_dma(struct iser_device *device, struct iser_data_buf *mem,
{
struct scatterlist *sg = mem->sg;
reg->sge.lkey = device->mr->lkey;
reg->sge.lkey = device->pd->local_dma_lkey;
reg->rkey = device->mr->rkey;
reg->sge.addr = ib_sg_dma_address(device->ib_device, &sg[0]);
reg->sge.length = ib_sg_dma_len(device->ib_device, &sg[0]);
@ -407,15 +485,12 @@ iser_reg_dma(struct iser_device *device, struct iser_data_buf *mem,
static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task,
struct iser_data_buf *mem,
enum iser_data_dir cmd_dir,
int aligned_len)
enum iser_data_dir cmd_dir)
{
struct iscsi_conn *iscsi_conn = iser_task->iser_conn->iscsi_conn;
struct iser_device *device = iser_task->iser_conn->ib_conn.device;
iscsi_conn->fmr_unalign_cnt++;
iser_warn("rdma alignment violation (%d/%d aligned) or FMR not supported\n",
aligned_len, mem->size);
if (iser_debug_level > 0)
iser_data_buf_dump(mem, device->ib_device);
@ -439,13 +514,15 @@ static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task,
* returns: 0 on success, errno code on failure
*/
static
int iser_reg_page_vec(struct iscsi_iser_task *iser_task,
int iser_fast_reg_fmr(struct iscsi_iser_task *iser_task,
struct iser_data_buf *mem,
struct iser_page_vec *page_vec,
struct iser_mem_reg *mem_reg)
struct iser_reg_resources *rsc,
struct iser_mem_reg *reg)
{
struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
struct iser_device *device = ib_conn->device;
struct iser_page_vec *page_vec = rsc->page_vec;
struct ib_fmr_pool *fmr_pool = rsc->fmr_pool;
struct ib_pool_fmr *fmr;
int ret, plen;
@ -461,7 +538,7 @@ int iser_reg_page_vec(struct iscsi_iser_task *iser_task,
return -EINVAL;
}
fmr = ib_fmr_pool_map_phys(ib_conn->fmr.pool,
fmr = ib_fmr_pool_map_phys(fmr_pool,
page_vec->pages,
page_vec->length,
page_vec->pages[0]);
@ -471,11 +548,15 @@ int iser_reg_page_vec(struct iscsi_iser_task *iser_task,
return ret;
}
mem_reg->sge.lkey = fmr->fmr->lkey;
mem_reg->rkey = fmr->fmr->rkey;
mem_reg->sge.addr = page_vec->pages[0] + page_vec->offset;
mem_reg->sge.length = page_vec->data_size;
mem_reg->mem_h = fmr;
reg->sge.lkey = fmr->fmr->lkey;
reg->rkey = fmr->fmr->rkey;
reg->sge.addr = page_vec->pages[0] + page_vec->offset;
reg->sge.length = page_vec->data_size;
reg->mem_h = fmr;
iser_dbg("fmr reg: lkey=0x%x, rkey=0x%x, addr=0x%llx,"
" length=0x%x\n", reg->sge.lkey, reg->rkey,
reg->sge.addr, reg->sge.length);
return 0;
}
@ -505,71 +586,17 @@ void iser_unreg_mem_fmr(struct iscsi_iser_task *iser_task,
void iser_unreg_mem_fastreg(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir)
{
struct iser_device *device = iser_task->iser_conn->ib_conn.device;
struct iser_mem_reg *reg = &iser_task->rdma_reg[cmd_dir];
if (!reg->mem_h)
return;
iser_reg_desc_put(&iser_task->iser_conn->ib_conn,
reg->mem_h);
device->reg_ops->reg_desc_put(&iser_task->iser_conn->ib_conn,
reg->mem_h);
reg->mem_h = NULL;
}
/**
* iser_reg_rdma_mem_fmr - Registers memory intended for RDMA,
* using FMR (if possible) obtaining rkey and va
*
* returns 0 on success, errno code on failure
*/
int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir)
{
struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
struct iser_device *device = ib_conn->device;
struct ib_device *ibdev = device->ib_device;
struct iser_data_buf *mem = &iser_task->data[cmd_dir];
struct iser_mem_reg *mem_reg;
int aligned_len;
int err;
int i;
mem_reg = &iser_task->rdma_reg[cmd_dir];
aligned_len = iser_data_buf_aligned_len(mem, ibdev);
if (aligned_len != mem->dma_nents) {
err = fall_to_bounce_buf(iser_task, mem,
cmd_dir, aligned_len);
if (err) {
iser_err("failed to allocate bounce buffer\n");
return err;
}
}
/* if there a single dma entry, FMR is not needed */
if (mem->dma_nents == 1) {
return iser_reg_dma(device, mem, mem_reg);
} else { /* use FMR for multiple dma entries */
err = iser_reg_page_vec(iser_task, mem, ib_conn->fmr.page_vec,
mem_reg);
if (err && err != -EAGAIN) {
iser_data_buf_dump(mem, ibdev);
iser_err("mem->dma_nents = %d (dlength = 0x%x)\n",
mem->dma_nents,
ntoh24(iser_task->desc.iscsi_header.dlength));
iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
ib_conn->fmr.page_vec->data_size,
ib_conn->fmr.page_vec->length,
ib_conn->fmr.page_vec->offset);
for (i = 0; i < ib_conn->fmr.page_vec->length; i++)
iser_err("page_vec[%d] = 0x%llx\n", i,
(unsigned long long)ib_conn->fmr.page_vec->pages[i]);
}
if (err)
return err;
}
return 0;
}
static void
iser_set_dif_domain(struct scsi_cmnd *sc, struct ib_sig_attrs *sig_attrs,
struct ib_sig_domain *domain)
@ -637,10 +664,11 @@ iser_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr)
{
u32 rkey;
memset(inv_wr, 0, sizeof(*inv_wr));
inv_wr->opcode = IB_WR_LOCAL_INV;
inv_wr->wr_id = ISER_FASTREG_LI_WRID;
inv_wr->ex.invalidate_rkey = mr->rkey;
inv_wr->send_flags = 0;
inv_wr->num_sge = 0;
rkey = ib_inc_rkey(mr->rkey);
ib_update_fast_reg_key(mr, rkey);
@ -648,61 +676,51 @@ iser_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr)
static int
iser_reg_sig_mr(struct iscsi_iser_task *iser_task,
struct fast_reg_descriptor *desc,
struct iser_pi_context *pi_ctx,
struct iser_mem_reg *data_reg,
struct iser_mem_reg *prot_reg,
struct iser_mem_reg *sig_reg)
{
struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
struct iser_pi_context *pi_ctx = desc->pi_ctx;
struct ib_send_wr sig_wr, inv_wr;
struct ib_send_wr *bad_wr, *wr = NULL;
struct ib_sig_attrs sig_attrs;
struct iser_tx_desc *tx_desc = &iser_task->desc;
struct ib_sig_attrs *sig_attrs = &tx_desc->sig_attrs;
struct ib_send_wr *wr;
int ret;
memset(&sig_attrs, 0, sizeof(sig_attrs));
ret = iser_set_sig_attrs(iser_task->sc, &sig_attrs);
memset(sig_attrs, 0, sizeof(*sig_attrs));
ret = iser_set_sig_attrs(iser_task->sc, sig_attrs);
if (ret)
goto err;
iser_set_prot_checks(iser_task->sc, &sig_attrs.check_mask);
iser_set_prot_checks(iser_task->sc, &sig_attrs->check_mask);
if (!(desc->reg_indicators & ISER_SIG_KEY_VALID)) {
iser_inv_rkey(&inv_wr, pi_ctx->sig_mr);
wr = &inv_wr;
if (!pi_ctx->sig_mr_valid) {
wr = iser_tx_next_wr(tx_desc);
iser_inv_rkey(wr, pi_ctx->sig_mr);
}
memset(&sig_wr, 0, sizeof(sig_wr));
sig_wr.opcode = IB_WR_REG_SIG_MR;
sig_wr.wr_id = ISER_FASTREG_LI_WRID;
sig_wr.sg_list = &data_reg->sge;
sig_wr.num_sge = 1;
sig_wr.wr.sig_handover.sig_attrs = &sig_attrs;
sig_wr.wr.sig_handover.sig_mr = pi_ctx->sig_mr;
wr = iser_tx_next_wr(tx_desc);
wr->opcode = IB_WR_REG_SIG_MR;
wr->wr_id = ISER_FASTREG_LI_WRID;
wr->sg_list = &data_reg->sge;
wr->num_sge = 1;
wr->send_flags = 0;
wr->wr.sig_handover.sig_attrs = sig_attrs;
wr->wr.sig_handover.sig_mr = pi_ctx->sig_mr;
if (scsi_prot_sg_count(iser_task->sc))
sig_wr.wr.sig_handover.prot = &prot_reg->sge;
sig_wr.wr.sig_handover.access_flags = IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_READ |
IB_ACCESS_REMOTE_WRITE;
if (!wr)
wr = &sig_wr;
wr->wr.sig_handover.prot = &prot_reg->sge;
else
wr->next = &sig_wr;
ret = ib_post_send(ib_conn->qp, wr, &bad_wr);
if (ret) {
iser_err("reg_sig_mr failed, ret:%d\n", ret);
goto err;
}
desc->reg_indicators &= ~ISER_SIG_KEY_VALID;
wr->wr.sig_handover.prot = NULL;
wr->wr.sig_handover.access_flags = IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_READ |
IB_ACCESS_REMOTE_WRITE;
pi_ctx->sig_mr_valid = 0;
sig_reg->sge.lkey = pi_ctx->sig_mr->lkey;
sig_reg->rkey = pi_ctx->sig_mr->rkey;
sig_reg->sge.addr = 0;
sig_reg->sge.length = scsi_transfer_length(iser_task->sc);
iser_dbg("sig_sge: lkey: 0x%x, rkey: 0x%x, addr: 0x%llx, length: %u\n",
iser_dbg("sig reg: lkey: 0x%x, rkey: 0x%x, addr: 0x%llx, length: %u\n",
sig_reg->sge.lkey, sig_reg->rkey, sig_reg->sge.addr,
sig_reg->sge.length);
err:
@ -711,29 +729,16 @@ iser_reg_sig_mr(struct iscsi_iser_task *iser_task,
static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
struct iser_data_buf *mem,
struct fast_reg_descriptor *desc,
enum iser_reg_indicator ind,
struct iser_reg_resources *rsc,
struct iser_mem_reg *reg)
{
struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
struct iser_device *device = ib_conn->device;
struct ib_mr *mr;
struct ib_fast_reg_page_list *frpl;
struct ib_send_wr fastreg_wr, inv_wr;
struct ib_send_wr *bad_wr, *wr = NULL;
int ret, offset, size, plen;
/* if there a single dma entry, dma mr suffices */
if (mem->dma_nents == 1)
return iser_reg_dma(device, mem, reg);
if (ind == ISER_DATA_KEY_VALID) {
mr = desc->data_mr;
frpl = desc->data_frpl;
} else {
mr = desc->pi_ctx->prot_mr;
frpl = desc->pi_ctx->prot_frpl;
}
struct ib_mr *mr = rsc->mr;
struct ib_fast_reg_page_list *frpl = rsc->frpl;
struct iser_tx_desc *tx_desc = &iser_task->desc;
struct ib_send_wr *wr;
int offset, size, plen;
plen = iser_sg_to_page_vec(mem, device->ib_device, frpl->page_list,
&offset, &size);
@ -742,118 +747,151 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
return -EINVAL;
}
if (!(desc->reg_indicators & ind)) {
iser_inv_rkey(&inv_wr, mr);
wr = &inv_wr;
if (!rsc->mr_valid) {
wr = iser_tx_next_wr(tx_desc);
iser_inv_rkey(wr, mr);
}
/* Prepare FASTREG WR */
memset(&fastreg_wr, 0, sizeof(fastreg_wr));
fastreg_wr.wr_id = ISER_FASTREG_LI_WRID;
fastreg_wr.opcode = IB_WR_FAST_REG_MR;
fastreg_wr.wr.fast_reg.iova_start = frpl->page_list[0] + offset;
fastreg_wr.wr.fast_reg.page_list = frpl;
fastreg_wr.wr.fast_reg.page_list_len = plen;
fastreg_wr.wr.fast_reg.page_shift = SHIFT_4K;
fastreg_wr.wr.fast_reg.length = size;
fastreg_wr.wr.fast_reg.rkey = mr->rkey;
fastreg_wr.wr.fast_reg.access_flags = (IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_WRITE |
IB_ACCESS_REMOTE_READ);
if (!wr)
wr = &fastreg_wr;
else
wr->next = &fastreg_wr;
ret = ib_post_send(ib_conn->qp, wr, &bad_wr);
if (ret) {
iser_err("fast registration failed, ret:%d\n", ret);
return ret;
}
desc->reg_indicators &= ~ind;
wr = iser_tx_next_wr(tx_desc);
wr->opcode = IB_WR_FAST_REG_MR;
wr->wr_id = ISER_FASTREG_LI_WRID;
wr->send_flags = 0;
wr->wr.fast_reg.iova_start = frpl->page_list[0] + offset;
wr->wr.fast_reg.page_list = frpl;
wr->wr.fast_reg.page_list_len = plen;
wr->wr.fast_reg.page_shift = SHIFT_4K;
wr->wr.fast_reg.length = size;
wr->wr.fast_reg.rkey = mr->rkey;
wr->wr.fast_reg.access_flags = (IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_WRITE |
IB_ACCESS_REMOTE_READ);
rsc->mr_valid = 0;
reg->sge.lkey = mr->lkey;
reg->rkey = mr->rkey;
reg->sge.addr = frpl->page_list[0] + offset;
reg->sge.length = size;
return ret;
iser_dbg("fast reg: lkey=0x%x, rkey=0x%x, addr=0x%llx,"
" length=0x%x\n", reg->sge.lkey, reg->rkey,
reg->sge.addr, reg->sge.length);
return 0;
}
/**
* iser_reg_rdma_mem_fastreg - Registers memory intended for RDMA,
* using Fast Registration WR (if possible) obtaining rkey and va
*
* returns 0 on success, errno code on failure
*/
int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir)
static int
iser_handle_unaligned_buf(struct iscsi_iser_task *task,
struct iser_data_buf *mem,
enum iser_data_dir dir)
{
struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
struct iser_device *device = ib_conn->device;
struct ib_device *ibdev = device->ib_device;
struct iser_data_buf *mem = &iser_task->data[cmd_dir];
struct iser_mem_reg *mem_reg = &iser_task->rdma_reg[cmd_dir];
struct fast_reg_descriptor *desc = NULL;
struct iser_conn *iser_conn = task->iser_conn;
struct iser_device *device = iser_conn->ib_conn.device;
int err, aligned_len;
aligned_len = iser_data_buf_aligned_len(mem, ibdev);
aligned_len = iser_data_buf_aligned_len(mem, device->ib_device,
iser_conn->scsi_sg_tablesize);
if (aligned_len != mem->dma_nents) {
err = fall_to_bounce_buf(iser_task, mem,
cmd_dir, aligned_len);
if (err) {
iser_err("failed to allocate bounce buffer\n");
err = fall_to_bounce_buf(task, mem, dir);
if (err)
return err;
}
}
if (mem->dma_nents != 1 ||
scsi_get_prot_op(iser_task->sc) != SCSI_PROT_NORMAL) {
desc = iser_reg_desc_get(ib_conn);
mem_reg->mem_h = desc;
}
err = iser_fast_reg_mr(iser_task, mem, desc,
ISER_DATA_KEY_VALID, mem_reg);
if (err)
goto err_reg;
if (scsi_get_prot_op(iser_task->sc) != SCSI_PROT_NORMAL) {
struct iser_mem_reg prot_reg;
memset(&prot_reg, 0, sizeof(prot_reg));
if (scsi_prot_sg_count(iser_task->sc)) {
mem = &iser_task->prot[cmd_dir];
aligned_len = iser_data_buf_aligned_len(mem, ibdev);
if (aligned_len != mem->dma_nents) {
err = fall_to_bounce_buf(iser_task, mem,
cmd_dir, aligned_len);
if (err) {
iser_err("failed to allocate bounce buffer\n");
return err;
}
}
err = iser_fast_reg_mr(iser_task, mem, desc,
ISER_PROT_KEY_VALID, &prot_reg);
if (err)
goto err_reg;
}
err = iser_reg_sig_mr(iser_task, desc, mem_reg,
&prot_reg, mem_reg);
if (err) {
iser_err("Failed to register signature mr\n");
return err;
}
desc->reg_indicators |= ISER_FASTREG_PROTECTED;
}
return 0;
}
static int
iser_reg_prot_sg(struct iscsi_iser_task *task,
struct iser_data_buf *mem,
struct iser_fr_desc *desc,
struct iser_mem_reg *reg)
{
struct iser_device *device = task->iser_conn->ib_conn.device;
if (mem->dma_nents == 1)
return iser_reg_dma(device, mem, reg);
return device->reg_ops->reg_mem(task, mem, &desc->pi_ctx->rsc, reg);
}
static int
iser_reg_data_sg(struct iscsi_iser_task *task,
struct iser_data_buf *mem,
struct iser_fr_desc *desc,
struct iser_mem_reg *reg)
{
struct iser_device *device = task->iser_conn->ib_conn.device;
if (mem->dma_nents == 1)
return iser_reg_dma(device, mem, reg);
return device->reg_ops->reg_mem(task, mem, &desc->rsc, reg);
}
int iser_reg_rdma_mem(struct iscsi_iser_task *task,
enum iser_data_dir dir)
{
struct ib_conn *ib_conn = &task->iser_conn->ib_conn;
struct iser_device *device = ib_conn->device;
struct iser_data_buf *mem = &task->data[dir];
struct iser_mem_reg *reg = &task->rdma_reg[dir];
struct iser_mem_reg *data_reg;
struct iser_fr_desc *desc = NULL;
int err;
err = iser_handle_unaligned_buf(task, mem, dir);
if (unlikely(err))
return err;
if (mem->dma_nents != 1 ||
scsi_get_prot_op(task->sc) != SCSI_PROT_NORMAL) {
desc = device->reg_ops->reg_desc_get(ib_conn);
reg->mem_h = desc;
}
if (scsi_get_prot_op(task->sc) == SCSI_PROT_NORMAL)
data_reg = reg;
else
data_reg = &task->desc.data_reg;
err = iser_reg_data_sg(task, mem, desc, data_reg);
if (unlikely(err))
goto err_reg;
if (scsi_get_prot_op(task->sc) != SCSI_PROT_NORMAL) {
struct iser_mem_reg *prot_reg = &task->desc.prot_reg;
if (scsi_prot_sg_count(task->sc)) {
mem = &task->prot[dir];
err = iser_handle_unaligned_buf(task, mem, dir);
if (unlikely(err))
goto err_reg;
err = iser_reg_prot_sg(task, mem, desc, prot_reg);
if (unlikely(err))
goto err_reg;
}
err = iser_reg_sig_mr(task, desc->pi_ctx, data_reg,
prot_reg, reg);
if (unlikely(err))
goto err_reg;
desc->pi_ctx->sig_protected = 1;
}
return 0;
err_reg:
if (desc)
iser_reg_desc_put(ib_conn, desc);
device->reg_ops->reg_desc_put(ib_conn, desc);
return err;
}
void iser_unreg_rdma_mem(struct iscsi_iser_task *task,
enum iser_data_dir dir)
{
struct iser_device *device = task->iser_conn->ib_conn.device;
device->reg_ops->unreg_mem(task, dir);
}

View File

@ -87,25 +87,9 @@ static int iser_create_device_ib_res(struct iser_device *device)
return ret;
}
/* Assign function handles - based on FMR support */
if (device->ib_device->alloc_fmr && device->ib_device->dealloc_fmr &&
device->ib_device->map_phys_fmr && device->ib_device->unmap_fmr) {
iser_info("FMR supported, using FMR for registration\n");
device->iser_alloc_rdma_reg_res = iser_create_fmr_pool;
device->iser_free_rdma_reg_res = iser_free_fmr_pool;
device->iser_reg_rdma_mem = iser_reg_rdma_mem_fmr;
device->iser_unreg_rdma_mem = iser_unreg_mem_fmr;
} else
if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
iser_info("FastReg supported, using FastReg for registration\n");
device->iser_alloc_rdma_reg_res = iser_create_fastreg_pool;
device->iser_free_rdma_reg_res = iser_free_fastreg_pool;
device->iser_reg_rdma_mem = iser_reg_rdma_mem_fastreg;
device->iser_unreg_rdma_mem = iser_unreg_mem_fastreg;
} else {
iser_err("IB device does not support FMRs nor FastRegs, can't register memory\n");
return -1;
}
ret = iser_assign_reg_ops(device);
if (ret)
return ret;
device->comps_used = min_t(int, num_online_cpus(),
device->ib_device->num_comp_vectors);
@ -201,7 +185,7 @@ static void iser_free_device_ib_res(struct iser_device *device)
(void)ib_unregister_event_handler(&device->event_handler);
(void)ib_dereg_mr(device->mr);
(void)ib_dealloc_pd(device->pd);
ib_dealloc_pd(device->pd);
kfree(device->comps);
device->comps = NULL;
@ -211,28 +195,40 @@ static void iser_free_device_ib_res(struct iser_device *device)
}
/**
* iser_create_fmr_pool - Creates FMR pool and page_vector
* iser_alloc_fmr_pool - Creates FMR pool and page_vector
*
* returns 0 on success, or errno code on failure
*/
int iser_create_fmr_pool(struct ib_conn *ib_conn, unsigned cmds_max)
int iser_alloc_fmr_pool(struct ib_conn *ib_conn,
unsigned cmds_max,
unsigned int size)
{
struct iser_device *device = ib_conn->device;
struct iser_fr_pool *fr_pool = &ib_conn->fr_pool;
struct iser_page_vec *page_vec;
struct iser_fr_desc *desc;
struct ib_fmr_pool *fmr_pool;
struct ib_fmr_pool_param params;
int ret = -ENOMEM;
int ret;
ib_conn->fmr.page_vec = kmalloc(sizeof(*ib_conn->fmr.page_vec) +
(sizeof(u64)*(ISCSI_ISER_SG_TABLESIZE + 1)),
GFP_KERNEL);
if (!ib_conn->fmr.page_vec)
return ret;
INIT_LIST_HEAD(&fr_pool->list);
spin_lock_init(&fr_pool->lock);
ib_conn->fmr.page_vec->pages = (u64 *)(ib_conn->fmr.page_vec + 1);
desc = kzalloc(sizeof(*desc), GFP_KERNEL);
if (!desc)
return -ENOMEM;
page_vec = kmalloc(sizeof(*page_vec) + (sizeof(u64) * size),
GFP_KERNEL);
if (!page_vec) {
ret = -ENOMEM;
goto err_frpl;
}
page_vec->pages = (u64 *)(page_vec + 1);
params.page_shift = SHIFT_4K;
/* when the first/last SG element are not start/end *
* page aligned, the map whould be of N+1 pages */
params.max_pages_per_fmr = ISCSI_ISER_SG_TABLESIZE + 1;
params.max_pages_per_fmr = size;
/* make the pool size twice the max number of SCSI commands *
* the ML is expected to queue, watermark for unmap at 50% */
params.pool_size = cmds_max * 2;
@ -243,23 +239,25 @@ int iser_create_fmr_pool(struct ib_conn *ib_conn, unsigned cmds_max)
IB_ACCESS_REMOTE_WRITE |
IB_ACCESS_REMOTE_READ);
ib_conn->fmr.pool = ib_create_fmr_pool(device->pd, &params);
if (!IS_ERR(ib_conn->fmr.pool))
return 0;
/* no FMR => no need for page_vec */
kfree(ib_conn->fmr.page_vec);
ib_conn->fmr.page_vec = NULL;
ret = PTR_ERR(ib_conn->fmr.pool);
ib_conn->fmr.pool = NULL;
if (ret != -ENOSYS) {
fmr_pool = ib_create_fmr_pool(device->pd, &params);
if (IS_ERR(fmr_pool)) {
ret = PTR_ERR(fmr_pool);
iser_err("FMR allocation failed, err %d\n", ret);
return ret;
} else {
iser_warn("FMRs are not supported, using unaligned mode\n");
return 0;
goto err_fmr;
}
desc->rsc.page_vec = page_vec;
desc->rsc.fmr_pool = fmr_pool;
list_add(&desc->list, &fr_pool->list);
return 0;
err_fmr:
kfree(page_vec);
err_frpl:
kfree(desc);
return ret;
}
/**
@ -267,26 +265,68 @@ int iser_create_fmr_pool(struct ib_conn *ib_conn, unsigned cmds_max)
*/
void iser_free_fmr_pool(struct ib_conn *ib_conn)
{
struct iser_fr_pool *fr_pool = &ib_conn->fr_pool;
struct iser_fr_desc *desc;
desc = list_first_entry(&fr_pool->list,
struct iser_fr_desc, list);
list_del(&desc->list);
iser_info("freeing conn %p fmr pool %p\n",
ib_conn, ib_conn->fmr.pool);
ib_conn, desc->rsc.fmr_pool);
if (ib_conn->fmr.pool != NULL)
ib_destroy_fmr_pool(ib_conn->fmr.pool);
ib_conn->fmr.pool = NULL;
kfree(ib_conn->fmr.page_vec);
ib_conn->fmr.page_vec = NULL;
ib_destroy_fmr_pool(desc->rsc.fmr_pool);
kfree(desc->rsc.page_vec);
kfree(desc);
}
static int
iser_alloc_pi_ctx(struct ib_device *ib_device, struct ib_pd *pd,
struct fast_reg_descriptor *desc)
iser_alloc_reg_res(struct ib_device *ib_device,
struct ib_pd *pd,
struct iser_reg_resources *res,
unsigned int size)
{
int ret;
res->frpl = ib_alloc_fast_reg_page_list(ib_device, size);
if (IS_ERR(res->frpl)) {
ret = PTR_ERR(res->frpl);
iser_err("Failed to allocate ib_fast_reg_page_list err=%d\n",
ret);
return PTR_ERR(res->frpl);
}
res->mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, size);
if (IS_ERR(res->mr)) {
ret = PTR_ERR(res->mr);
iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
goto fast_reg_mr_failure;
}
res->mr_valid = 1;
return 0;
fast_reg_mr_failure:
ib_free_fast_reg_page_list(res->frpl);
return ret;
}
static void
iser_free_reg_res(struct iser_reg_resources *rsc)
{
ib_dereg_mr(rsc->mr);
ib_free_fast_reg_page_list(rsc->frpl);
}
static int
iser_alloc_pi_ctx(struct ib_device *ib_device,
struct ib_pd *pd,
struct iser_fr_desc *desc,
unsigned int size)
{
struct iser_pi_context *pi_ctx = NULL;
struct ib_mr_init_attr mr_init_attr = {.max_reg_descriptors = 2,
.flags = IB_MR_SIGNATURE_EN};
int ret = 0;
int ret;
desc->pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL);
if (!desc->pi_ctx)
@ -294,36 +334,25 @@ iser_alloc_pi_ctx(struct ib_device *ib_device, struct ib_pd *pd,
pi_ctx = desc->pi_ctx;
pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(ib_device,
ISCSI_ISER_SG_TABLESIZE);
if (IS_ERR(pi_ctx->prot_frpl)) {
ret = PTR_ERR(pi_ctx->prot_frpl);
goto prot_frpl_failure;
ret = iser_alloc_reg_res(ib_device, pd, &pi_ctx->rsc, size);
if (ret) {
iser_err("failed to allocate reg_resources\n");
goto alloc_reg_res_err;
}
pi_ctx->prot_mr = ib_alloc_fast_reg_mr(pd,
ISCSI_ISER_SG_TABLESIZE + 1);
if (IS_ERR(pi_ctx->prot_mr)) {
ret = PTR_ERR(pi_ctx->prot_mr);
goto prot_mr_failure;
}
desc->reg_indicators |= ISER_PROT_KEY_VALID;
pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr);
pi_ctx->sig_mr = ib_alloc_mr(pd, IB_MR_TYPE_SIGNATURE, 2);
if (IS_ERR(pi_ctx->sig_mr)) {
ret = PTR_ERR(pi_ctx->sig_mr);
goto sig_mr_failure;
}
desc->reg_indicators |= ISER_SIG_KEY_VALID;
desc->reg_indicators &= ~ISER_FASTREG_PROTECTED;
pi_ctx->sig_mr_valid = 1;
desc->pi_ctx->sig_protected = 0;
return 0;
sig_mr_failure:
ib_dereg_mr(desc->pi_ctx->prot_mr);
prot_mr_failure:
ib_free_fast_reg_page_list(desc->pi_ctx->prot_frpl);
prot_frpl_failure:
iser_free_reg_res(&pi_ctx->rsc);
alloc_reg_res_err:
kfree(desc->pi_ctx);
return ret;
@ -332,82 +361,71 @@ iser_alloc_pi_ctx(struct ib_device *ib_device, struct ib_pd *pd,
static void
iser_free_pi_ctx(struct iser_pi_context *pi_ctx)
{
ib_free_fast_reg_page_list(pi_ctx->prot_frpl);
ib_dereg_mr(pi_ctx->prot_mr);
ib_destroy_mr(pi_ctx->sig_mr);
iser_free_reg_res(&pi_ctx->rsc);
ib_dereg_mr(pi_ctx->sig_mr);
kfree(pi_ctx);
}
static int
iser_create_fastreg_desc(struct ib_device *ib_device, struct ib_pd *pd,
bool pi_enable, struct fast_reg_descriptor *desc)
static struct iser_fr_desc *
iser_create_fastreg_desc(struct ib_device *ib_device,
struct ib_pd *pd,
bool pi_enable,
unsigned int size)
{
struct iser_fr_desc *desc;
int ret;
desc->data_frpl = ib_alloc_fast_reg_page_list(ib_device,
ISCSI_ISER_SG_TABLESIZE + 1);
if (IS_ERR(desc->data_frpl)) {
ret = PTR_ERR(desc->data_frpl);
iser_err("Failed to allocate ib_fast_reg_page_list err=%d\n",
ret);
return PTR_ERR(desc->data_frpl);
}
desc = kzalloc(sizeof(*desc), GFP_KERNEL);
if (!desc)
return ERR_PTR(-ENOMEM);
desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE + 1);
if (IS_ERR(desc->data_mr)) {
ret = PTR_ERR(desc->data_mr);
iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
goto fast_reg_mr_failure;
}
desc->reg_indicators |= ISER_DATA_KEY_VALID;
ret = iser_alloc_reg_res(ib_device, pd, &desc->rsc, size);
if (ret)
goto reg_res_alloc_failure;
if (pi_enable) {
ret = iser_alloc_pi_ctx(ib_device, pd, desc);
ret = iser_alloc_pi_ctx(ib_device, pd, desc, size);
if (ret)
goto pi_ctx_alloc_failure;
}
return 0;
pi_ctx_alloc_failure:
ib_dereg_mr(desc->data_mr);
fast_reg_mr_failure:
ib_free_fast_reg_page_list(desc->data_frpl);
return desc;
return ret;
pi_ctx_alloc_failure:
iser_free_reg_res(&desc->rsc);
reg_res_alloc_failure:
kfree(desc);
return ERR_PTR(ret);
}
/**
* iser_create_fastreg_pool - Creates pool of fast_reg descriptors
* iser_alloc_fastreg_pool - Creates pool of fast_reg descriptors
* for fast registration work requests.
* returns 0 on success, or errno code on failure
*/
int iser_create_fastreg_pool(struct ib_conn *ib_conn, unsigned cmds_max)
int iser_alloc_fastreg_pool(struct ib_conn *ib_conn,
unsigned cmds_max,
unsigned int size)
{
struct iser_device *device = ib_conn->device;
struct fast_reg_descriptor *desc;
struct iser_fr_pool *fr_pool = &ib_conn->fr_pool;
struct iser_fr_desc *desc;
int i, ret;
INIT_LIST_HEAD(&ib_conn->fastreg.pool);
ib_conn->fastreg.pool_size = 0;
INIT_LIST_HEAD(&fr_pool->list);
spin_lock_init(&fr_pool->lock);
fr_pool->size = 0;
for (i = 0; i < cmds_max; i++) {
desc = kzalloc(sizeof(*desc), GFP_KERNEL);
if (!desc) {
iser_err("Failed to allocate a new fast_reg descriptor\n");
ret = -ENOMEM;
desc = iser_create_fastreg_desc(device->ib_device, device->pd,
ib_conn->pi_support, size);
if (IS_ERR(desc)) {
ret = PTR_ERR(desc);
goto err;
}
ret = iser_create_fastreg_desc(device->ib_device, device->pd,
ib_conn->pi_support, desc);
if (ret) {
iser_err("Failed to create fastreg descriptor err=%d\n",
ret);
kfree(desc);
goto err;
}
list_add_tail(&desc->list, &ib_conn->fastreg.pool);
ib_conn->fastreg.pool_size++;
list_add_tail(&desc->list, &fr_pool->list);
fr_pool->size++;
}
return 0;
@ -422,27 +440,27 @@ int iser_create_fastreg_pool(struct ib_conn *ib_conn, unsigned cmds_max)
*/
void iser_free_fastreg_pool(struct ib_conn *ib_conn)
{
struct fast_reg_descriptor *desc, *tmp;
struct iser_fr_pool *fr_pool = &ib_conn->fr_pool;
struct iser_fr_desc *desc, *tmp;
int i = 0;
if (list_empty(&ib_conn->fastreg.pool))
if (list_empty(&fr_pool->list))
return;
iser_info("freeing conn %p fr pool\n", ib_conn);
list_for_each_entry_safe(desc, tmp, &ib_conn->fastreg.pool, list) {
list_for_each_entry_safe(desc, tmp, &fr_pool->list, list) {
list_del(&desc->list);
ib_free_fast_reg_page_list(desc->data_frpl);
ib_dereg_mr(desc->data_mr);
iser_free_reg_res(&desc->rsc);
if (desc->pi_ctx)
iser_free_pi_ctx(desc->pi_ctx);
kfree(desc);
++i;
}
if (i < ib_conn->fastreg.pool_size)
if (i < fr_pool->size)
iser_warn("pool still has %d regions registered\n",
ib_conn->fastreg.pool_size - i);
fr_pool->size - i);
}
/**
@ -738,6 +756,31 @@ static void iser_connect_error(struct rdma_cm_id *cma_id)
iser_conn->state = ISER_CONN_TERMINATING;
}
static void
iser_calc_scsi_params(struct iser_conn *iser_conn,
unsigned int max_sectors)
{
struct iser_device *device = iser_conn->ib_conn.device;
unsigned short sg_tablesize, sup_sg_tablesize;
sg_tablesize = DIV_ROUND_UP(max_sectors * 512, SIZE_4K);
sup_sg_tablesize = min_t(unsigned, ISCSI_ISER_MAX_SG_TABLESIZE,
device->dev_attr.max_fast_reg_page_list_len);
if (sg_tablesize > sup_sg_tablesize) {
sg_tablesize = sup_sg_tablesize;
iser_conn->scsi_max_sectors = sg_tablesize * SIZE_4K / 512;
} else {
iser_conn->scsi_max_sectors = max_sectors;
}
iser_conn->scsi_sg_tablesize = sg_tablesize;
iser_dbg("iser_conn %p, sg_tablesize %u, max_sectors %u\n",
iser_conn, iser_conn->scsi_sg_tablesize,
iser_conn->scsi_max_sectors);
}
/**
* Called with state mutex held
**/
@ -776,6 +819,8 @@ static void iser_addr_handler(struct rdma_cm_id *cma_id)
}
}
iser_calc_scsi_params(iser_conn, iser_max_sectors);
ret = rdma_resolve_route(cma_id, 1000);
if (ret) {
iser_err("resolve route failed: %d\n", ret);
@ -938,7 +983,6 @@ void iser_conn_init(struct iser_conn *iser_conn)
init_completion(&iser_conn->ib_completion);
init_completion(&iser_conn->up_completion);
INIT_LIST_HEAD(&iser_conn->conn_list);
spin_lock_init(&iser_conn->ib_conn.lock);
mutex_init(&iser_conn->state_mutex);
}
@ -1017,7 +1061,7 @@ int iser_post_recvl(struct iser_conn *iser_conn)
sge.addr = iser_conn->login_resp_dma;
sge.length = ISER_RX_LOGIN_SIZE;
sge.lkey = ib_conn->device->mr->lkey;
sge.lkey = ib_conn->device->pd->local_dma_lkey;
rx_wr.wr_id = (uintptr_t)iser_conn->login_resp_buf;
rx_wr.sg_list = &sge;
@ -1072,23 +1116,24 @@ int iser_post_recvm(struct iser_conn *iser_conn, int count)
int iser_post_send(struct ib_conn *ib_conn, struct iser_tx_desc *tx_desc,
bool signal)
{
int ib_ret;
struct ib_send_wr send_wr, *send_wr_failed;
struct ib_send_wr *bad_wr, *wr = iser_tx_next_wr(tx_desc);
int ib_ret;
ib_dma_sync_single_for_device(ib_conn->device->ib_device,
tx_desc->dma_addr, ISER_HEADERS_LEN,
DMA_TO_DEVICE);
send_wr.next = NULL;
send_wr.wr_id = (uintptr_t)tx_desc;
send_wr.sg_list = tx_desc->tx_sg;
send_wr.num_sge = tx_desc->num_sge;
send_wr.opcode = IB_WR_SEND;
send_wr.send_flags = signal ? IB_SEND_SIGNALED : 0;
wr->next = NULL;
wr->wr_id = (uintptr_t)tx_desc;
wr->sg_list = tx_desc->tx_sg;
wr->num_sge = tx_desc->num_sge;
wr->opcode = IB_WR_SEND;
wr->send_flags = signal ? IB_SEND_SIGNALED : 0;
ib_ret = ib_post_send(ib_conn->qp, &send_wr, &send_wr_failed);
ib_ret = ib_post_send(ib_conn->qp, &tx_desc->wrs[0], &bad_wr);
if (ib_ret)
iser_err("ib_post_send failed, ret:%d\n", ib_ret);
iser_err("ib_post_send failed, ret:%d opcode:%d\n",
ib_ret, bad_wr->opcode);
return ib_ret;
}
@ -1240,13 +1285,13 @@ u8 iser_check_task_pi_status(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir, sector_t *sector)
{
struct iser_mem_reg *reg = &iser_task->rdma_reg[cmd_dir];
struct fast_reg_descriptor *desc = reg->mem_h;
struct iser_fr_desc *desc = reg->mem_h;
unsigned long sector_size = iser_task->sc->device->sector_size;
struct ib_mr_status mr_status;
int ret;
if (desc && desc->reg_indicators & ISER_FASTREG_PROTECTED) {
desc->reg_indicators &= ~ISER_FASTREG_PROTECTED;
if (desc && desc->pi_ctx->sig_protected) {
desc->pi_ctx->sig_protected = 0;
ret = ib_check_mr_status(desc->pi_ctx->sig_mr,
IB_MR_CHECK_SIG_STATUS, &mr_status);
if (ret) {

View File

@ -235,7 +235,7 @@ isert_alloc_rx_descriptors(struct isert_conn *isert_conn)
rx_sg = &rx_desc->rx_sg;
rx_sg->addr = rx_desc->dma_addr;
rx_sg->length = ISER_RX_PAYLOAD_SIZE;
rx_sg->lkey = device->mr->lkey;
rx_sg->lkey = device->pd->local_dma_lkey;
}
isert_conn->rx_desc_head = 0;
@ -385,22 +385,12 @@ isert_create_device_ib_res(struct isert_device *device)
goto out_cq;
}
device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(device->mr)) {
ret = PTR_ERR(device->mr);
isert_err("failed to create dma mr, device %p, ret=%d\n",
device, ret);
goto out_mr;
}
/* Check signature cap */
device->pi_capable = dev_attr->device_cap_flags &
IB_DEVICE_SIGNATURE_HANDOVER ? true : false;
return 0;
out_mr:
ib_dealloc_pd(device->pd);
out_cq:
isert_free_comps(device);
return ret;
@ -411,7 +401,6 @@ isert_free_device_ib_res(struct isert_device *device)
{
isert_info("device %p\n", device);
ib_dereg_mr(device->mr);
ib_dealloc_pd(device->pd);
isert_free_comps(device);
}
@ -491,7 +480,7 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
if (fr_desc->pi_ctx) {
ib_free_fast_reg_page_list(fr_desc->pi_ctx->prot_frpl);
ib_dereg_mr(fr_desc->pi_ctx->prot_mr);
ib_destroy_mr(fr_desc->pi_ctx->sig_mr);
ib_dereg_mr(fr_desc->pi_ctx->sig_mr);
kfree(fr_desc->pi_ctx);
}
kfree(fr_desc);
@ -508,7 +497,6 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
struct ib_device *device,
struct ib_pd *pd)
{
struct ib_mr_init_attr mr_init_attr;
struct pi_context *pi_ctx;
int ret;
@ -527,7 +515,8 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
goto err_pi_ctx;
}
pi_ctx->prot_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE);
pi_ctx->prot_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
ISCSI_ISER_SG_TABLESIZE);
if (IS_ERR(pi_ctx->prot_mr)) {
isert_err("Failed to allocate prot frmr err=%ld\n",
PTR_ERR(pi_ctx->prot_mr));
@ -536,10 +525,7 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
}
desc->ind |= ISERT_PROT_KEY_VALID;
memset(&mr_init_attr, 0, sizeof(mr_init_attr));
mr_init_attr.max_reg_descriptors = 2;
mr_init_attr.flags |= IB_MR_SIGNATURE_EN;
pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr);
pi_ctx->sig_mr = ib_alloc_mr(pd, IB_MR_TYPE_SIGNATURE, 2);
if (IS_ERR(pi_ctx->sig_mr)) {
isert_err("Failed to allocate signature enabled mr err=%ld\n",
PTR_ERR(pi_ctx->sig_mr));
@ -577,7 +563,8 @@ isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd,
return PTR_ERR(fr_desc->data_frpl);
}
fr_desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE);
fr_desc->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
ISCSI_ISER_SG_TABLESIZE);
if (IS_ERR(fr_desc->data_mr)) {
isert_err("Failed to allocate data frmr err=%ld\n",
PTR_ERR(fr_desc->data_mr));
@ -1092,8 +1079,8 @@ isert_create_send_desc(struct isert_conn *isert_conn,
tx_desc->num_sge = 1;
tx_desc->isert_cmd = isert_cmd;
if (tx_desc->tx_sg[0].lkey != device->mr->lkey) {
tx_desc->tx_sg[0].lkey = device->mr->lkey;
if (tx_desc->tx_sg[0].lkey != device->pd->local_dma_lkey) {
tx_desc->tx_sg[0].lkey = device->pd->local_dma_lkey;
isert_dbg("tx_desc %p lkey mismatch, fixing\n", tx_desc);
}
}
@ -1116,7 +1103,7 @@ isert_init_tx_hdrs(struct isert_conn *isert_conn,
tx_desc->dma_addr = dma_addr;
tx_desc->tx_sg[0].addr = tx_desc->dma_addr;
tx_desc->tx_sg[0].length = ISER_HEADERS_LEN;
tx_desc->tx_sg[0].lkey = device->mr->lkey;
tx_desc->tx_sg[0].lkey = device->pd->local_dma_lkey;
isert_dbg("Setup tx_sg[0].addr: 0x%llx length: %u lkey: 0x%x\n",
tx_desc->tx_sg[0].addr, tx_desc->tx_sg[0].length,
@ -1149,7 +1136,7 @@ isert_rdma_post_recvl(struct isert_conn *isert_conn)
memset(&sge, 0, sizeof(struct ib_sge));
sge.addr = isert_conn->login_req_dma;
sge.length = ISER_RX_LOGIN_SIZE;
sge.lkey = isert_conn->device->mr->lkey;
sge.lkey = isert_conn->device->pd->local_dma_lkey;
isert_dbg("Setup sge: addr: %llx length: %d 0x%08x\n",
sge.addr, sge.length, sge.lkey);
@ -1199,7 +1186,7 @@ isert_put_login_tx(struct iscsi_conn *conn, struct iscsi_login *login,
tx_dsg->addr = isert_conn->login_rsp_dma;
tx_dsg->length = length;
tx_dsg->lkey = isert_conn->device->mr->lkey;
tx_dsg->lkey = isert_conn->device->pd->local_dma_lkey;
tx_desc->num_sge = 2;
}
if (!login->login_failed) {
@ -2216,7 +2203,7 @@ isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
isert_cmd->pdu_buf_len = pdu_len;
tx_dsg->addr = isert_cmd->pdu_buf_dma;
tx_dsg->length = pdu_len;
tx_dsg->lkey = device->mr->lkey;
tx_dsg->lkey = device->pd->local_dma_lkey;
isert_cmd->tx_desc.num_sge = 2;
}
@ -2344,7 +2331,7 @@ isert_put_reject(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
isert_cmd->pdu_buf_len = ISCSI_HDR_LEN;
tx_dsg->addr = isert_cmd->pdu_buf_dma;
tx_dsg->length = ISCSI_HDR_LEN;
tx_dsg->lkey = device->mr->lkey;
tx_dsg->lkey = device->pd->local_dma_lkey;
isert_cmd->tx_desc.num_sge = 2;
isert_init_send_wr(isert_conn, isert_cmd, send_wr);
@ -2385,7 +2372,7 @@ isert_put_text_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
isert_cmd->pdu_buf_len = txt_rsp_len;
tx_dsg->addr = isert_cmd->pdu_buf_dma;
tx_dsg->length = txt_rsp_len;
tx_dsg->lkey = device->mr->lkey;
tx_dsg->lkey = device->pd->local_dma_lkey;
isert_cmd->tx_desc.num_sge = 2;
}
isert_init_send_wr(isert_conn, isert_cmd, send_wr);
@ -2426,7 +2413,7 @@ isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
ib_sge->addr = ib_sg_dma_address(ib_dev, tmp_sg) + page_off;
ib_sge->length = min_t(u32, data_left,
ib_sg_dma_len(ib_dev, tmp_sg) - page_off);
ib_sge->lkey = device->mr->lkey;
ib_sge->lkey = device->pd->local_dma_lkey;
isert_dbg("RDMA ib_sge: addr: 0x%llx length: %u lkey: %x\n",
ib_sge->addr, ib_sge->length, ib_sge->lkey);
@ -2600,7 +2587,7 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
u32 page_off;
if (mem->dma_nents == 1) {
sge->lkey = device->mr->lkey;
sge->lkey = device->pd->local_dma_lkey;
sge->addr = ib_sg_dma_address(ib_dev, &mem->sg[0]);
sge->length = ib_sg_dma_len(ib_dev, &mem->sg[0]);
isert_dbg("sge: addr: 0x%llx length: %u lkey: %x\n",

View File

@ -209,7 +209,6 @@ struct isert_device {
int refcount;
struct ib_device *ib_device;
struct ib_pd *pd;
struct ib_mr *mr;
struct isert_comp *comps;
int comps_used;
struct list_head dev_node;

View File

@ -55,8 +55,8 @@
#define DRV_NAME "ib_srp"
#define PFX DRV_NAME ": "
#define DRV_VERSION "1.0"
#define DRV_RELDATE "July 1, 2013"
#define DRV_VERSION "2.0"
#define DRV_RELDATE "July 26, 2015"
MODULE_AUTHOR("Roland Dreier");
MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator");
@ -68,8 +68,8 @@ static unsigned int srp_sg_tablesize;
static unsigned int cmd_sg_entries;
static unsigned int indirect_sg_entries;
static bool allow_ext_sg;
static bool prefer_fr;
static bool register_always;
static bool prefer_fr = true;
static bool register_always = true;
static int topspin_workarounds = 1;
module_param(srp_sg_tablesize, uint, 0444);
@ -131,7 +131,7 @@ MODULE_PARM_DESC(ch_count,
"Number of RDMA channels to use for communication with an SRP target. Using more than one channel improves performance if the HCA supports multiple completion vectors. The default value is the minimum of four times the number of online CPU sockets and the number of completion vectors supported by the HCA.");
static void srp_add_one(struct ib_device *device);
static void srp_remove_one(struct ib_device *device);
static void srp_remove_one(struct ib_device *device, void *client_data);
static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
static void srp_send_completion(struct ib_cq *cq, void *ch_ptr);
static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
@ -378,7 +378,8 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
INIT_LIST_HEAD(&pool->free_list);
for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
mr = ib_alloc_fast_reg_mr(pd, max_page_list_len);
mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
max_page_list_len);
if (IS_ERR(mr)) {
ret = PTR_ERR(mr);
goto destroy_pool;
@ -545,7 +546,7 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch)
if (ret)
goto err_qp;
if (dev->use_fast_reg && dev->has_fr) {
if (dev->use_fast_reg) {
fr_pool = srp_alloc_fr_pool(target);
if (IS_ERR(fr_pool)) {
ret = PTR_ERR(fr_pool);
@ -553,10 +554,7 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch)
"FR pool allocation failed (%d)\n", ret);
goto err_qp;
}
if (ch->fr_pool)
srp_destroy_fr_pool(ch->fr_pool);
ch->fr_pool = fr_pool;
} else if (!dev->use_fast_reg && dev->has_fmr) {
} else if (dev->use_fmr) {
fmr_pool = srp_alloc_fmr_pool(target);
if (IS_ERR(fmr_pool)) {
ret = PTR_ERR(fmr_pool);
@ -564,9 +562,6 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch)
"FMR pool allocation failed (%d)\n", ret);
goto err_qp;
}
if (ch->fmr_pool)
ib_destroy_fmr_pool(ch->fmr_pool);
ch->fmr_pool = fmr_pool;
}
if (ch->qp)
@ -580,6 +575,16 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch)
ch->recv_cq = recv_cq;
ch->send_cq = send_cq;
if (dev->use_fast_reg) {
if (ch->fr_pool)
srp_destroy_fr_pool(ch->fr_pool);
ch->fr_pool = fr_pool;
} else if (dev->use_fmr) {
if (ch->fmr_pool)
ib_destroy_fmr_pool(ch->fmr_pool);
ch->fmr_pool = fmr_pool;
}
kfree(init_attr);
return 0;
@ -622,7 +627,7 @@ static void srp_free_ch_ib(struct srp_target_port *target,
if (dev->use_fast_reg) {
if (ch->fr_pool)
srp_destroy_fr_pool(ch->fr_pool);
} else {
} else if (dev->use_fmr) {
if (ch->fmr_pool)
ib_destroy_fmr_pool(ch->fmr_pool);
}
@ -1084,7 +1089,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
if (req->nmdesc)
srp_fr_pool_put(ch->fr_pool, req->fr_list,
req->nmdesc);
} else {
} else if (dev->use_fmr) {
struct ib_pool_fmr **pfmr;
for (i = req->nmdesc, pfmr = req->fmr_list; i > 0; i--, pfmr++)
@ -1259,6 +1264,8 @@ static void srp_map_desc(struct srp_map_state *state, dma_addr_t dma_addr,
{
struct srp_direct_buf *desc = state->desc;
WARN_ON_ONCE(!dma_len);
desc->va = cpu_to_be64(dma_addr);
desc->key = cpu_to_be32(rkey);
desc->len = cpu_to_be32(dma_len);
@ -1271,18 +1278,24 @@ static void srp_map_desc(struct srp_map_state *state, dma_addr_t dma_addr,
static int srp_map_finish_fmr(struct srp_map_state *state,
struct srp_rdma_ch *ch)
{
struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_pool_fmr *fmr;
u64 io_addr = 0;
if (state->fmr.next >= state->fmr.end)
return -ENOMEM;
fmr = ib_fmr_pool_map_phys(ch->fmr_pool, state->pages,
state->npages, io_addr);
if (IS_ERR(fmr))
return PTR_ERR(fmr);
*state->next_fmr++ = fmr;
*state->fmr.next++ = fmr;
state->nmdesc++;
srp_map_desc(state, 0, state->dma_len, fmr->fmr->rkey);
srp_map_desc(state, state->base_dma_addr & ~dev->mr_page_mask,
state->dma_len, fmr->fmr->rkey);
return 0;
}
@ -1297,6 +1310,9 @@ static int srp_map_finish_fr(struct srp_map_state *state,
struct srp_fr_desc *desc;
u32 rkey;
if (state->fr.next >= state->fr.end)
return -ENOMEM;
desc = srp_fr_pool_get(ch->fr_pool);
if (!desc)
return -ENOMEM;
@ -1320,7 +1336,7 @@ static int srp_map_finish_fr(struct srp_map_state *state,
IB_ACCESS_REMOTE_WRITE);
wr.wr.fast_reg.rkey = desc->mr->lkey;
*state->next_fr++ = desc;
*state->fr.next++ = desc;
state->nmdesc++;
srp_map_desc(state, state->base_dma_addr, state->dma_len,
@ -1333,17 +1349,19 @@ static int srp_finish_mapping(struct srp_map_state *state,
struct srp_rdma_ch *ch)
{
struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
int ret = 0;
WARN_ON_ONCE(!dev->use_fast_reg && !dev->use_fmr);
if (state->npages == 0)
return 0;
if (state->npages == 1 && !register_always)
if (state->npages == 1 && target->global_mr)
srp_map_desc(state, state->base_dma_addr, state->dma_len,
target->rkey);
target->global_mr->rkey);
else
ret = target->srp_host->srp_dev->use_fast_reg ?
srp_map_finish_fr(state, ch) :
ret = dev->use_fast_reg ? srp_map_finish_fr(state, ch) :
srp_map_finish_fmr(state, ch);
if (ret == 0) {
@ -1354,66 +1372,19 @@ static int srp_finish_mapping(struct srp_map_state *state,
return ret;
}
static void srp_map_update_start(struct srp_map_state *state,
struct scatterlist *sg, int sg_index,
dma_addr_t dma_addr)
{
state->unmapped_sg = sg;
state->unmapped_index = sg_index;
state->unmapped_addr = dma_addr;
}
static int srp_map_sg_entry(struct srp_map_state *state,
struct srp_rdma_ch *ch,
struct scatterlist *sg, int sg_index,
bool use_mr)
struct scatterlist *sg, int sg_index)
{
struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_device *ibdev = dev->dev;
dma_addr_t dma_addr = ib_sg_dma_address(ibdev, sg);
unsigned int dma_len = ib_sg_dma_len(ibdev, sg);
unsigned int len;
unsigned int len = 0;
int ret;
if (!dma_len)
return 0;
if (!use_mr) {
/*
* Once we're in direct map mode for a request, we don't
* go back to FMR or FR mode, so no need to update anything
* other than the descriptor.
*/
srp_map_desc(state, dma_addr, dma_len, target->rkey);
return 0;
}
/*
* Since not all RDMA HW drivers support non-zero page offsets for
* FMR, if we start at an offset into a page, don't merge into the
* current FMR mapping. Finish it out, and use the kernel's MR for
* this sg entry.
*/
if ((!dev->use_fast_reg && dma_addr & ~dev->mr_page_mask) ||
dma_len > dev->mr_max_size) {
ret = srp_finish_mapping(state, ch);
if (ret)
return ret;
srp_map_desc(state, dma_addr, dma_len, target->rkey);
srp_map_update_start(state, NULL, 0, 0);
return 0;
}
/*
* If this is the first sg that will be mapped via FMR or via FR, save
* our position. We need to know the first unmapped entry, its index,
* and the first unmapped address within that entry to be able to
* restart mapping after an error.
*/
if (!state->unmapped_sg)
srp_map_update_start(state, sg, sg_index, dma_addr);
WARN_ON_ONCE(!dma_len);
while (dma_len) {
unsigned offset = dma_addr & ~dev->mr_page_mask;
@ -1421,8 +1392,6 @@ static int srp_map_sg_entry(struct srp_map_state *state,
ret = srp_finish_mapping(state, ch);
if (ret)
return ret;
srp_map_update_start(state, sg, sg_index, dma_addr);
}
len = min_t(unsigned int, dma_len, dev->mr_page_size - offset);
@ -1441,11 +1410,8 @@ static int srp_map_sg_entry(struct srp_map_state *state,
* boundries.
*/
ret = 0;
if (len != dev->mr_page_size) {
if (len != dev->mr_page_size)
ret = srp_finish_mapping(state, ch);
if (!ret)
srp_map_update_start(state, NULL, 0, 0);
}
return ret;
}
@ -1455,50 +1421,80 @@ static int srp_map_sg(struct srp_map_state *state, struct srp_rdma_ch *ch,
{
struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_device *ibdev = dev->dev;
struct scatterlist *sg;
int i;
bool use_mr;
int i, ret;
state->desc = req->indirect_desc;
state->pages = req->map_page;
if (dev->use_fast_reg) {
state->next_fr = req->fr_list;
use_mr = !!ch->fr_pool;
} else {
state->next_fmr = req->fmr_list;
use_mr = !!ch->fmr_pool;
state->fr.next = req->fr_list;
state->fr.end = req->fr_list + target->cmd_sg_cnt;
} else if (dev->use_fmr) {
state->fmr.next = req->fmr_list;
state->fmr.end = req->fmr_list + target->cmd_sg_cnt;
}
for_each_sg(scat, sg, count, i) {
if (srp_map_sg_entry(state, ch, sg, i, use_mr)) {
/*
* Memory registration failed, so backtrack to the
* first unmapped entry and continue on without using
* memory registration.
*/
dma_addr_t dma_addr;
unsigned int dma_len;
backtrack:
sg = state->unmapped_sg;
i = state->unmapped_index;
dma_addr = ib_sg_dma_address(ibdev, sg);
dma_len = ib_sg_dma_len(ibdev, sg);
dma_len -= (state->unmapped_addr - dma_addr);
dma_addr = state->unmapped_addr;
use_mr = false;
srp_map_desc(state, dma_addr, dma_len, target->rkey);
if (dev->use_fast_reg || dev->use_fmr) {
for_each_sg(scat, sg, count, i) {
ret = srp_map_sg_entry(state, ch, sg, i);
if (ret)
goto out;
}
ret = srp_finish_mapping(state, ch);
if (ret)
goto out;
} else {
for_each_sg(scat, sg, count, i) {
srp_map_desc(state, ib_sg_dma_address(dev->dev, sg),
ib_sg_dma_len(dev->dev, sg),
target->global_mr->rkey);
}
}
if (use_mr && srp_finish_mapping(state, ch))
goto backtrack;
req->nmdesc = state->nmdesc;
ret = 0;
return 0;
out:
return ret;
}
/*
* Register the indirect data buffer descriptor with the HCA.
*
* Note: since the indirect data buffer descriptor has been allocated with
* kmalloc() it is guaranteed that this buffer is a physically contiguous
* memory buffer.
*/
static int srp_map_idb(struct srp_rdma_ch *ch, struct srp_request *req,
void **next_mr, void **end_mr, u32 idb_len,
__be32 *idb_rkey)
{
struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
struct srp_map_state state;
struct srp_direct_buf idb_desc;
u64 idb_pages[1];
int ret;
memset(&state, 0, sizeof(state));
memset(&idb_desc, 0, sizeof(idb_desc));
state.gen.next = next_mr;
state.gen.end = end_mr;
state.desc = &idb_desc;
state.pages = idb_pages;
state.pages[0] = (req->indirect_dma_addr &
dev->mr_page_mask);
state.npages = 1;
state.base_dma_addr = req->indirect_dma_addr;
state.dma_len = idb_len;
ret = srp_finish_mapping(&state, ch);
if (ret < 0)
goto out;
*idb_rkey = idb_desc.key;
out:
return ret;
}
static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
@ -1507,12 +1503,13 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
struct srp_target_port *target = ch->target;
struct scatterlist *scat;
struct srp_cmd *cmd = req->cmd->buf;
int len, nents, count;
int len, nents, count, ret;
struct srp_device *dev;
struct ib_device *ibdev;
struct srp_map_state state;
struct srp_indirect_buf *indirect_hdr;
u32 table_len;
u32 idb_len, table_len;
__be32 idb_rkey;
u8 fmt;
if (!scsi_sglist(scmnd) || scmnd->sc_data_direction == DMA_NONE)
@ -1539,7 +1536,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
fmt = SRP_DATA_DESC_DIRECT;
len = sizeof (struct srp_cmd) + sizeof (struct srp_direct_buf);
if (count == 1 && !register_always) {
if (count == 1 && target->global_mr) {
/*
* The midlayer only generated a single gather/scatter
* entry, or DMA mapping coalesced everything to a
@ -1549,7 +1546,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
struct srp_direct_buf *buf = (void *) cmd->add_data;
buf->va = cpu_to_be64(ib_sg_dma_address(ibdev, scat));
buf->key = cpu_to_be32(target->rkey);
buf->key = cpu_to_be32(target->global_mr->rkey);
buf->len = cpu_to_be32(ib_sg_dma_len(ibdev, scat));
req->nmdesc = 0;
@ -1594,6 +1591,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
count = min(state.ndesc, target->cmd_sg_cnt);
table_len = state.ndesc * sizeof (struct srp_direct_buf);
idb_len = sizeof(struct srp_indirect_buf) + table_len;
fmt = SRP_DATA_DESC_INDIRECT;
len = sizeof(struct srp_cmd) + sizeof (struct srp_indirect_buf);
@ -1602,8 +1600,18 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
memcpy(indirect_hdr->desc_list, req->indirect_desc,
count * sizeof (struct srp_direct_buf));
if (!target->global_mr) {
ret = srp_map_idb(ch, req, state.gen.next, state.gen.end,
idb_len, &idb_rkey);
if (ret < 0)
return ret;
req->nmdesc++;
} else {
idb_rkey = target->global_mr->rkey;
}
indirect_hdr->table_desc.va = cpu_to_be64(req->indirect_dma_addr);
indirect_hdr->table_desc.key = cpu_to_be32(target->rkey);
indirect_hdr->table_desc.key = idb_rkey;
indirect_hdr->table_desc.len = cpu_to_be32(table_len);
indirect_hdr->len = cpu_to_be32(state.total_len);
@ -2171,7 +2179,7 @@ static uint32_t srp_compute_rq_tmo(struct ib_qp_attr *qp_attr, int attr_mask)
}
static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
struct srp_login_rsp *lrsp,
const struct srp_login_rsp *lrsp,
struct srp_rdma_ch *ch)
{
struct srp_target_port *target = ch->target;
@ -2757,6 +2765,13 @@ static int srp_sdev_count(struct Scsi_Host *host)
return c;
}
/*
* Return values:
* < 0 upon failure. Caller is responsible for SRP target port cleanup.
* 0 and target->state == SRP_TARGET_REMOVED if asynchronous target port
* removal has been scheduled.
* 0 and target->state != SRP_TARGET_REMOVED upon success.
*/
static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
{
struct srp_rport_identifiers ids;
@ -3146,8 +3161,8 @@ static ssize_t srp_create_target(struct device *dev,
target->io_class = SRP_REV16A_IB_IO_CLASS;
target->scsi_host = target_host;
target->srp_host = host;
target->lkey = host->srp_dev->mr->lkey;
target->rkey = host->srp_dev->mr->rkey;
target->lkey = host->srp_dev->pd->local_dma_lkey;
target->global_mr = host->srp_dev->global_mr;
target->cmd_sg_cnt = cmd_sg_entries;
target->sg_tablesize = indirect_sg_entries ? : cmd_sg_entries;
target->allow_ext_sg = allow_ext_sg;
@ -3262,7 +3277,7 @@ static ssize_t srp_create_target(struct device *dev,
srp_free_ch_ib(target, ch);
srp_free_req_data(target, ch);
target->ch_count = ch - target->ch;
break;
goto connected;
}
}
@ -3272,6 +3287,7 @@ static ssize_t srp_create_target(struct device *dev,
node_idx++;
}
connected:
target->scsi_host->nr_hw_queues = target->ch_count;
ret = srp_add_target(host, target);
@ -3294,6 +3310,8 @@ static ssize_t srp_create_target(struct device *dev,
mutex_unlock(&host->add_target_mutex);
scsi_host_put(target->scsi_host);
if (ret < 0)
scsi_host_put(target->scsi_host);
return ret;
@ -3401,6 +3419,7 @@ static void srp_add_one(struct ib_device *device)
srp_dev->use_fast_reg = (srp_dev->has_fr &&
(!srp_dev->has_fmr || prefer_fr));
srp_dev->use_fmr = !srp_dev->use_fast_reg && srp_dev->has_fmr;
/*
* Use the smallest page size supported by the HCA, down to a
@ -3433,12 +3452,16 @@ static void srp_add_one(struct ib_device *device)
if (IS_ERR(srp_dev->pd))
goto free_dev;
srp_dev->mr = ib_get_dma_mr(srp_dev->pd,
IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_READ |
IB_ACCESS_REMOTE_WRITE);
if (IS_ERR(srp_dev->mr))
goto err_pd;
if (!register_always || (!srp_dev->has_fmr && !srp_dev->has_fr)) {
srp_dev->global_mr = ib_get_dma_mr(srp_dev->pd,
IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_READ |
IB_ACCESS_REMOTE_WRITE);
if (IS_ERR(srp_dev->global_mr))
goto err_pd;
} else {
srp_dev->global_mr = NULL;
}
for (p = rdma_start_port(device); p <= rdma_end_port(device); ++p) {
host = srp_add_port(srp_dev, p);
@ -3460,13 +3483,13 @@ static void srp_add_one(struct ib_device *device)
kfree(dev_attr);
}
static void srp_remove_one(struct ib_device *device)
static void srp_remove_one(struct ib_device *device, void *client_data)
{
struct srp_device *srp_dev;
struct srp_host *host, *tmp_host;
struct srp_target_port *target;
srp_dev = ib_get_client_data(device, &srp_client);
srp_dev = client_data;
if (!srp_dev)
return;
@ -3495,7 +3518,8 @@ static void srp_remove_one(struct ib_device *device)
kfree(host);
}
ib_dereg_mr(srp_dev->mr);
if (srp_dev->global_mr)
ib_dereg_mr(srp_dev->global_mr);
ib_dealloc_pd(srp_dev->pd);
kfree(srp_dev);

View File

@ -95,13 +95,14 @@ struct srp_device {
struct list_head dev_list;
struct ib_device *dev;
struct ib_pd *pd;
struct ib_mr *mr;
struct ib_mr *global_mr;
u64 mr_page_mask;
int mr_page_size;
int mr_max_size;
int max_pages_per_mr;
bool has_fmr;
bool has_fr;
bool use_fmr;
bool use_fast_reg;
};
@ -182,10 +183,10 @@ struct srp_target_port {
spinlock_t lock;
/* read only in the hot path */
struct ib_mr *global_mr;
struct srp_rdma_ch *ch;
u32 ch_count;
u32 lkey;
u32 rkey;
enum srp_target_state state;
unsigned int max_iu_len;
unsigned int cmd_sg_cnt;
@ -276,14 +277,21 @@ struct srp_fr_pool {
* @npages: Number of page addresses in the pages[] array.
* @nmdesc: Number of FMR or FR memory descriptors used for mapping.
* @ndesc: Number of SRP buffer descriptors that have been filled in.
* @unmapped_sg: First element of the sg-list that is mapped via FMR or FR.
* @unmapped_index: Index of the first element mapped via FMR or FR.
* @unmapped_addr: DMA address of the first element mapped via FMR or FR.
*/
struct srp_map_state {
union {
struct ib_pool_fmr **next_fmr;
struct srp_fr_desc **next_fr;
struct {
struct ib_pool_fmr **next;
struct ib_pool_fmr **end;
} fmr;
struct {
struct srp_fr_desc **next;
struct srp_fr_desc **end;
} fr;
struct {
void **next;
void **end;
} gen;
};
struct srp_direct_buf *desc;
u64 *pages;
@ -293,9 +301,6 @@ struct srp_map_state {
unsigned int npages;
unsigned int nmdesc;
unsigned int ndesc;
struct scatterlist *unmapped_sg;
int unmapped_index;
dma_addr_t unmapped_addr;
};
#endif /* IB_SRP_H */

View File

@ -783,7 +783,7 @@ static int srpt_post_recv(struct srpt_device *sdev,
list.addr = ioctx->ioctx.dma;
list.length = srp_max_req_size;
list.lkey = sdev->mr->lkey;
list.lkey = sdev->pd->local_dma_lkey;
wr.next = NULL;
wr.sg_list = &list;
@ -818,7 +818,7 @@ static int srpt_post_send(struct srpt_rdma_ch *ch,
list.addr = ioctx->ioctx.dma;
list.length = len;
list.lkey = sdev->mr->lkey;
list.lkey = sdev->pd->local_dma_lkey;
wr.next = NULL;
wr.wr_id = encode_wr_id(SRPT_SEND, ioctx->ioctx.index);
@ -1206,7 +1206,7 @@ static int srpt_map_sg_to_ib_sge(struct srpt_rdma_ch *ch,
while (rsize > 0 && tsize > 0) {
sge->addr = dma_addr;
sge->lkey = ch->sport->sdev->mr->lkey;
sge->lkey = ch->sport->sdev->pd->local_dma_lkey;
if (rsize >= dma_len) {
sge->length =
@ -3211,10 +3211,6 @@ static void srpt_add_one(struct ib_device *device)
if (IS_ERR(sdev->pd))
goto free_dev;
sdev->mr = ib_get_dma_mr(sdev->pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(sdev->mr))
goto err_pd;
sdev->srq_size = min(srpt_srq_size, sdev->dev_attr.max_srq_wr);
srq_attr.event_handler = srpt_srq_event;
@ -3226,7 +3222,7 @@ static void srpt_add_one(struct ib_device *device)
sdev->srq = ib_create_srq(sdev->pd, &srq_attr);
if (IS_ERR(sdev->srq))
goto err_mr;
goto err_pd;
pr_debug("%s: create SRQ #wr= %d max_allow=%d dev= %s\n",
__func__, sdev->srq_size, sdev->dev_attr.max_srq_wr,
@ -3250,7 +3246,7 @@ static void srpt_add_one(struct ib_device *device)
* in the system as service_id; therefore, the target_id will change
* if this HCA is gone bad and replaced by different HCA
*/
if (ib_cm_listen(sdev->cm_id, cpu_to_be64(srpt_service_guid), 0, NULL))
if (ib_cm_listen(sdev->cm_id, cpu_to_be64(srpt_service_guid), 0))
goto err_cm;
INIT_IB_EVENT_HANDLER(&sdev->event_handler, sdev->device,
@ -3311,8 +3307,6 @@ static void srpt_add_one(struct ib_device *device)
ib_destroy_cm_id(sdev->cm_id);
err_srq:
ib_destroy_srq(sdev->srq);
err_mr:
ib_dereg_mr(sdev->mr);
err_pd:
ib_dealloc_pd(sdev->pd);
free_dev:
@ -3326,12 +3320,11 @@ static void srpt_add_one(struct ib_device *device)
/**
* srpt_remove_one() - InfiniBand device removal callback function.
*/
static void srpt_remove_one(struct ib_device *device)
static void srpt_remove_one(struct ib_device *device, void *client_data)
{
struct srpt_device *sdev;
struct srpt_device *sdev = client_data;
int i;
sdev = ib_get_client_data(device, &srpt_client);
if (!sdev) {
pr_info("%s(%s): nothing to do.\n", __func__, device->name);
return;
@ -3358,7 +3351,6 @@ static void srpt_remove_one(struct ib_device *device)
srpt_release_sdev(sdev);
ib_destroy_srq(sdev->srq);
ib_dereg_mr(sdev->mr);
ib_dealloc_pd(sdev->pd);
srpt_free_ioctx_ring((struct srpt_ioctx **)sdev->ioctx_ring, sdev,

View File

@ -393,7 +393,6 @@ struct srpt_port {
struct srpt_device {
struct ib_device *device;
struct ib_pd *pd;
struct ib_mr *mr;
struct ib_srq *srq;
struct ib_cm_id *cm_id;
struct ib_device_attr dev_attr;

View File

@ -737,19 +737,6 @@ static int bond_option_mode_set(struct bonding *bond,
return 0;
}
static struct net_device *__bond_option_active_slave_get(struct bonding *bond,
struct slave *slave)
{
return bond_uses_primary(bond) && slave ? slave->dev : NULL;
}
struct net_device *bond_option_active_slave_get_rcu(struct bonding *bond)
{
struct slave *slave = rcu_dereference(bond->curr_active_slave);
return __bond_option_active_slave_get(bond, slave);
}
static int bond_option_active_slave_set(struct bonding *bond,
const struct bond_opt_value *newval)
{

View File

@ -224,6 +224,26 @@ static void mlx4_en_remove(struct mlx4_dev *dev, void *endev_ptr)
kfree(mdev);
}
static void mlx4_en_activate(struct mlx4_dev *dev, void *ctx)
{
int i;
struct mlx4_en_dev *mdev = ctx;
/* Create a netdev for each port */
mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
mlx4_info(mdev, "Activating port:%d\n", i);
if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
mdev->pndev[i] = NULL;
}
/* register notifier */
mdev->nb.notifier_call = mlx4_en_netdev_event;
if (register_netdevice_notifier(&mdev->nb)) {
mdev->nb.notifier_call = NULL;
mlx4_err(mdev, "Failed to create notifier\n");
}
}
static void *mlx4_en_add(struct mlx4_dev *dev)
{
struct mlx4_en_dev *mdev;
@ -297,21 +317,6 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
mutex_init(&mdev->state_lock);
mdev->device_up = true;
/* Setup ports */
/* Create a netdev for each port */
mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
mlx4_info(mdev, "Activating port:%d\n", i);
if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
mdev->pndev[i] = NULL;
}
/* register notifier */
mdev->nb.notifier_call = mlx4_en_netdev_event;
if (register_netdevice_notifier(&mdev->nb)) {
mdev->nb.notifier_call = NULL;
mlx4_err(mdev, "Failed to create notifier\n");
}
return mdev;
err_mr:
@ -335,6 +340,7 @@ static struct mlx4_interface mlx4_en_interface = {
.event = mlx4_en_event,
.get_dev = mlx4_en_get_netdev,
.protocol = MLX4_PROT_ETH,
.activate = mlx4_en_activate,
};
static void mlx4_en_verify_params(void)

View File

@ -63,8 +63,11 @@ static void mlx4_add_device(struct mlx4_interface *intf, struct mlx4_priv *priv)
spin_lock_irq(&priv->ctx_lock);
list_add_tail(&dev_ctx->list, &priv->ctx_list);
spin_unlock_irq(&priv->ctx_lock);
if (intf->activate)
intf->activate(&priv->dev, dev_ctx->context);
} else
kfree(dev_ctx);
}
static void mlx4_remove_device(struct mlx4_interface *intf, struct mlx4_priv *priv)

View File

@ -200,3 +200,25 @@ int mlx5_cmd_teardown_hca(struct mlx5_core_dev *dev)
return err;
}
int mlx5_core_query_special_context(struct mlx5_core_dev *dev, u32 *rsvd_lkey)
{
struct mlx5_cmd_query_special_contexts_mbox_in in;
struct mlx5_cmd_query_special_contexts_mbox_out out;
int err;
memset(&in, 0, sizeof(in));
memset(&out, 0, sizeof(out));
in.hdr.opcode = cpu_to_be16(MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS);
err = mlx5_cmd_exec(dev, &in, sizeof(in), &out, sizeof(out));
if (err)
return err;
if (out.hdr.status)
err = mlx5_cmd_status_to_err(&out.hdr);
*rsvd_lkey = be32_to_cpu(out.resd_lkey);
return err;
}
EXPORT_SYMBOL(mlx5_core_query_special_context);

View File

@ -72,6 +72,8 @@ source "drivers/staging/nvec/Kconfig"
source "drivers/staging/media/Kconfig"
source "drivers/staging/rdma/Kconfig"
source "drivers/staging/android/Kconfig"
source "drivers/staging/board/Kconfig"

View File

@ -29,6 +29,7 @@ obj-$(CONFIG_FT1000) += ft1000/
obj-$(CONFIG_SPEAKUP) += speakup/
obj-$(CONFIG_TOUCHSCREEN_SYNAPTICS_I2C_RMI4) += ste_rmi4/
obj-$(CONFIG_MFD_NVEC) += nvec/
obj-$(CONFIG_STAGING_RDMA) += rdma/
obj-$(CONFIG_ANDROID) += android/
obj-$(CONFIG_STAGING_BOARD) += board/
obj-$(CONFIG_WIMAX_GDM72XX) += gdm72xx/

View File

@ -0,0 +1,31 @@
menuconfig STAGING_RDMA
bool "RDMA staging drivers"
depends on INFINIBAND
depends on PCI || BROKEN
depends on HAS_IOMEM
depends on NET
depends on INET
default n
---help---
This option allows you to select a number of RDMA drivers that
fall into one of two categories: deprecated drivers being held
here before finally being removed or new drivers that still need
some work before being moved to the normal RDMA driver area.
If you wish to work on these drivers, to help improve them, or
to report problems you have with them, please use the
linux-rdma@vger.kernel.org mailing list.
If in doubt, say N here.
# Please keep entries in alphabetic order
if STAGING_RDMA
source "drivers/staging/rdma/amso1100/Kconfig"
source "drivers/staging/rdma/hfi1/Kconfig"
source "drivers/staging/rdma/ipath/Kconfig"
endif

View File

@ -0,0 +1,4 @@
# Entries for RDMA_STAGING tree
obj-$(CONFIG_INFINIBAND_AMSO1100) += amso1100/
obj-$(CONFIG_INFINIBAND_HFI1) += hfi1/
obj-$(CONFIG_INFINIBAND_IPATH) += ipath/

View File

@ -0,0 +1,4 @@
7/2015
The amso1100 driver has been deprecated and moved to drivers/staging.
It will be removed in the 4.6 merge window.

Some files were not shown because too many files have changed in this diff Show More