Updates for 4.14 kernel merge window

- Lots of hfi1 driver updates (mixed with a few qib and core updates as
   well)
 - rxe updates
 - various mlx updates
 - Set default roce type to RoCEv2
 - Several larger fixes for bnxt_re that were too big for -rc
 - Several larger fixes for qedr that, likewise, were too big for -rc
 - Misc core changes
 - Make the hns_roce driver compilable on arches other than aarch64 so we
   can more easily debug build issues related to it
 - Add rdma-netlink infrastructure updates
 - Add automatic IRQ affinity infrastructure
 - Add 32bit lid support
 - Lots of misc fixes across the subsystem from random people
 - Autoloading of RDMA netlink modules
 - PCI pool cleanups from Romain Perier
 - mlx5 driver feature additions and fixes
 - Hardware tag matchine feature
 - Fix sleeping in atomic when resolving roce ah
 - Add experimental ioctl interface as posted to linux-api@
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZqBDtAAoJELgmozMOVy/dNlcQAJhYNRGaNUBx0L6+8t2xwUrt
 7ndP6qlMar30DJY9FjTQCzRBw0CRMWkXdJD8rYlyaHy07pjWDKG8LZtxEXu1FLdZ
 oNRvQX6ZJh8Bz7db2SQFBCTF2uWGZZFqWQCrSbQwjj9xxjMDs59u/knmwHVY9dKk
 egjPG4IQBDmcTeNY7h1otG2hXpx7QPIOilQW2EFN5SWAuBAazdF2JKxjjxqhnUfp
 gD2pSdgsm3VSMoo0zpMa6qOP+9GcOu8J97fYFhasRYWCavPdWHyq+XNu9S/eicRd
 xbv+seCYM+9jPb2dsNdjEKll7w3yyWdu7h6tSCMPYv54eN9sDDiO1w2L2ZnESMZa
 JRnSfB+HXru1r4RyHOTPO8peaNhYlR1V4u8bTS5G2dffbHis9BajkWoAR/oSiUcB
 AIjIIDcdJFVGfpF9KIt/pEl+adHNgESibSijzOUYkyw6RNbPqDmdd7YakPHcQhKN
 clE3zQfIsPRLWsToP/nkBE0tUd3tQocRuLy7ote7hXQK+0p7TBz0a6Kkj87MvX33
 8dVbUI+q6WRlEY90l71y0ZdXy/AvkxkFxAc4Y7FQZyJxhEArTaKgfa5fmpRwVxBm
 yi9baoYCspHNRNv6AO4IL86ZCJqmWBuch8CBY1n2X3h8IGfKYEZUAZ+T/mnTTeUq
 A4joXduz94ZD4w23leD1
 =2ntC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is a big pull request.

  Of note is that I'm sending you the new ioctl API for the rdma
  subsystem. We put it up on linux-api@, but didn't get much response.
  The API is complex, but it solves two different problems in one go:

   1) The bi-directional nature of the RDMA file write calls, which
      created the security hole we had to handle (and for which the fix
      is now causing problems for systems in production, we were a bit
      over zealous in the fix and the ability to open a device, then
      fork, then create new queue pairs on the device and use them is
      broken).

   2) The bloat caused by different vendors implementing extensions to
      the base verbs API. Each vendor's hardware is slightly different,
      and the hardware might be suitable for one extension but not
      another.

      By the time we add generic extensions for all the different ways
      that the different hardware can offload things, the API becomes
      bloated. Things like our completion structs have started to exceed
      a cache line in size because of all the elements needed to support
      this. That in turn shows up heavily in the performance graphs with
      a noticable drop in performance on 100Gigabit links as our
      completion structs go from occupying one cache line to 1+.

      This API makes things like the completion structs modular in a
      very similar way to netlink so that your structs can only include
      the items needed for the offloads/features you are actually using
      on a given queue pair. In that way we support everything, but only
      use what we need, and our structs stay smaller.

  The ioctl API is better explained by the posting on linux-api@ than I
  can explain it here, so I'll just leave it at that.

  The rest of the pull request is typical stuff.

  Updates for 4.14 kernel merge window

   - Lots of hfi1 driver updates (mixed with a few qib and core updates
     as well)

   - rxe updates

   - various mlx updates

   - Set default roce type to RoCEv2

   - Several larger fixes for bnxt_re that were too big for -rc

   - Several larger fixes for qedr that, likewise, were too big for -rc

   - Misc core changes

   - Make the hns_roce driver compilable on arches other than aarch64 so
     we can more easily debug build issues related to it

   - Add rdma-netlink infrastructure updates

   - Add automatic IRQ affinity infrastructure

   - Add 32bit lid support

   - Lots of misc fixes across the subsystem from random people

   - Autoloading of RDMA netlink modules

   - PCI pool cleanups from Romain Perier

   - mlx5 driver feature additions and fixes

   - Hardware tag matchine feature

   - Fix sleeping in atomic when resolving roce ah

   - Add experimental ioctl interface as posted to linux-api@"

* tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits)
  IB/core: Expose ioctl interface through experimental Kconfig
  IB/core: Assign root to all drivers
  IB/core: Add completion queue (cq) object actions
  IB/core: Add legacy driver's user-data
  IB/core: Export ioctl enum types to user-space
  IB/core: Explicitly destroy an object while keeping uobject
  IB/core: Add macros for declaring methods and attributes
  IB/core: Add uverbs merge trees functionality
  IB/core: Add DEVICE object and root tree structure
  IB/core: Declare an object instead of declaring only type attributes
  IB/core: Add new ioctl interface
  RDMA/vmw_pvrdma: Fix a signedness
  RDMA/vmw_pvrdma: Report network header type in WC
  IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()
  IB/cm: Fix sleeping in atomic when RoCE is used
  IB/core: Add support to finalize objects in one transaction
  IB/core: Add a generic way to execute an operation on a uobject
  Documentation: Hardware tag matching
  IB/mlx5: Support IB_SRQT_TM
  net/mlx5: Add XRQ support
  ...
This commit is contained in:
Linus Torvalds 2017-09-03 17:49:17 -07:00
commit aa9d4648c2
275 changed files with 14913 additions and 5268 deletions

View File

@ -0,0 +1,64 @@
Tag matching logic
The MPI standard defines a set of rules, known as tag-matching, for matching
source send operations to destination receives. The following parameters must
match the following source and destination parameters:
* Communicator
* User tag - wild card may be specified by the receiver
* Source rank wild car may be specified by the receiver
* Destination rank wild
The ordering rules require that when more than one pair of send and receive
message envelopes may match, the pair that includes the earliest posted-send
and the earliest posted-receive is the pair that must be used to satisfy the
matching operation. However, this doesnt imply that tags are consumed in
the order they are created, e.g., a later generated tag may be consumed, if
earlier tags cant be used to satisfy the matching rules.
When a message is sent from the sender to the receiver, the communication
library may attempt to process the operation either after or before the
corresponding matching receive is posted. If a matching receive is posted,
this is an expected message, otherwise it is called an unexpected message.
Implementations frequently use different matching schemes for these two
different matching instances.
To keep MPI library memory footprint down, MPI implementations typically use
two different protocols for this purpose:
1. The Eager protocol- the complete message is sent when the send is
processed by the sender. A completion send is received in the send_cq
notifying that the buffer can be reused.
2. The Rendezvous Protocol - the sender sends the tag-matching header,
and perhaps a portion of data when first notifying the receiver. When the
corresponding buffer is posted, the responder will use the information from
the header to initiate an RDMA READ operation directly to the matching buffer.
A fin message needs to be received in order for the buffer to be reused.
Tag matching implementation
There are two types of matching objects used, the posted receive list and the
unexpected message list. The application posts receive buffers through calls
to the MPI receive routines in the posted receive list and posts send messages
using the MPI send routines. The head of the posted receive list may be
maintained by the hardware, with the software expected to shadow this list.
When send is initiated and arrives at the receive side, if there is no
pre-posted receive for this arriving message, it is passed to the software and
placed in the unexpected message list. Otherwise the match is processed,
including rendezvous processing, if appropriate, delivering the data to the
specified receive buffer. This allows overlapping receive-side MPI tag
matching with computation.
When a receive-message is posted, the communication library will first check
the software unexpected message list for a matching receive. If a match is
found, data is delivered to the user buffer, using a software controlled
protocol. The UCX implementation uses either an eager or rendezvous protocol,
depending on data size. If no match is found, the entire pre-posted receive
list is maintained by the hardware, and there is space to add one more
pre-posted receive to this list, this receive is passed to the hardware.
Software is expected to shadow this list, to help with processing MPI cancel
operations. In addition, because hardware and software are not expected to be
tightly synchronized with respect to the tag-matching operation, this shadow
list is used to detect the case that a pre-posted receive is passed to the
hardware, as the matching unexpected message is being passed from the hardware
to the software.

View File

@ -206,4 +206,9 @@ config BLK_MQ_VIRTIO
depends on BLOCK && VIRTIO
default y
config BLK_MQ_RDMA
bool
depends on BLOCK && INFINIBAND
default y
source block/Kconfig.iosched

View File

@ -29,6 +29,7 @@ obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o
obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
obj-$(CONFIG_BLK_MQ_PCI) += blk-mq-pci.o
obj-$(CONFIG_BLK_MQ_VIRTIO) += blk-mq-virtio.o
obj-$(CONFIG_BLK_MQ_RDMA) += blk-mq-rdma.o
obj-$(CONFIG_BLK_DEV_ZONED) += blk-zoned.o
obj-$(CONFIG_BLK_WBT) += blk-wbt.o
obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o

52
block/blk-mq-rdma.c Normal file
View File

@ -0,0 +1,52 @@
/*
* Copyright (c) 2017 Sagi Grimberg.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*/
#include <linux/blk-mq.h>
#include <linux/blk-mq-rdma.h>
#include <rdma/ib_verbs.h>
/**
* blk_mq_rdma_map_queues - provide a default queue mapping for rdma device
* @set: tagset to provide the mapping for
* @dev: rdma device associated with @set.
* @first_vec: first interrupt vectors to use for queues (usually 0)
*
* This function assumes the rdma device @dev has at least as many available
* interrupt vetors as @set has queues. It will then query it's affinity mask
* and built queue mapping that maps a queue to the CPUs that have irq affinity
* for the corresponding vector.
*
* In case either the driver passed a @dev with less vectors than
* @set->nr_hw_queues, or @dev does not provide an affinity mask for a
* vector, we fallback to the naive mapping.
*/
int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
struct ib_device *dev, int first_vec)
{
const struct cpumask *mask;
unsigned int queue, cpu;
for (queue = 0; queue < set->nr_hw_queues; queue++) {
mask = ib_get_vector_affinity(dev, first_vec + queue);
if (!mask)
goto fallback;
for_each_cpu(cpu, mask)
set->mq_map[cpu] = queue;
}
return 0;
fallback:
return blk_mq_map_queues(set);
}
EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);

View File

@ -34,6 +34,15 @@ config INFINIBAND_USER_ACCESS
libibverbs, libibcm and a hardware driver library from
<http://www.openfabrics.org/git/>.
config INFINIBAND_EXP_USER_ACCESS
bool "Allow experimental support for Infiniband ABI"
depends on INFINIBAND_USER_ACCESS
---help---
IOCTL based ABI support for Infiniband. This allows userspace
to invoke the experimental IOCTL based ABI.
These commands are parsed via per-device parsing tree and
enables per-device features.
config INFINIBAND_USER_MEM
bool
depends on INFINIBAND_USER_ACCESS != n

View File

@ -11,7 +11,8 @@ ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o \
roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
multicast.o mad.o smi.o agent.o mad_rmpp.o \
security.o
security.o nldev.o
ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
@ -31,4 +32,5 @@ ib_umad-y := user_mad.o
ib_ucm-y := ucm.o
ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
rdma_core.o uverbs_std_types.o
rdma_core.o uverbs_std_types.o uverbs_ioctl.o \
uverbs_ioctl_merge.o

View File

@ -130,13 +130,11 @@ static void ib_nl_process_good_ip_rsep(const struct nlmsghdr *nlh)
}
int ib_nl_handle_ip_res_resp(struct sk_buff *skb,
struct netlink_callback *cb)
struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
if ((nlh->nlmsg_flags & NLM_F_REQUEST) ||
!(NETLINK_CB(skb).sk) ||
!netlink_capable(skb, CAP_NET_ADMIN))
!(NETLINK_CB(skb).sk))
return -EPERM;
if (ib_nl_is_good_ip_resp(nlh))
@ -186,7 +184,7 @@ static int ib_nl_ip_send_msg(struct rdma_dev_addr *dev_addr,
/* Repair the nlmsg header length */
nlmsg_end(skb, nlh);
ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, GFP_KERNEL);
rdma_nl_multicast(skb, RDMA_NL_GROUP_LS, GFP_KERNEL);
/* Make the request retry, so when we get the response from userspace
* we will have something.
@ -326,7 +324,7 @@ static void queue_req(struct addr_req *req)
static int ib_nl_fetch_ha(struct dst_entry *dst, struct rdma_dev_addr *dev_addr,
const void *daddr, u32 seq, u16 family)
{
if (ibnl_chk_listeners(RDMA_NL_GROUP_LS))
if (rdma_nl_chk_listeners(RDMA_NL_GROUP_LS))
return -EADDRNOTAVAIL;
/* We fill in what we can, the response will fill the rest */

View File

@ -1199,30 +1199,23 @@ int ib_cache_setup_one(struct ib_device *device)
device->cache.ports =
kzalloc(sizeof(*device->cache.ports) *
(rdma_end_port(device) - rdma_start_port(device) + 1), GFP_KERNEL);
if (!device->cache.ports) {
err = -ENOMEM;
goto out;
}
if (!device->cache.ports)
return -ENOMEM;
err = gid_table_setup_one(device);
if (err)
goto out;
if (err) {
kfree(device->cache.ports);
device->cache.ports = NULL;
return err;
}
for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
ib_cache_update(device, p + rdma_start_port(device), true);
INIT_IB_EVENT_HANDLER(&device->cache.event_handler,
device, ib_cache_event);
err = ib_register_event_handler(&device->cache.event_handler);
if (err)
goto err;
ib_register_event_handler(&device->cache.event_handler);
return 0;
err:
gid_table_cleanup_one(device);
out:
return err;
}
void ib_cache_release_one(struct ib_device *device)

View File

@ -373,11 +373,19 @@ out:
return ret;
}
static int cm_alloc_response_msg(struct cm_port *port,
struct ib_mad_recv_wc *mad_recv_wc,
struct ib_mad_send_buf **msg)
static struct ib_mad_send_buf *cm_alloc_response_msg_no_ah(struct cm_port *port,
struct ib_mad_recv_wc *mad_recv_wc)
{
return ib_create_send_mad(port->mad_agent, 1, mad_recv_wc->wc->pkey_index,
0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
GFP_ATOMIC,
IB_MGMT_BASE_VERSION);
}
static int cm_create_response_msg_ah(struct cm_port *port,
struct ib_mad_recv_wc *mad_recv_wc,
struct ib_mad_send_buf *msg)
{
struct ib_mad_send_buf *m;
struct ib_ah *ah;
ah = ib_create_ah_from_wc(port->mad_agent->qp->pd, mad_recv_wc->wc,
@ -385,27 +393,40 @@ static int cm_alloc_response_msg(struct cm_port *port,
if (IS_ERR(ah))
return PTR_ERR(ah);
m = ib_create_send_mad(port->mad_agent, 1, mad_recv_wc->wc->pkey_index,
0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
GFP_ATOMIC,
IB_MGMT_BASE_VERSION);
if (IS_ERR(m)) {
rdma_destroy_ah(ah);
return PTR_ERR(m);
}
m->ah = ah;
*msg = m;
msg->ah = ah;
return 0;
}
static void cm_free_msg(struct ib_mad_send_buf *msg)
{
rdma_destroy_ah(msg->ah);
if (msg->ah)
rdma_destroy_ah(msg->ah);
if (msg->context[0])
cm_deref_id(msg->context[0]);
ib_free_send_mad(msg);
}
static int cm_alloc_response_msg(struct cm_port *port,
struct ib_mad_recv_wc *mad_recv_wc,
struct ib_mad_send_buf **msg)
{
struct ib_mad_send_buf *m;
int ret;
m = cm_alloc_response_msg_no_ah(port, mad_recv_wc);
if (IS_ERR(m))
return PTR_ERR(m);
ret = cm_create_response_msg_ah(port, mad_recv_wc, m);
if (ret) {
cm_free_msg(m);
return ret;
}
*msg = m;
return 0;
}
static void * cm_copy_private_data(const void *private_data,
u8 private_data_len)
{
@ -1175,6 +1196,11 @@ static void cm_format_req(struct cm_req_msg *req_msg,
{
struct sa_path_rec *pri_path = param->primary_path;
struct sa_path_rec *alt_path = param->alternate_path;
bool pri_ext = false;
if (pri_path->rec_type == SA_PATH_REC_TYPE_OPA)
pri_ext = opa_is_extended_lid(pri_path->opa.dlid,
pri_path->opa.slid);
cm_format_mad_hdr(&req_msg->hdr, CM_REQ_ATTR_ID,
cm_form_tid(cm_id_priv, CM_MSG_SEQUENCE_REQ));
@ -1202,18 +1228,24 @@ static void cm_format_req(struct cm_req_msg *req_msg,
cm_req_set_srq(req_msg, param->srq);
}
req_msg->primary_local_gid = pri_path->sgid;
req_msg->primary_remote_gid = pri_path->dgid;
if (pri_ext) {
req_msg->primary_local_gid.global.interface_id
= OPA_MAKE_ID(be32_to_cpu(pri_path->opa.slid));
req_msg->primary_remote_gid.global.interface_id
= OPA_MAKE_ID(be32_to_cpu(pri_path->opa.dlid));
}
if (pri_path->hop_limit <= 1) {
req_msg->primary_local_lid =
req_msg->primary_local_lid = pri_ext ? 0 :
htons(ntohl(sa_path_get_slid(pri_path)));
req_msg->primary_remote_lid =
req_msg->primary_remote_lid = pri_ext ? 0 :
htons(ntohl(sa_path_get_dlid(pri_path)));
} else {
/* Work-around until there's a way to obtain remote LID info */
req_msg->primary_local_lid = IB_LID_PERMISSIVE;
req_msg->primary_remote_lid = IB_LID_PERMISSIVE;
}
req_msg->primary_local_gid = pri_path->sgid;
req_msg->primary_remote_gid = pri_path->dgid;
cm_req_set_primary_flow_label(req_msg, pri_path->flow_label);
cm_req_set_primary_packet_rate(req_msg, pri_path->rate);
req_msg->primary_traffic_class = pri_path->traffic_class;
@ -1225,17 +1257,29 @@ static void cm_format_req(struct cm_req_msg *req_msg,
pri_path->packet_life_time));
if (alt_path) {
bool alt_ext = false;
if (alt_path->rec_type == SA_PATH_REC_TYPE_OPA)
alt_ext = opa_is_extended_lid(alt_path->opa.dlid,
alt_path->opa.slid);
req_msg->alt_local_gid = alt_path->sgid;
req_msg->alt_remote_gid = alt_path->dgid;
if (alt_ext) {
req_msg->alt_local_gid.global.interface_id
= OPA_MAKE_ID(be32_to_cpu(alt_path->opa.slid));
req_msg->alt_remote_gid.global.interface_id
= OPA_MAKE_ID(be32_to_cpu(alt_path->opa.dlid));
}
if (alt_path->hop_limit <= 1) {
req_msg->alt_local_lid =
req_msg->alt_local_lid = alt_ext ? 0 :
htons(ntohl(sa_path_get_slid(alt_path)));
req_msg->alt_remote_lid =
req_msg->alt_remote_lid = alt_ext ? 0 :
htons(ntohl(sa_path_get_dlid(alt_path)));
} else {
req_msg->alt_local_lid = IB_LID_PERMISSIVE;
req_msg->alt_remote_lid = IB_LID_PERMISSIVE;
}
req_msg->alt_local_gid = alt_path->sgid;
req_msg->alt_remote_gid = alt_path->dgid;
cm_req_set_alt_flow_label(req_msg,
alt_path->flow_label);
cm_req_set_alt_packet_rate(req_msg, alt_path->rate);
@ -1405,16 +1449,63 @@ static inline int cm_is_active_peer(__be64 local_ca_guid, __be64 remote_ca_guid,
(be32_to_cpu(local_qpn) > be32_to_cpu(remote_qpn))));
}
static bool cm_req_has_alt_path(struct cm_req_msg *req_msg)
{
return ((req_msg->alt_local_lid) ||
(ib_is_opa_gid(&req_msg->alt_local_gid)));
}
static void cm_path_set_rec_type(struct ib_device *ib_device, u8 port_num,
struct sa_path_rec *path, union ib_gid *gid)
{
if (ib_is_opa_gid(gid) && rdma_cap_opa_ah(ib_device, port_num))
path->rec_type = SA_PATH_REC_TYPE_OPA;
else
path->rec_type = SA_PATH_REC_TYPE_IB;
}
static void cm_format_path_lid_from_req(struct cm_req_msg *req_msg,
struct sa_path_rec *primary_path,
struct sa_path_rec *alt_path)
{
u32 lid;
if (primary_path->rec_type != SA_PATH_REC_TYPE_OPA) {
sa_path_set_dlid(primary_path,
htonl(ntohs(req_msg->primary_local_lid)));
sa_path_set_slid(primary_path,
htonl(ntohs(req_msg->primary_remote_lid)));
} else {
lid = opa_get_lid_from_gid(&req_msg->primary_local_gid);
sa_path_set_dlid(primary_path, cpu_to_be32(lid));
lid = opa_get_lid_from_gid(&req_msg->primary_remote_gid);
sa_path_set_slid(primary_path, cpu_to_be32(lid));
}
if (!cm_req_has_alt_path(req_msg))
return;
if (alt_path->rec_type != SA_PATH_REC_TYPE_OPA) {
sa_path_set_dlid(alt_path,
htonl(ntohs(req_msg->alt_local_lid)));
sa_path_set_slid(alt_path,
htonl(ntohs(req_msg->alt_remote_lid)));
} else {
lid = opa_get_lid_from_gid(&req_msg->alt_local_gid);
sa_path_set_dlid(alt_path, cpu_to_be32(lid));
lid = opa_get_lid_from_gid(&req_msg->alt_remote_gid);
sa_path_set_slid(alt_path, cpu_to_be32(lid));
}
}
static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
struct sa_path_rec *primary_path,
struct sa_path_rec *alt_path)
{
primary_path->dgid = req_msg->primary_local_gid;
primary_path->sgid = req_msg->primary_remote_gid;
sa_path_set_dlid(primary_path,
htonl(ntohs(req_msg->primary_local_lid)));
sa_path_set_slid(primary_path,
htonl(ntohs(req_msg->primary_remote_lid)));
primary_path->flow_label = cm_req_get_primary_flow_label(req_msg);
primary_path->hop_limit = req_msg->primary_hop_limit;
primary_path->traffic_class = req_msg->primary_traffic_class;
@ -1431,13 +1522,9 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
primary_path->packet_life_time -= (primary_path->packet_life_time > 0);
primary_path->service_id = req_msg->service_id;
if (req_msg->alt_local_lid) {
if (cm_req_has_alt_path(req_msg)) {
alt_path->dgid = req_msg->alt_local_gid;
alt_path->sgid = req_msg->alt_remote_gid;
sa_path_set_dlid(alt_path,
htonl(ntohs(req_msg->alt_local_lid)));
sa_path_set_slid(alt_path,
htonl(ntohs(req_msg->alt_remote_lid)));
alt_path->flow_label = cm_req_get_alt_flow_label(req_msg);
alt_path->hop_limit = req_msg->alt_hop_limit;
alt_path->traffic_class = req_msg->alt_traffic_class;
@ -1454,6 +1541,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
alt_path->packet_life_time -= (alt_path->packet_life_time > 0);
alt_path->service_id = req_msg->service_id;
}
cm_format_path_lid_from_req(req_msg, primary_path, alt_path);
}
static u16 cm_get_bth_pkey(struct cm_work *work)
@ -1703,7 +1791,7 @@ static void cm_process_routed_req(struct cm_req_msg *req_msg, struct ib_wc *wc)
{
if (!cm_req_get_primary_subnet_local(req_msg)) {
if (req_msg->primary_local_lid == IB_LID_PERMISSIVE) {
req_msg->primary_local_lid = cpu_to_be16(wc->slid);
req_msg->primary_local_lid = ib_lid_be16(wc->slid);
cm_req_set_primary_sl(req_msg, wc->sl);
}
@ -1713,7 +1801,7 @@ static void cm_process_routed_req(struct cm_req_msg *req_msg, struct ib_wc *wc)
if (!cm_req_get_alt_subnet_local(req_msg)) {
if (req_msg->alt_local_lid == IB_LID_PERMISSIVE) {
req_msg->alt_local_lid = cpu_to_be16(wc->slid);
req_msg->alt_local_lid = ib_lid_be16(wc->slid);
cm_req_set_alt_sl(req_msg, wc->sl);
}
@ -1784,9 +1872,12 @@ static int cm_req_handler(struct cm_work *work)
dev_net(gid_attr.ndev));
dev_put(gid_attr.ndev);
} else {
work->path[0].rec_type = SA_PATH_REC_TYPE_IB;
cm_path_set_rec_type(work->port->cm_dev->ib_device,
work->port->port_num,
&work->path[0],
&req_msg->primary_local_gid);
}
if (req_msg->alt_local_lid)
if (cm_req_has_alt_path(req_msg))
work->path[1].rec_type = work->path[0].rec_type;
cm_format_paths_from_req(req_msg, &work->path[0],
&work->path[1]);
@ -1811,16 +1902,19 @@ static int cm_req_handler(struct cm_work *work)
dev_net(gid_attr.ndev));
dev_put(gid_attr.ndev);
} else {
work->path[0].rec_type = SA_PATH_REC_TYPE_IB;
cm_path_set_rec_type(work->port->cm_dev->ib_device,
work->port->port_num,
&work->path[0],
&req_msg->primary_local_gid);
}
if (req_msg->alt_local_lid)
if (cm_req_has_alt_path(req_msg))
work->path[1].rec_type = work->path[0].rec_type;
ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
&work->path[0].sgid, sizeof work->path[0].sgid,
NULL, 0);
goto rejected;
}
if (req_msg->alt_local_lid) {
if (cm_req_has_alt_path(req_msg)) {
ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
cm_id_priv);
if (ret) {
@ -2424,7 +2518,8 @@ static int cm_dreq_handler(struct cm_work *work)
case IB_CM_TIMEWAIT:
atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
counter[CM_DREQ_COUNTER]);
if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))
msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc);
if (IS_ERR(msg))
goto unlock;
cm_format_drep((struct cm_drep_msg *) msg->mad, cm_id_priv,
@ -2432,7 +2527,8 @@ static int cm_dreq_handler(struct cm_work *work)
cm_id_priv->private_data_len);
spin_unlock_irq(&cm_id_priv->lock);
if (ib_post_send_mad(msg, NULL))
if (cm_create_response_msg_ah(work->port, work->mad_recv_wc, msg) ||
ib_post_send_mad(msg, NULL))
cm_free_msg(msg);
goto deref;
case IB_CM_DREQ_RCVD:
@ -2843,6 +2939,11 @@ static void cm_format_lap(struct cm_lap_msg *lap_msg,
const void *private_data,
u8 private_data_len)
{
bool alt_ext = false;
if (alternate_path->rec_type == SA_PATH_REC_TYPE_OPA)
alt_ext = opa_is_extended_lid(alternate_path->opa.dlid,
alternate_path->opa.slid);
cm_format_mad_hdr(&lap_msg->hdr, CM_LAP_ATTR_ID,
cm_form_tid(cm_id_priv, CM_MSG_SEQUENCE_LAP));
lap_msg->local_comm_id = cm_id_priv->id.local_id;
@ -2856,6 +2957,12 @@ static void cm_format_lap(struct cm_lap_msg *lap_msg,
htons(ntohl(sa_path_get_dlid(alternate_path)));
lap_msg->alt_local_gid = alternate_path->sgid;
lap_msg->alt_remote_gid = alternate_path->dgid;
if (alt_ext) {
lap_msg->alt_local_gid.global.interface_id
= OPA_MAKE_ID(be32_to_cpu(alternate_path->opa.slid));
lap_msg->alt_remote_gid.global.interface_id
= OPA_MAKE_ID(be32_to_cpu(alternate_path->opa.dlid));
}
cm_lap_set_flow_label(lap_msg, alternate_path->flow_label);
cm_lap_set_traffic_class(lap_msg, alternate_path->traffic_class);
lap_msg->alt_hop_limit = alternate_path->hop_limit;
@ -2924,16 +3031,29 @@ out: spin_unlock_irqrestore(&cm_id_priv->lock, flags);
}
EXPORT_SYMBOL(ib_send_cm_lap);
static void cm_format_path_lid_from_lap(struct cm_lap_msg *lap_msg,
struct sa_path_rec *path)
{
u32 lid;
if (path->rec_type != SA_PATH_REC_TYPE_OPA) {
sa_path_set_dlid(path, htonl(ntohs(lap_msg->alt_local_lid)));
sa_path_set_slid(path, htonl(ntohs(lap_msg->alt_remote_lid)));
} else {
lid = opa_get_lid_from_gid(&lap_msg->alt_local_gid);
sa_path_set_dlid(path, cpu_to_be32(lid));
lid = opa_get_lid_from_gid(&lap_msg->alt_remote_gid);
sa_path_set_slid(path, cpu_to_be32(lid));
}
}
static void cm_format_path_from_lap(struct cm_id_private *cm_id_priv,
struct sa_path_rec *path,
struct cm_lap_msg *lap_msg)
{
memset(path, 0, sizeof *path);
path->rec_type = SA_PATH_REC_TYPE_IB;
path->dgid = lap_msg->alt_local_gid;
path->sgid = lap_msg->alt_remote_gid;
sa_path_set_dlid(path, htonl(ntohs(lap_msg->alt_local_lid)));
sa_path_set_slid(path, htonl(ntohs(lap_msg->alt_remote_lid)));
path->flow_label = cm_lap_get_flow_label(lap_msg);
path->hop_limit = lap_msg->alt_hop_limit;
path->traffic_class = cm_lap_get_traffic_class(lap_msg);
@ -2947,6 +3067,7 @@ static void cm_format_path_from_lap(struct cm_id_private *cm_id_priv,
path->packet_life_time_selector = IB_SA_EQ;
path->packet_life_time = cm_lap_get_local_ack_timeout(lap_msg);
path->packet_life_time -= (path->packet_life_time > 0);
cm_format_path_lid_from_lap(lap_msg, path);
}
static int cm_lap_handler(struct cm_work *work)
@ -2965,6 +3086,11 @@ static int cm_lap_handler(struct cm_work *work)
return -EINVAL;
param = &work->cm_event.param.lap_rcvd;
memset(&work->path[0], 0, sizeof(work->path[1]));
cm_path_set_rec_type(work->port->cm_dev->ib_device,
work->port->port_num,
&work->path[0],
&lap_msg->alt_local_gid);
param->alternate_path = &work->path[0];
cm_format_path_from_lap(cm_id_priv, param->alternate_path, lap_msg);
work->cm_event.private_data = &lap_msg->private_data;
@ -2980,7 +3106,8 @@ static int cm_lap_handler(struct cm_work *work)
case IB_CM_MRA_LAP_SENT:
atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
counter[CM_LAP_COUNTER]);
if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))
msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc);
if (IS_ERR(msg))
goto unlock;
cm_format_mra((struct cm_mra_msg *) msg->mad, cm_id_priv,
@ -2990,7 +3117,8 @@ static int cm_lap_handler(struct cm_work *work)
cm_id_priv->private_data_len);
spin_unlock_irq(&cm_id_priv->lock);
if (ib_post_send_mad(msg, NULL))
if (cm_create_response_msg_ah(work->port, work->mad_recv_wc, msg) ||
ib_post_send_mad(msg, NULL))
cm_free_msg(msg);
goto deref;
case IB_CM_LAP_RCVD:
@ -4201,7 +4329,7 @@ static int __init ib_cm_init(void)
goto error1;
}
cm.wq = create_workqueue("ib_cm");
cm.wq = alloc_workqueue("ib_cm", 0, 1);
if (!cm.wq) {
ret = -ENOMEM;
goto error2;

View File

@ -72,6 +72,7 @@ MODULE_LICENSE("Dual BSD/GPL");
#define CMA_MAX_CM_RETRIES 15
#define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24)
#define CMA_IBOE_PACKET_LIFETIME 18
#define CMA_PREFERRED_ROCE_GID_TYPE IB_GID_TYPE_ROCE_UDP_ENCAP
static const char * const cma_events[] = {
[RDMA_CM_EVENT_ADDR_RESOLVED] = "address resolved",
@ -3998,7 +3999,8 @@ static void iboe_mcast_work_handler(struct work_struct *work)
kfree(mw);
}
static void cma_iboe_set_mgid(struct sockaddr *addr, union ib_gid *mgid)
static void cma_iboe_set_mgid(struct sockaddr *addr, union ib_gid *mgid,
enum ib_gid_type gid_type)
{
struct sockaddr_in *sin = (struct sockaddr_in *)addr;
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr;
@ -4008,8 +4010,8 @@ static void cma_iboe_set_mgid(struct sockaddr *addr, union ib_gid *mgid)
} else if (addr->sa_family == AF_INET6) {
memcpy(mgid, &sin6->sin6_addr, sizeof *mgid);
} else {
mgid->raw[0] = 0xff;
mgid->raw[1] = 0x0e;
mgid->raw[0] = (gid_type == IB_GID_TYPE_IB) ? 0xff : 0;
mgid->raw[1] = (gid_type == IB_GID_TYPE_IB) ? 0x0e : 0;
mgid->raw[2] = 0;
mgid->raw[3] = 0;
mgid->raw[4] = 0;
@ -4050,7 +4052,9 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
goto out1;
}
cma_iboe_set_mgid(addr, &mc->multicast.ib->rec.mgid);
gid_type = id_priv->cma_dev->default_gid_type[id_priv->id.port_num -
rdma_start_port(id_priv->cma_dev->device)];
cma_iboe_set_mgid(addr, &mc->multicast.ib->rec.mgid, gid_type);
mc->multicast.ib->rec.pkey = cpu_to_be16(0xffff);
if (id_priv->id.ps == RDMA_PS_UDP)
@ -4066,8 +4070,6 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
mc->multicast.ib->rec.hop_limit = 1;
mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->mtu);
gid_type = id_priv->cma_dev->default_gid_type[id_priv->id.port_num -
rdma_start_port(id_priv->cma_dev->device)];
if (addr->sa_family == AF_INET) {
if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) {
mc->multicast.ib->rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
@ -4280,8 +4282,12 @@ static void cma_add_one(struct ib_device *device)
for (i = rdma_start_port(device); i <= rdma_end_port(device); i++) {
supported_gids = roce_gid_type_mask_support(device, i);
WARN_ON(!supported_gids);
cma_dev->default_gid_type[i - rdma_start_port(device)] =
find_first_bit(&supported_gids, BITS_PER_LONG);
if (supported_gids & (1 << CMA_PREFERRED_ROCE_GID_TYPE))
cma_dev->default_gid_type[i - rdma_start_port(device)] =
CMA_PREFERRED_ROCE_GID_TYPE;
else
cma_dev->default_gid_type[i - rdma_start_port(device)] =
find_first_bit(&supported_gids, BITS_PER_LONG);
cma_dev->default_roce_tos[i - rdma_start_port(device)] = 0;
}
@ -4452,9 +4458,8 @@ out:
return skb->len;
}
static const struct ibnl_client_cbs cma_cb_table[] = {
[RDMA_NL_RDMA_CM_ID_STATS] = { .dump = cma_get_id_stats,
.module = THIS_MODULE },
static const struct rdma_nl_cbs cma_cb_table[] = {
[RDMA_NL_RDMA_CM_ID_STATS] = { .dump = cma_get_id_stats},
};
static int cma_init_net(struct net *net)
@ -4506,9 +4511,7 @@ static int __init cma_init(void)
if (ret)
goto err;
if (ibnl_add_client(RDMA_NL_RDMA_CM, ARRAY_SIZE(cma_cb_table),
cma_cb_table))
pr_warn("RDMA CMA: failed to add netlink callback\n");
rdma_nl_register(RDMA_NL_RDMA_CM, cma_cb_table);
cma_configfs_init();
return 0;
@ -4525,7 +4528,7 @@ err_wq:
static void __exit cma_cleanup(void)
{
cma_configfs_exit();
ibnl_remove_client(RDMA_NL_RDMA_CM);
rdma_nl_unregister(RDMA_NL_RDMA_CM);
ib_unregister_client(&cma_client);
unregister_netdevice_notifier(&cma_nb);
rdma_addr_unregister_client(&addr_client);
@ -4534,5 +4537,7 @@ static void __exit cma_cleanup(void)
destroy_workqueue(cma_wq);
}
MODULE_ALIAS_RDMA_NETLINK(RDMA_NL_RDMA_CM, 1);
module_init(cma_init);
module_exit(cma_cleanup);

View File

@ -38,6 +38,7 @@
#include <linux/cgroup_rdma.h>
#include <rdma/ib_verbs.h>
#include <rdma/opa_addr.h>
#include <rdma/ib_mad.h>
#include "mad_priv.h"
@ -102,6 +103,14 @@ void ib_enum_all_roce_netdevs(roce_netdev_filter filter,
roce_netdev_callback cb,
void *cookie);
typedef int (*nldev_callback)(struct ib_device *device,
struct sk_buff *skb,
struct netlink_callback *cb,
unsigned int idx);
int ib_enum_all_devs(nldev_callback nldev_cb, struct sk_buff *skb,
struct netlink_callback *cb);
enum ib_cache_gid_default_mode {
IB_CACHE_GID_DEFAULT_MODE_SET,
IB_CACHE_GID_DEFAULT_MODE_DELETE
@ -179,8 +188,8 @@ void ib_mad_cleanup(void);
int ib_sa_init(void);
void ib_sa_cleanup(void);
int ibnl_init(void);
void ibnl_cleanup(void);
int rdma_nl_init(void);
void rdma_nl_exit(void);
/**
* Check if there are any listeners to the netlink group
@ -190,11 +199,14 @@ void ibnl_cleanup(void);
int ibnl_chk_listeners(unsigned int group);
int ib_nl_handle_resolve_resp(struct sk_buff *skb,
struct netlink_callback *cb);
struct nlmsghdr *nlh,
struct netlink_ext_ack *extack);
int ib_nl_handle_set_timeout(struct sk_buff *skb,
struct netlink_callback *cb);
struct nlmsghdr *nlh,
struct netlink_ext_ack *extack);
int ib_nl_handle_ip_res_resp(struct sk_buff *skb,
struct netlink_callback *cb);
struct nlmsghdr *nlh,
struct netlink_ext_ack *extack);
int ib_get_cached_subnet_prefix(struct ib_device *device,
u8 port_num,
@ -301,4 +313,9 @@ static inline int ib_mad_enforce_security(struct ib_mad_agent_private *map,
return 0;
}
#endif
struct ib_device *__ib_device_get_by_index(u32 ifindex);
/* RDMA device netlink */
void nldev_init(void);
void nldev_exit(void);
#endif /* _CORE_PRIV_H */

View File

@ -134,6 +134,17 @@ static int ib_device_check_mandatory(struct ib_device *device)
return 0;
}
struct ib_device *__ib_device_get_by_index(u32 index)
{
struct ib_device *device;
list_for_each_entry(device, &device_list, core_list)
if (device->index == index)
return device;
return NULL;
}
static struct ib_device *__ib_device_get_by_name(const char *name)
{
struct ib_device *device;
@ -145,7 +156,6 @@ static struct ib_device *__ib_device_get_by_name(const char *name)
return NULL;
}
static int alloc_name(char *name)
{
unsigned long *inuse;
@ -326,10 +336,10 @@ static int read_port_immutable(struct ib_device *device)
return 0;
}
void ib_get_device_fw_str(struct ib_device *dev, char *str, size_t str_len)
void ib_get_device_fw_str(struct ib_device *dev, char *str)
{
if (dev->get_dev_fw_str)
dev->get_dev_fw_str(dev, str, str_len);
dev->get_dev_fw_str(dev, str);
else
str[0] = '\0';
}
@ -394,6 +404,30 @@ static int ib_security_change(struct notifier_block *nb, unsigned long event,
return NOTIFY_OK;
}
/**
* __dev_new_index - allocate an device index
*
* Returns a suitable unique value for a new device interface
* number. It assumes that there are less than 2^32-1 ib devices
* will be present in the system.
*/
static u32 __dev_new_index(void)
{
/*
* The device index to allow stable naming.
* Similar to struct net -> ifindex.
*/
static u32 index;
for (;;) {
if (!(++index))
index = 1;
if (!__ib_device_get_by_index(index))
return index;
}
}
/**
* ib_register_device - Register an IB device with IB core
* @device:Device to register
@ -489,9 +523,10 @@ int ib_register_device(struct ib_device *device,
device->reg_state = IB_DEV_REGISTERED;
list_for_each_entry(client, &client_list, list)
if (client->add && !add_client_context(device, client))
if (!add_client_context(device, client) && client->add)
client->add(device);
device->index = __dev_new_index();
down_write(&lists_rwsem);
list_add_tail(&device->core_list, &device_list);
up_write(&lists_rwsem);
@ -578,7 +613,7 @@ int ib_register_client(struct ib_client *client)
mutex_lock(&device_mutex);
list_for_each_entry(device, &device_list, core_list)
if (client->add && !add_client_context(device, client))
if (!add_client_context(device, client) && client->add)
client->add(device);
down_write(&lists_rwsem);
@ -712,7 +747,7 @@ EXPORT_SYMBOL(ib_set_client_data);
* chapter 11 of the InfiniBand Architecture Specification). This
* callback may occur in interrupt context.
*/
int ib_register_event_handler (struct ib_event_handler *event_handler)
void ib_register_event_handler(struct ib_event_handler *event_handler)
{
unsigned long flags;
@ -720,8 +755,6 @@ int ib_register_event_handler (struct ib_event_handler *event_handler)
list_add_tail(&event_handler->list,
&event_handler->device->event_handler_list);
spin_unlock_irqrestore(&event_handler->device->event_handler_lock, flags);
return 0;
}
EXPORT_SYMBOL(ib_register_event_handler);
@ -732,15 +765,13 @@ EXPORT_SYMBOL(ib_register_event_handler);
* Unregister an event handler registered with
* ib_register_event_handler().
*/
int ib_unregister_event_handler(struct ib_event_handler *event_handler)
void ib_unregister_event_handler(struct ib_event_handler *event_handler)
{
unsigned long flags;
spin_lock_irqsave(&event_handler->device->event_handler_lock, flags);
list_del(&event_handler->list);
spin_unlock_irqrestore(&event_handler->device->event_handler_lock, flags);
return 0;
}
EXPORT_SYMBOL(ib_unregister_event_handler);
@ -893,6 +924,31 @@ void ib_enum_all_roce_netdevs(roce_netdev_filter filter,
up_read(&lists_rwsem);
}
/**
* ib_enum_all_devs - enumerate all ib_devices
* @cb: Callback to call for each found ib_device
*
* Enumerates all ib_devices and calls callback() on each device.
*/
int ib_enum_all_devs(nldev_callback nldev_cb, struct sk_buff *skb,
struct netlink_callback *cb)
{
struct ib_device *dev;
unsigned int idx = 0;
int ret = 0;
down_read(&lists_rwsem);
list_for_each_entry(dev, &device_list, core_list) {
ret = nldev_cb(dev, skb, cb, idx);
if (ret)
break;
idx++;
}
up_read(&lists_rwsem);
return ret;
}
/**
* ib_query_pkey - Get P_Key table entry
* @device:Device to query
@ -945,14 +1001,17 @@ int ib_modify_port(struct ib_device *device,
u8 port_num, int port_modify_mask,
struct ib_port_modify *port_modify)
{
if (!device->modify_port)
return -ENOSYS;
int rc;
if (!rdma_is_port_valid(device, port_num))
return -EINVAL;
return device->modify_port(device, port_num, port_modify_mask,
port_modify);
if (device->modify_port)
rc = device->modify_port(device, port_num, port_modify_mask,
port_modify);
else
rc = rdma_protocol_roce(device, port_num) ? 0 : -ENOSYS;
return rc;
}
EXPORT_SYMBOL(ib_modify_port);
@ -1087,29 +1146,21 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev,
}
EXPORT_SYMBOL(ib_get_net_dev_by_params);
static struct ibnl_client_cbs ibnl_ls_cb_table[] = {
static const struct rdma_nl_cbs ibnl_ls_cb_table[] = {
[RDMA_NL_LS_OP_RESOLVE] = {
.dump = ib_nl_handle_resolve_resp,
.module = THIS_MODULE },
.doit = ib_nl_handle_resolve_resp,
.flags = RDMA_NL_ADMIN_PERM,
},
[RDMA_NL_LS_OP_SET_TIMEOUT] = {
.dump = ib_nl_handle_set_timeout,
.module = THIS_MODULE },
.doit = ib_nl_handle_set_timeout,
.flags = RDMA_NL_ADMIN_PERM,
},
[RDMA_NL_LS_OP_IP_RESOLVE] = {
.dump = ib_nl_handle_ip_res_resp,
.module = THIS_MODULE },
.doit = ib_nl_handle_ip_res_resp,
.flags = RDMA_NL_ADMIN_PERM,
},
};
static int ib_add_ibnl_clients(void)
{
return ibnl_add_client(RDMA_NL_LS, ARRAY_SIZE(ibnl_ls_cb_table),
ibnl_ls_cb_table);
}
static void ib_remove_ibnl_clients(void)
{
ibnl_remove_client(RDMA_NL_LS);
}
static int __init ib_core_init(void)
{
int ret;
@ -1131,9 +1182,9 @@ static int __init ib_core_init(void)
goto err_comp;
}
ret = ibnl_init();
ret = rdma_nl_init();
if (ret) {
pr_warn("Couldn't init IB netlink interface\n");
pr_warn("Couldn't init IB netlink interface: err %d\n", ret);
goto err_sysfs;
}
@ -1155,24 +1206,18 @@ static int __init ib_core_init(void)
goto err_mad;
}
ret = ib_add_ibnl_clients();
if (ret) {
pr_warn("Couldn't register ibnl clients\n");
goto err_sa;
}
ret = register_lsm_notifier(&ibdev_lsm_nb);
if (ret) {
pr_warn("Couldn't register LSM notifier. ret %d\n", ret);
goto err_ibnl_clients;
goto err_sa;
}
nldev_init();
rdma_nl_register(RDMA_NL_LS, ibnl_ls_cb_table);
ib_cache_setup();
return 0;
err_ibnl_clients:
ib_remove_ibnl_clients();
err_sa:
ib_sa_cleanup();
err_mad:
@ -1180,7 +1225,7 @@ err_mad:
err_addr:
addr_cleanup();
err_ibnl:
ibnl_cleanup();
rdma_nl_exit();
err_sysfs:
class_unregister(&ib_class);
err_comp:
@ -1192,18 +1237,21 @@ err:
static void __exit ib_core_cleanup(void)
{
unregister_lsm_notifier(&ibdev_lsm_nb);
ib_cache_cleanup();
ib_remove_ibnl_clients();
nldev_exit();
rdma_nl_unregister(RDMA_NL_LS);
unregister_lsm_notifier(&ibdev_lsm_nb);
ib_sa_cleanup();
ib_mad_cleanup();
addr_cleanup();
ibnl_cleanup();
rdma_nl_exit();
class_unregister(&ib_class);
destroy_workqueue(ib_comp_wq);
/* Make sure that any pending umem accounting work is done. */
destroy_workqueue(ib_wq);
}
MODULE_ALIAS_RDMA_NETLINK(RDMA_NL_LS, 4);
module_init(ib_core_init);
module_exit(ib_core_cleanup);

View File

@ -80,7 +80,7 @@ const char *__attribute_const__ iwcm_reject_msg(int reason)
}
EXPORT_SYMBOL(iwcm_reject_msg);
static struct ibnl_client_cbs iwcm_nl_cb_table[] = {
static struct rdma_nl_cbs iwcm_nl_cb_table[] = {
[RDMA_NL_IWPM_REG_PID] = {.dump = iwpm_register_pid_cb},
[RDMA_NL_IWPM_ADD_MAPPING] = {.dump = iwpm_add_mapping_cb},
[RDMA_NL_IWPM_QUERY_MAPPING] = {.dump = iwpm_add_and_query_mapping_cb},
@ -1175,13 +1175,9 @@ static int __init iw_cm_init(void)
ret = iwpm_init(RDMA_NL_IWCM);
if (ret)
pr_err("iw_cm: couldn't init iwpm\n");
ret = ibnl_add_client(RDMA_NL_IWCM, ARRAY_SIZE(iwcm_nl_cb_table),
iwcm_nl_cb_table);
if (ret)
pr_err("iw_cm: couldn't register netlink callbacks\n");
iwcm_wq = alloc_ordered_workqueue("iw_cm_wq", WQ_MEM_RECLAIM);
else
rdma_nl_register(RDMA_NL_IWCM, iwcm_nl_cb_table);
iwcm_wq = alloc_ordered_workqueue("iw_cm_wq", 0);
if (!iwcm_wq)
return -ENOMEM;
@ -1200,9 +1196,11 @@ static void __exit iw_cm_cleanup(void)
{
unregister_net_sysctl_table(iwcm_ctl_table_hdr);
destroy_workqueue(iwcm_wq);
ibnl_remove_client(RDMA_NL_IWCM);
rdma_nl_unregister(RDMA_NL_IWCM);
iwpm_exit(RDMA_NL_IWCM);
}
MODULE_ALIAS_RDMA_NETLINK(RDMA_NL_IWCM, 2);
module_init(iw_cm_init);
module_exit(iw_cm_cleanup);

View File

@ -42,7 +42,6 @@ int iwpm_valid_pid(void)
{
return iwpm_user_pid > 0;
}
EXPORT_SYMBOL(iwpm_valid_pid);
/*
* iwpm_register_pid - Send a netlink query to user space
@ -104,7 +103,7 @@ int iwpm_register_pid(struct iwpm_dev_data *pm_msg, u8 nl_client)
pr_debug("%s: Multicasting a nlmsg (dev = %s ifname = %s iwpm = %s)\n",
__func__, pm_msg->dev_name, pm_msg->if_name, iwpm_ulib_name);
ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_IWPM, GFP_KERNEL);
ret = rdma_nl_multicast(skb, RDMA_NL_GROUP_IWPM, GFP_KERNEL);
if (ret) {
skb = NULL; /* skb is freed in the netlink send-op handling */
iwpm_user_pid = IWPM_PID_UNAVAILABLE;
@ -122,7 +121,6 @@ pid_query_error:
iwpm_free_nlmsg_request(&nlmsg_request->kref);
return ret;
}
EXPORT_SYMBOL(iwpm_register_pid);
/*
* iwpm_add_mapping - Send a netlink add mapping message
@ -174,7 +172,7 @@ int iwpm_add_mapping(struct iwpm_sa_data *pm_msg, u8 nl_client)
goto add_mapping_error;
nlmsg_request->req_buffer = pm_msg;
ret = ibnl_unicast(skb, nlh, iwpm_user_pid);
ret = rdma_nl_unicast_wait(skb, iwpm_user_pid);
if (ret) {
skb = NULL; /* skb is freed in the netlink send-op handling */
iwpm_user_pid = IWPM_PID_UNDEFINED;
@ -191,7 +189,6 @@ add_mapping_error:
iwpm_free_nlmsg_request(&nlmsg_request->kref);
return ret;
}
EXPORT_SYMBOL(iwpm_add_mapping);
/*
* iwpm_add_and_query_mapping - Send a netlink add and query
@ -251,7 +248,7 @@ int iwpm_add_and_query_mapping(struct iwpm_sa_data *pm_msg, u8 nl_client)
goto query_mapping_error;
nlmsg_request->req_buffer = pm_msg;
ret = ibnl_unicast(skb, nlh, iwpm_user_pid);
ret = rdma_nl_unicast_wait(skb, iwpm_user_pid);
if (ret) {
skb = NULL; /* skb is freed in the netlink send-op handling */
err_str = "Unable to send a nlmsg";
@ -267,7 +264,6 @@ query_mapping_error:
iwpm_free_nlmsg_request(&nlmsg_request->kref);
return ret;
}
EXPORT_SYMBOL(iwpm_add_and_query_mapping);
/*
* iwpm_remove_mapping - Send a netlink remove mapping message
@ -312,7 +308,7 @@ int iwpm_remove_mapping(struct sockaddr_storage *local_addr, u8 nl_client)
if (ret)
goto remove_mapping_error;
ret = ibnl_unicast(skb, nlh, iwpm_user_pid);
ret = rdma_nl_unicast_wait(skb, iwpm_user_pid);
if (ret) {
skb = NULL; /* skb is freed in the netlink send-op handling */
iwpm_user_pid = IWPM_PID_UNDEFINED;
@ -328,7 +324,6 @@ remove_mapping_error:
dev_kfree_skb_any(skb);
return ret;
}
EXPORT_SYMBOL(iwpm_remove_mapping);
/* netlink attribute policy for the received response to register pid request */
static const struct nla_policy resp_reg_policy[IWPM_NLA_RREG_PID_MAX] = {
@ -397,7 +392,6 @@ register_pid_response_exit:
up(&nlmsg_request->sem);
return 0;
}
EXPORT_SYMBOL(iwpm_register_pid_cb);
/* netlink attribute policy for the received response to add mapping request */
static const struct nla_policy resp_add_policy[IWPM_NLA_RMANAGE_MAPPING_MAX] = {
@ -466,7 +460,6 @@ add_mapping_response_exit:
up(&nlmsg_request->sem);
return 0;
}
EXPORT_SYMBOL(iwpm_add_mapping_cb);
/* netlink attribute policy for the response to add and query mapping request
* and response with remote address info */
@ -558,7 +551,6 @@ query_mapping_response_exit:
up(&nlmsg_request->sem);
return 0;
}
EXPORT_SYMBOL(iwpm_add_and_query_mapping_cb);
/*
* iwpm_remote_info_cb - Process a port mapper message, containing
@ -627,7 +619,6 @@ int iwpm_remote_info_cb(struct sk_buff *skb, struct netlink_callback *cb)
"remote_info: Mapped remote sockaddr:");
return ret;
}
EXPORT_SYMBOL(iwpm_remote_info_cb);
/* netlink attribute policy for the received request for mapping info */
static const struct nla_policy resp_mapinfo_policy[IWPM_NLA_MAPINFO_REQ_MAX] = {
@ -677,7 +668,6 @@ int iwpm_mapping_info_cb(struct sk_buff *skb, struct netlink_callback *cb)
ret = iwpm_send_mapinfo(nl_client, iwpm_user_pid);
return ret;
}
EXPORT_SYMBOL(iwpm_mapping_info_cb);
/* netlink attribute policy for the received mapping info ack */
static const struct nla_policy ack_mapinfo_policy[IWPM_NLA_MAPINFO_NUM_MAX] = {
@ -707,7 +697,6 @@ int iwpm_ack_mapping_info_cb(struct sk_buff *skb, struct netlink_callback *cb)
atomic_set(&echo_nlmsg_seq, cb->nlh->nlmsg_seq);
return 0;
}
EXPORT_SYMBOL(iwpm_ack_mapping_info_cb);
/* netlink attribute policy for the received port mapper error message */
static const struct nla_policy map_error_policy[IWPM_NLA_ERR_MAX] = {
@ -751,4 +740,3 @@ int iwpm_mapping_error_cb(struct sk_buff *skb, struct netlink_callback *cb)
up(&nlmsg_request->sem);
return 0;
}
EXPORT_SYMBOL(iwpm_mapping_error_cb);

View File

@ -54,8 +54,6 @@ static struct iwpm_admin_data iwpm_admin;
int iwpm_init(u8 nl_client)
{
int ret = 0;
if (iwpm_valid_client(nl_client))
return -EINVAL;
mutex_lock(&iwpm_admin_lock);
if (atomic_read(&iwpm_admin.refcount) == 0) {
iwpm_hash_bucket = kzalloc(IWPM_MAPINFO_HASH_SIZE *
@ -83,7 +81,6 @@ init_exit:
}
return ret;
}
EXPORT_SYMBOL(iwpm_init);
static void free_hash_bucket(void);
static void free_reminfo_bucket(void);
@ -109,7 +106,6 @@ int iwpm_exit(u8 nl_client)
iwpm_set_registration(nl_client, IWPM_REG_UNDEF);
return 0;
}
EXPORT_SYMBOL(iwpm_exit);
static struct hlist_head *get_mapinfo_hash_bucket(struct sockaddr_storage *,
struct sockaddr_storage *);
@ -148,7 +144,6 @@ int iwpm_create_mapinfo(struct sockaddr_storage *local_sockaddr,
spin_unlock_irqrestore(&iwpm_mapinfo_lock, flags);
return ret;
}
EXPORT_SYMBOL(iwpm_create_mapinfo);
int iwpm_remove_mapinfo(struct sockaddr_storage *local_sockaddr,
struct sockaddr_storage *mapped_local_addr)
@ -184,7 +179,6 @@ remove_mapinfo_exit:
spin_unlock_irqrestore(&iwpm_mapinfo_lock, flags);
return ret;
}
EXPORT_SYMBOL(iwpm_remove_mapinfo);
static void free_hash_bucket(void)
{
@ -297,7 +291,6 @@ get_remote_info_exit:
spin_unlock_irqrestore(&iwpm_reminfo_lock, flags);
return ret;
}
EXPORT_SYMBOL(iwpm_get_remote_info);
struct iwpm_nlmsg_request *iwpm_get_nlmsg_request(__u32 nlmsg_seq,
u8 nl_client, gfp_t gfp)
@ -383,15 +376,11 @@ int iwpm_get_nlmsg_seq(void)
int iwpm_valid_client(u8 nl_client)
{
if (nl_client >= RDMA_NL_NUM_CLIENTS)
return 0;
return iwpm_admin.client_list[nl_client];
}
void iwpm_set_valid(u8 nl_client, int valid)
{
if (nl_client >= RDMA_NL_NUM_CLIENTS)
return;
iwpm_admin.client_list[nl_client] = valid;
}
@ -608,7 +597,7 @@ static int send_mapinfo_num(u32 mapping_num, u8 nl_client, int iwpm_pid)
&mapping_num, IWPM_NLA_MAPINFO_SEND_NUM);
if (ret)
goto mapinfo_num_error;
ret = ibnl_unicast(skb, nlh, iwpm_pid);
ret = rdma_nl_unicast(skb, iwpm_pid);
if (ret) {
skb = NULL;
err_str = "Unable to send a nlmsg";
@ -637,7 +626,7 @@ static int send_nlmsg_done(struct sk_buff *skb, u8 nl_client, int iwpm_pid)
return -ENOMEM;
}
nlh->nlmsg_type = NLMSG_DONE;
ret = ibnl_unicast(skb, (struct nlmsghdr *)skb->data, iwpm_pid);
ret = rdma_nl_unicast(skb, iwpm_pid);
if (ret)
pr_warn("%s Unable to send a nlmsg\n", __func__);
return ret;

View File

@ -64,7 +64,7 @@ struct mad_rmpp_recv {
__be64 tid;
u32 src_qp;
u16 slid;
u32 slid;
u8 mgmt_class;
u8 class_version;
u8 method;

View File

@ -1,4 +1,5 @@
/*
* Copyright (c) 2017 Mellanox Technologies Inc. All rights reserved.
* Copyright (c) 2010 Voltaire Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
@ -37,239 +38,267 @@
#include <net/net_namespace.h>
#include <net/sock.h>
#include <rdma/rdma_netlink.h>
#include <linux/module.h>
#include "core_priv.h"
struct ibnl_client {
struct list_head list;
int index;
int nops;
const struct ibnl_client_cbs *cb_table;
};
#include "core_priv.h"
static DEFINE_MUTEX(ibnl_mutex);
static DEFINE_MUTEX(rdma_nl_mutex);
static struct sock *nls;
static LIST_HEAD(client_list);
static struct {
const struct rdma_nl_cbs *cb_table;
} rdma_nl_types[RDMA_NL_NUM_CLIENTS];
int ibnl_chk_listeners(unsigned int group)
int rdma_nl_chk_listeners(unsigned int group)
{
if (netlink_has_listeners(nls, group) == 0)
return -1;
return 0;
return (netlink_has_listeners(nls, group)) ? 0 : -1;
}
EXPORT_SYMBOL(rdma_nl_chk_listeners);
static bool is_nl_msg_valid(unsigned int type, unsigned int op)
{
static const unsigned int max_num_ops[RDMA_NL_NUM_CLIENTS - 1] = {
RDMA_NL_RDMA_CM_NUM_OPS,
RDMA_NL_IWPM_NUM_OPS,
0,
RDMA_NL_LS_NUM_OPS,
RDMA_NLDEV_NUM_OPS };
/*
* This BUILD_BUG_ON is intended to catch addition of new
* RDMA netlink protocol without updating the array above.
*/
BUILD_BUG_ON(RDMA_NL_NUM_CLIENTS != 6);
if (type > RDMA_NL_NUM_CLIENTS - 1)
return false;
return (op < max_num_ops[type - 1]) ? true : false;
}
int ibnl_add_client(int index, int nops,
const struct ibnl_client_cbs cb_table[])
static bool is_nl_valid(unsigned int type, unsigned int op)
{
struct ibnl_client *cur;
struct ibnl_client *nl_client;
const struct rdma_nl_cbs *cb_table;
nl_client = kmalloc(sizeof *nl_client, GFP_KERNEL);
if (!nl_client)
return -ENOMEM;
if (!is_nl_msg_valid(type, op))
return false;
nl_client->index = index;
nl_client->nops = nops;
nl_client->cb_table = cb_table;
cb_table = rdma_nl_types[type].cb_table;
#ifdef CONFIG_MODULES
if (!cb_table) {
mutex_unlock(&rdma_nl_mutex);
request_module("rdma-netlink-subsys-%d", type);
mutex_lock(&rdma_nl_mutex);
cb_table = rdma_nl_types[type].cb_table;
}
#endif
mutex_lock(&ibnl_mutex);
if (!cb_table || (!cb_table[op].dump && !cb_table[op].doit))
return false;
return true;
}
list_for_each_entry(cur, &client_list, list) {
if (cur->index == index) {
pr_warn("Client for %d already exists\n", index);
mutex_unlock(&ibnl_mutex);
kfree(nl_client);
return -EINVAL;
}
void rdma_nl_register(unsigned int index,
const struct rdma_nl_cbs cb_table[])
{
mutex_lock(&rdma_nl_mutex);
if (!is_nl_msg_valid(index, 0)) {
/*
* All clients are not interesting in success/failure of
* this call. They want to see the print to error log and
* continue their initialization. Print warning for them,
* because it is programmer's error to be here.
*/
mutex_unlock(&rdma_nl_mutex);
WARN(true,
"The not-valid %u index was supplied to RDMA netlink\n",
index);
return;
}
list_add_tail(&nl_client->list, &client_list);
mutex_unlock(&ibnl_mutex);
return 0;
}
EXPORT_SYMBOL(ibnl_add_client);
int ibnl_remove_client(int index)
{
struct ibnl_client *cur, *next;
mutex_lock(&ibnl_mutex);
list_for_each_entry_safe(cur, next, &client_list, list) {
if (cur->index == index) {
list_del(&(cur->list));
mutex_unlock(&ibnl_mutex);
kfree(cur);
return 0;
}
if (rdma_nl_types[index].cb_table) {
mutex_unlock(&rdma_nl_mutex);
WARN(true,
"The %u index is already registered in RDMA netlink\n",
index);
return;
}
pr_warn("Can't remove callback for client idx %d. Not found\n", index);
mutex_unlock(&ibnl_mutex);
return -EINVAL;
rdma_nl_types[index].cb_table = cb_table;
mutex_unlock(&rdma_nl_mutex);
}
EXPORT_SYMBOL(ibnl_remove_client);
EXPORT_SYMBOL(rdma_nl_register);
void rdma_nl_unregister(unsigned int index)
{
mutex_lock(&rdma_nl_mutex);
rdma_nl_types[index].cb_table = NULL;
mutex_unlock(&rdma_nl_mutex);
}
EXPORT_SYMBOL(rdma_nl_unregister);
void *ibnl_put_msg(struct sk_buff *skb, struct nlmsghdr **nlh, int seq,
int len, int client, int op, int flags)
{
unsigned char *prev_tail;
prev_tail = skb_tail_pointer(skb);
*nlh = nlmsg_put(skb, 0, seq, RDMA_NL_GET_TYPE(client, op),
len, flags);
*nlh = nlmsg_put(skb, 0, seq, RDMA_NL_GET_TYPE(client, op), len, flags);
if (!*nlh)
goto out_nlmsg_trim;
(*nlh)->nlmsg_len = skb_tail_pointer(skb) - prev_tail;
return NULL;
return nlmsg_data(*nlh);
out_nlmsg_trim:
nlmsg_trim(skb, prev_tail);
return NULL;
}
EXPORT_SYMBOL(ibnl_put_msg);
int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh,
int len, void *data, int type)
{
unsigned char *prev_tail;
prev_tail = skb_tail_pointer(skb);
if (nla_put(skb, type, len, data))
goto nla_put_failure;
nlh->nlmsg_len += skb_tail_pointer(skb) - prev_tail;
if (nla_put(skb, type, len, data)) {
nlmsg_cancel(skb, nlh);
return -EMSGSIZE;
}
return 0;
nla_put_failure:
nlmsg_trim(skb, prev_tail - nlh->nlmsg_len);
return -EMSGSIZE;
}
EXPORT_SYMBOL(ibnl_put_attr);
static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
static int rdma_nl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct ibnl_client *client;
int type = nlh->nlmsg_type;
int index = RDMA_NL_GET_CLIENT(type);
unsigned int index = RDMA_NL_GET_CLIENT(type);
unsigned int op = RDMA_NL_GET_OP(type);
const struct rdma_nl_cbs *cb_table;
list_for_each_entry(client, &client_list, list) {
if (client->index == index) {
if (op >= client->nops || !client->cb_table[op].dump)
return -EINVAL;
if (!is_nl_valid(index, op))
return -EINVAL;
/*
* For response or local service set_timeout request,
* there is no need to use netlink_dump_start.
*/
if (!(nlh->nlmsg_flags & NLM_F_REQUEST) ||
(index == RDMA_NL_LS &&
op == RDMA_NL_LS_OP_SET_TIMEOUT)) {
struct netlink_callback cb = {
.skb = skb,
.nlh = nlh,
.dump = client->cb_table[op].dump,
.module = client->cb_table[op].module,
};
cb_table = rdma_nl_types[index].cb_table;
return cb.dump(skb, &cb);
}
if ((cb_table[op].flags & RDMA_NL_ADMIN_PERM) &&
!netlink_capable(skb, CAP_NET_ADMIN))
return -EPERM;
{
struct netlink_dump_control c = {
.dump = client->cb_table[op].dump,
.module = client->cb_table[op].module,
};
return netlink_dump_start(nls, skb, nlh, &c);
}
}
/* FIXME: Convert IWCM to properly handle doit callbacks */
if ((nlh->nlmsg_flags & NLM_F_DUMP) || index == RDMA_NL_RDMA_CM ||
index == RDMA_NL_IWCM) {
struct netlink_dump_control c = {
.dump = cb_table[op].dump,
};
return netlink_dump_start(nls, skb, nlh, &c);
}
pr_info("Index %d wasn't found in client list\n", index);
return -EINVAL;
if (cb_table[op].doit)
return cb_table[op].doit(skb, nlh, extack);
return 0;
}
static void ibnl_rcv_reply_skb(struct sk_buff *skb)
/*
* This function is similar to netlink_rcv_skb with one exception:
* It calls to the callback for the netlink messages without NLM_F_REQUEST
* flag. These messages are intended for RDMA_NL_LS consumer, so it is allowed
* for that consumer only.
*/
static int rdma_nl_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *,
struct nlmsghdr *,
struct netlink_ext_ack *))
{
struct netlink_ext_ack extack = {};
struct nlmsghdr *nlh;
int msglen;
int err;
/*
* Process responses until there is no more message or the first
* request. Generally speaking, it is not recommended to mix responses
* with requests.
*/
while (skb->len >= nlmsg_total_size(0)) {
int msglen;
nlh = nlmsg_hdr(skb);
err = 0;
if (nlh->nlmsg_len < NLMSG_HDRLEN || skb->len < nlh->nlmsg_len)
return;
return 0;
/* Handle response only */
if (nlh->nlmsg_flags & NLM_F_REQUEST)
return;
/*
* Generally speaking, the only requests are handled
* by the kernel, but RDMA_NL_LS is different, because it
* runs backward netlink scheme. Kernel initiates messages
* and waits for reply with data to keep pathrecord cache
* in sync.
*/
if (!(nlh->nlmsg_flags & NLM_F_REQUEST) &&
(RDMA_NL_GET_CLIENT(nlh->nlmsg_type) != RDMA_NL_LS))
goto ack;
ibnl_rcv_msg(skb, nlh, NULL);
/* Skip control messages */
if (nlh->nlmsg_type < NLMSG_MIN_TYPE)
goto ack;
err = cb(skb, nlh, &extack);
if (err == -EINTR)
goto skip;
ack:
if (nlh->nlmsg_flags & NLM_F_ACK || err)
netlink_ack(skb, nlh, err, &extack);
skip:
msglen = NLMSG_ALIGN(nlh->nlmsg_len);
if (msglen > skb->len)
msglen = skb->len;
skb_pull(skb, msglen);
}
return 0;
}
static void ibnl_rcv(struct sk_buff *skb)
static void rdma_nl_rcv(struct sk_buff *skb)
{
mutex_lock(&ibnl_mutex);
ibnl_rcv_reply_skb(skb);
netlink_rcv_skb(skb, &ibnl_rcv_msg);
mutex_unlock(&ibnl_mutex);
mutex_lock(&rdma_nl_mutex);
rdma_nl_rcv_skb(skb, &rdma_nl_rcv_msg);
mutex_unlock(&rdma_nl_mutex);
}
int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh,
__u32 pid)
int rdma_nl_unicast(struct sk_buff *skb, u32 pid)
{
int err;
err = netlink_unicast(nls, skb, pid, MSG_DONTWAIT);
return (err < 0) ? err : 0;
}
EXPORT_SYMBOL(rdma_nl_unicast);
int rdma_nl_unicast_wait(struct sk_buff *skb, __u32 pid)
{
int err;
err = netlink_unicast(nls, skb, pid, 0);
return (err < 0) ? err : 0;
}
EXPORT_SYMBOL(ibnl_unicast);
EXPORT_SYMBOL(rdma_nl_unicast_wait);
int ibnl_multicast(struct sk_buff *skb, struct nlmsghdr *nlh,
unsigned int group, gfp_t flags)
int rdma_nl_multicast(struct sk_buff *skb, unsigned int group, gfp_t flags)
{
return nlmsg_multicast(nls, skb, 0, group, flags);
}
EXPORT_SYMBOL(ibnl_multicast);
EXPORT_SYMBOL(rdma_nl_multicast);
int __init ibnl_init(void)
int __init rdma_nl_init(void)
{
struct netlink_kernel_cfg cfg = {
.input = ibnl_rcv,
.input = rdma_nl_rcv,
};
nls = netlink_kernel_create(&init_net, NETLINK_RDMA, &cfg);
if (!nls) {
pr_warn("Failed to create netlink socket\n");
if (!nls)
return -ENOMEM;
}
nls->sk_sndtimeo = 10 * HZ;
return 0;
}
void ibnl_cleanup(void)
void rdma_nl_exit(void)
{
struct ibnl_client *cur, *next;
int idx;
mutex_lock(&ibnl_mutex);
list_for_each_entry_safe(cur, next, &client_list, list) {
list_del(&(cur->list));
kfree(cur);
}
mutex_unlock(&ibnl_mutex);
for (idx = 0; idx < RDMA_NL_NUM_CLIENTS; idx++)
rdma_nl_unregister(idx);
netlink_kernel_release(nls);
}
MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_RDMA);

View File

@ -0,0 +1,325 @@
/*
* Copyright (c) 2017 Mellanox Technologies. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. Neither the names of the copyright holders nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* Alternatively, this software may be distributed under the terms of the
* GNU General Public License ("GPL") version 2 as published by the Free
* Software Foundation.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
#include <linux/module.h>
#include <net/netlink.h>
#include <rdma/rdma_netlink.h>
#include "core_priv.h"
static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
[RDMA_NLDEV_ATTR_DEV_INDEX] = { .type = NLA_U32 },
[RDMA_NLDEV_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING,
.len = IB_DEVICE_NAME_MAX - 1},
[RDMA_NLDEV_ATTR_PORT_INDEX] = { .type = NLA_U32 },
[RDMA_NLDEV_ATTR_FW_VERSION] = { .type = NLA_NUL_STRING,
.len = IB_FW_VERSION_NAME_MAX - 1},
[RDMA_NLDEV_ATTR_NODE_GUID] = { .type = NLA_U64 },
[RDMA_NLDEV_ATTR_SYS_IMAGE_GUID] = { .type = NLA_U64 },
[RDMA_NLDEV_ATTR_SUBNET_PREFIX] = { .type = NLA_U64 },
[RDMA_NLDEV_ATTR_LID] = { .type = NLA_U32 },
[RDMA_NLDEV_ATTR_SM_LID] = { .type = NLA_U32 },
[RDMA_NLDEV_ATTR_LMC] = { .type = NLA_U8 },
[RDMA_NLDEV_ATTR_PORT_STATE] = { .type = NLA_U8 },
[RDMA_NLDEV_ATTR_PORT_PHYS_STATE] = { .type = NLA_U8 },
[RDMA_NLDEV_ATTR_DEV_NODE_TYPE] = { .type = NLA_U8 },
};
static int fill_dev_info(struct sk_buff *msg, struct ib_device *device)
{
char fw[IB_FW_VERSION_NAME_MAX];
if (nla_put_u32(msg, RDMA_NLDEV_ATTR_DEV_INDEX, device->index))
return -EMSGSIZE;
if (nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_NAME, device->name))
return -EMSGSIZE;
if (nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, rdma_end_port(device)))
return -EMSGSIZE;
BUILD_BUG_ON(sizeof(device->attrs.device_cap_flags) != sizeof(u64));
if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_CAP_FLAGS,
device->attrs.device_cap_flags, 0))
return -EMSGSIZE;
ib_get_device_fw_str(device, fw);
/* Device without FW has strlen(fw) */
if (strlen(fw) && nla_put_string(msg, RDMA_NLDEV_ATTR_FW_VERSION, fw))
return -EMSGSIZE;
if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_NODE_GUID,
be64_to_cpu(device->node_guid), 0))
return -EMSGSIZE;
if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_SYS_IMAGE_GUID,
be64_to_cpu(device->attrs.sys_image_guid), 0))
return -EMSGSIZE;
if (nla_put_u8(msg, RDMA_NLDEV_ATTR_DEV_NODE_TYPE, device->node_type))
return -EMSGSIZE;
return 0;
}
static int fill_port_info(struct sk_buff *msg,
struct ib_device *device, u32 port)
{
struct ib_port_attr attr;
int ret;
if (nla_put_u32(msg, RDMA_NLDEV_ATTR_DEV_INDEX, device->index))
return -EMSGSIZE;
if (nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_NAME, device->name))
return -EMSGSIZE;
if (nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port))
return -EMSGSIZE;
ret = ib_query_port(device, port, &attr);
if (ret)
return ret;
BUILD_BUG_ON(sizeof(attr.port_cap_flags) > sizeof(u64));
if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_CAP_FLAGS,
(u64)attr.port_cap_flags, 0))
return -EMSGSIZE;
if (rdma_protocol_ib(device, port) &&
nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_SUBNET_PREFIX,
attr.subnet_prefix, 0))
return -EMSGSIZE;
if (rdma_protocol_ib(device, port)) {
if (nla_put_u32(msg, RDMA_NLDEV_ATTR_LID, attr.lid))
return -EMSGSIZE;
if (nla_put_u32(msg, RDMA_NLDEV_ATTR_SM_LID, attr.sm_lid))
return -EMSGSIZE;
if (nla_put_u8(msg, RDMA_NLDEV_ATTR_LMC, attr.lmc))
return -EMSGSIZE;
}
if (nla_put_u8(msg, RDMA_NLDEV_ATTR_PORT_STATE, attr.state))
return -EMSGSIZE;
if (nla_put_u8(msg, RDMA_NLDEV_ATTR_PORT_PHYS_STATE, attr.phys_state))
return -EMSGSIZE;
return 0;
}
static int nldev_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
struct ib_device *device;
struct sk_buff *msg;
u32 index;
int err;
err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
nldev_policy, extack);
if (err || !tb[RDMA_NLDEV_ATTR_DEV_INDEX])
return -EINVAL;
index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
device = __ib_device_get_by_index(index);
if (!device)
return -EINVAL;
msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!msg)
return -ENOMEM;
nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET),
0, 0);
err = fill_dev_info(msg, device);
if (err) {
nlmsg_free(msg);
return err;
}
nlmsg_end(msg, nlh);
return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
}
static int _nldev_get_dumpit(struct ib_device *device,
struct sk_buff *skb,
struct netlink_callback *cb,
unsigned int idx)
{
int start = cb->args[0];
struct nlmsghdr *nlh;
if (idx < start)
return 0;
nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET),
0, NLM_F_MULTI);
if (fill_dev_info(skb, device)) {
nlmsg_cancel(skb, nlh);
goto out;
}
nlmsg_end(skb, nlh);
idx++;
out: cb->args[0] = idx;
return skb->len;
}
static int nldev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
{
/*
* There is no need to take lock, because
* we are relying on ib_core's lists_rwsem
*/
return ib_enum_all_devs(_nldev_get_dumpit, skb, cb);
}
static int nldev_port_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
struct ib_device *device;
struct sk_buff *msg;
u32 index;
u32 port;
int err;
err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
nldev_policy, extack);
if (err || !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
return -EINVAL;
index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
device = __ib_device_get_by_index(index);
if (!device)
return -EINVAL;
port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
if (!rdma_is_port_valid(device, port))
return -EINVAL;
msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!msg)
return -ENOMEM;
nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET),
0, 0);
err = fill_port_info(msg, device, port);
if (err) {
nlmsg_free(msg);
return err;
}
nlmsg_end(msg, nlh);
return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
}
static int nldev_port_get_dumpit(struct sk_buff *skb,
struct netlink_callback *cb)
{
struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
struct ib_device *device;
int start = cb->args[0];
struct nlmsghdr *nlh;
u32 idx = 0;
u32 ifindex;
int err;
u32 p;
err = nlmsg_parse(cb->nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
nldev_policy, NULL);
if (err || !tb[RDMA_NLDEV_ATTR_DEV_INDEX])
return -EINVAL;
ifindex = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
device = __ib_device_get_by_index(ifindex);
if (!device)
return -EINVAL;
for (p = rdma_start_port(device); p <= rdma_end_port(device); ++p) {
/*
* The dumpit function returns all information from specific
* index. This specific index is taken from the netlink
* messages request sent by user and it is available
* in cb->args[0].
*
* Usually, the user doesn't fill this field and it causes
* to return everything.
*
*/
if (idx < start) {
idx++;
continue;
}
nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
RDMA_NLDEV_CMD_PORT_GET),
0, NLM_F_MULTI);
if (fill_port_info(skb, device, p)) {
nlmsg_cancel(skb, nlh);
goto out;
}
idx++;
nlmsg_end(skb, nlh);
}
out: cb->args[0] = idx;
return skb->len;
}
static const struct rdma_nl_cbs nldev_cb_table[] = {
[RDMA_NLDEV_CMD_GET] = {
.doit = nldev_get_doit,
.dump = nldev_get_dumpit,
},
[RDMA_NLDEV_CMD_PORT_GET] = {
.doit = nldev_port_get_doit,
.dump = nldev_port_get_dumpit,
},
};
void __init nldev_init(void)
{
rdma_nl_register(RDMA_NL_NLDEV, nldev_cb_table);
}
void __exit nldev_exit(void)
{
rdma_nl_unregister(RDMA_NL_NLDEV);
}
MODULE_ALIAS_RDMA_NETLINK(RDMA_NL_NLDEV, 5);

View File

@ -35,10 +35,57 @@
#include <rdma/ib_verbs.h>
#include <rdma/uverbs_types.h>
#include <linux/rcupdate.h>
#include <rdma/uverbs_ioctl.h>
#include <rdma/rdma_user_ioctl.h>
#include "uverbs.h"
#include "core_priv.h"
#include "rdma_core.h"
int uverbs_ns_idx(u16 *id, unsigned int ns_count)
{
int ret = (*id & UVERBS_ID_NS_MASK) >> UVERBS_ID_NS_SHIFT;
if (ret >= ns_count)
return -EINVAL;
*id &= ~UVERBS_ID_NS_MASK;
return ret;
}
const struct uverbs_object_spec *uverbs_get_object(const struct ib_device *ibdev,
uint16_t object)
{
const struct uverbs_root_spec *object_hash = ibdev->specs_root;
const struct uverbs_object_spec_hash *objects;
int ret = uverbs_ns_idx(&object, object_hash->num_buckets);
if (ret < 0)
return NULL;
objects = object_hash->object_buckets[ret];
if (object >= objects->num_objects)
return NULL;
return objects->objects[object];
}
const struct uverbs_method_spec *uverbs_get_method(const struct uverbs_object_spec *object,
uint16_t method)
{
const struct uverbs_method_spec_hash *methods;
int ret = uverbs_ns_idx(&method, object->num_buckets);
if (ret < 0)
return NULL;
methods = object->method_buckets[ret];
if (method >= methods->num_methods)
return NULL;
return methods->methods[method];
}
void uverbs_uobject_get(struct ib_uobject *uobject)
{
kref_get(&uobject->ref);
@ -404,6 +451,41 @@ int __must_check rdma_remove_commit_uobject(struct ib_uobject *uobj)
return ret;
}
static int null_obj_type_class_remove_commit(struct ib_uobject *uobj,
enum rdma_remove_reason why)
{
return 0;
}
static const struct uverbs_obj_type null_obj_type = {
.type_class = &((const struct uverbs_obj_type_class){
.remove_commit = null_obj_type_class_remove_commit,
/* be cautious */
.needs_kfree_rcu = true}),
};
int rdma_explicit_destroy(struct ib_uobject *uobject)
{
int ret;
struct ib_ucontext *ucontext = uobject->context;
/* Cleanup is running. Calling this should have been impossible */
if (!down_read_trylock(&ucontext->cleanup_rwsem)) {
WARN(true, "ib_uverbs: Cleanup is running while removing an uobject\n");
return 0;
}
lockdep_check(uobject, true);
ret = uobject->type->type_class->remove_commit(uobject,
RDMA_REMOVE_DESTROY);
if (ret)
return ret;
uobject->type = &null_obj_type;
up_read(&ucontext->cleanup_rwsem);
return 0;
}
static void alloc_commit_idr_uobject(struct ib_uobject *uobj)
{
uverbs_uobject_add(uobj);
@ -625,3 +707,100 @@ const struct uverbs_obj_type_class uverbs_fd_class = {
.needs_kfree_rcu = false,
};
struct ib_uobject *uverbs_get_uobject_from_context(const struct uverbs_obj_type *type_attrs,
struct ib_ucontext *ucontext,
enum uverbs_obj_access access,
int id)
{
switch (access) {
case UVERBS_ACCESS_READ:
return rdma_lookup_get_uobject(type_attrs, ucontext, id, false);
case UVERBS_ACCESS_DESTROY:
case UVERBS_ACCESS_WRITE:
return rdma_lookup_get_uobject(type_attrs, ucontext, id, true);
case UVERBS_ACCESS_NEW:
return rdma_alloc_begin_uobject(type_attrs, ucontext);
default:
WARN_ON(true);
return ERR_PTR(-EOPNOTSUPP);
}
}
int uverbs_finalize_object(struct ib_uobject *uobj,
enum uverbs_obj_access access,
bool commit)
{
int ret = 0;
/*
* refcounts should be handled at the object level and not at the
* uobject level. Refcounts of the objects themselves are done in
* handlers.
*/
switch (access) {
case UVERBS_ACCESS_READ:
rdma_lookup_put_uobject(uobj, false);
break;
case UVERBS_ACCESS_WRITE:
rdma_lookup_put_uobject(uobj, true);
break;
case UVERBS_ACCESS_DESTROY:
if (commit)
ret = rdma_remove_commit_uobject(uobj);
else
rdma_lookup_put_uobject(uobj, true);
break;
case UVERBS_ACCESS_NEW:
if (commit)
ret = rdma_alloc_commit_uobject(uobj);
else
rdma_alloc_abort_uobject(uobj);
break;
default:
WARN_ON(true);
ret = -EOPNOTSUPP;
}
return ret;
}
int uverbs_finalize_objects(struct uverbs_attr_bundle *attrs_bundle,
struct uverbs_attr_spec_hash * const *spec_hash,
size_t num,
bool commit)
{
unsigned int i;
int ret = 0;
for (i = 0; i < num; i++) {
struct uverbs_attr_bundle_hash *curr_bundle =
&attrs_bundle->hash[i];
const struct uverbs_attr_spec_hash *curr_spec_bucket =
spec_hash[i];
unsigned int j;
for (j = 0; j < curr_bundle->num_attrs; j++) {
struct uverbs_attr *attr;
const struct uverbs_attr_spec *spec;
if (!uverbs_attr_is_valid_in_hash(curr_bundle, j))
continue;
attr = &curr_bundle->attrs[j];
spec = &curr_spec_bucket->attrs[j];
if (spec->type == UVERBS_ATTR_TYPE_IDR ||
spec->type == UVERBS_ATTR_TYPE_FD) {
int current_ret;
current_ret = uverbs_finalize_object(attr->obj_attr.uobject,
spec->obj.access,
commit);
if (!ret)
ret = current_ret;
}
}
}
return ret;
}

View File

@ -39,9 +39,15 @@
#include <linux/idr.h>
#include <rdma/uverbs_types.h>
#include <rdma/uverbs_ioctl.h>
#include <rdma/ib_verbs.h>
#include <linux/mutex.h>
int uverbs_ns_idx(u16 *id, unsigned int ns_count);
const struct uverbs_object_spec *uverbs_get_object(const struct ib_device *ibdev,
uint16_t object);
const struct uverbs_method_spec *uverbs_get_method(const struct uverbs_object_spec *object,
uint16_t method);
/*
* These functions initialize the context and cleanups its uobjects.
* The context has a list of objects which is protected by a mutex
@ -75,4 +81,40 @@ void uverbs_uobject_put(struct ib_uobject *uobject);
*/
void uverbs_close_fd(struct file *f);
/*
* Get an ib_uobject that corresponds to the given id from ucontext, assuming
* the object is from the given type. Lock it to the required access when
* applicable.
* This function could create (access == NEW), destroy (access == DESTROY)
* or unlock (access == READ || access == WRITE) objects if required.
* The action will be finalized only when uverbs_finalize_object or
* uverbs_finalize_objects are called.
*/
struct ib_uobject *uverbs_get_uobject_from_context(const struct uverbs_obj_type *type_attrs,
struct ib_ucontext *ucontext,
enum uverbs_obj_access access,
int id);
int uverbs_finalize_object(struct ib_uobject *uobj,
enum uverbs_obj_access access,
bool commit);
/*
* Note that certain finalize stages could return a status:
* (a) alloc_commit could return a failure if the object is committed at the
* same time when the context is destroyed.
* (b) remove_commit could fail if the object wasn't destroyed successfully.
* Since multiple objects could be finalized in one transaction, it is very NOT
* recommended to have several finalize actions which have side effects.
* For example, it's NOT recommended to have a certain action which has both
* a commit action and a destroy action or two destroy objects in the same
* action. The rule of thumb is to have one destroy or commit action with
* multiple lookups.
* The first non zero return value of finalize_object is returned from this
* function. For example, this could happen when we couldn't destroy an
* object.
*/
int uverbs_finalize_objects(struct uverbs_attr_bundle *attrs_bundle,
struct uverbs_attr_spec_hash * const *spec_hash,
size_t num,
bool commit);
#endif /* RDMA_CORE_H */

View File

@ -44,6 +44,8 @@
static struct workqueue_struct *gid_cache_wq;
static struct workqueue_struct *gid_cache_wq;
enum gid_op_type {
GID_DEL = 0,
GID_ADD

View File

@ -50,6 +50,7 @@
#include <uapi/rdma/ib_user_sa.h>
#include <rdma/ib_marshall.h>
#include <rdma/ib_addr.h>
#include <rdma/opa_addr.h>
#include "sa.h"
#include "core_priv.h"
@ -861,7 +862,7 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
/* Repair the nlmsg header length */
nlmsg_end(skb, nlh);
ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, gfp_mask);
ret = rdma_nl_multicast(skb, RDMA_NL_GROUP_LS, gfp_mask);
if (!ret)
ret = len;
else
@ -1021,9 +1022,9 @@ static void ib_nl_request_timeout(struct work_struct *work)
}
int ib_nl_handle_set_timeout(struct sk_buff *skb,
struct netlink_callback *cb)
struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
int timeout, delta, abs_delta;
const struct nlattr *attr;
unsigned long flags;
@ -1033,8 +1034,7 @@ int ib_nl_handle_set_timeout(struct sk_buff *skb,
int ret;
if (!(nlh->nlmsg_flags & NLM_F_REQUEST) ||
!(NETLINK_CB(skb).sk) ||
!netlink_capable(skb, CAP_NET_ADMIN))
!(NETLINK_CB(skb).sk))
return -EPERM;
ret = nla_parse(tb, LS_NLA_TYPE_MAX - 1, nlmsg_data(nlh),
@ -1098,9 +1098,9 @@ static inline int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh)
}
int ib_nl_handle_resolve_resp(struct sk_buff *skb,
struct netlink_callback *cb)
struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
unsigned long flags;
struct ib_sa_query *query;
struct ib_mad_send_buf *send_buf;
@ -1109,8 +1109,7 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb,
int ret;
if ((nlh->nlmsg_flags & NLM_F_REQUEST) ||
!(NETLINK_CB(skb).sk) ||
!netlink_capable(skb, CAP_NET_ADMIN))
!(NETLINK_CB(skb).sk))
return -EPERM;
spin_lock_irqsave(&ib_nl_request_lock, flags);
@ -1241,6 +1240,11 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
ah_attr->type = rdma_ah_find_type(device, port_num);
rdma_ah_set_dlid(ah_attr, be32_to_cpu(sa_path_get_dlid(rec)));
if ((ah_attr->type == RDMA_AH_ATTR_TYPE_OPA) &&
(rdma_ah_get_dlid(ah_attr) == be16_to_cpu(IB_LID_PERMISSIVE)))
rdma_ah_set_make_grd(ah_attr, true);
rdma_ah_set_sl(ah_attr, rec->sl);
rdma_ah_set_path_bits(ah_attr, be32_to_cpu(sa_path_get_slid(rec)) &
get_src_path_mask(device, port_num));
@ -1420,7 +1424,7 @@ static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
if ((query->flags & IB_SA_ENABLE_LOCAL_SERVICE) &&
(!(query->flags & IB_SA_QUERY_OPA))) {
if (!ibnl_chk_listeners(RDMA_NL_GROUP_LS)) {
if (!rdma_nl_chk_listeners(RDMA_NL_GROUP_LS)) {
if (!ib_nl_make_request(query, gfp_mask))
return id;
}
@ -2290,12 +2294,15 @@ static void update_sm_ah(struct work_struct *work)
rdma_ah_set_sl(&ah_attr, port_attr.sm_sl);
rdma_ah_set_port_num(&ah_attr, port->port_num);
if (port_attr.grh_required) {
rdma_ah_set_ah_flags(&ah_attr, IB_AH_GRH);
rdma_ah_set_subnet_prefix(&ah_attr,
cpu_to_be64(port_attr.subnet_prefix));
rdma_ah_set_interface_id(&ah_attr,
cpu_to_be64(IB_SA_WELL_KNOWN_GUID));
if (ah_attr.type == RDMA_AH_ATTR_TYPE_OPA) {
rdma_ah_set_make_grd(&ah_attr, true);
} else {
rdma_ah_set_ah_flags(&ah_attr, IB_AH_GRH);
rdma_ah_set_subnet_prefix(&ah_attr,
cpu_to_be64(port_attr.subnet_prefix));
rdma_ah_set_interface_id(&ah_attr,
cpu_to_be64(IB_SA_WELL_KNOWN_GUID));
}
}
new_ah->ah = rdma_create_ah(port->agent->qp->pd, &ah_attr);
@ -2410,8 +2417,7 @@ static void ib_sa_add_one(struct ib_device *device)
*/
INIT_IB_EVENT_HANDLER(&sa_dev->event_handler, device, ib_sa_event);
if (ib_register_event_handler(&sa_dev->event_handler))
goto err;
ib_register_event_handler(&sa_dev->event_handler);
for (i = 0; i <= e - s; ++i) {
if (rdma_cap_ib_sa(device, i + 1))

View File

@ -1210,8 +1210,8 @@ static ssize_t show_fw_ver(struct device *device, struct device_attribute *attr,
{
struct ib_device *dev = container_of(device, struct ib_device, dev);
ib_get_device_fw_str(dev, buf, PAGE_SIZE);
strlcat(buf, "\n", PAGE_SIZE);
ib_get_device_fw_str(dev, buf);
strlcat(buf, "\n", IB_FW_VERSION_NAME_MAX);
return strlen(buf);
}

View File

@ -618,7 +618,7 @@ static ssize_t ib_ucm_init_qp_attr(struct ib_ucm_file *file,
if (result)
goto out;
ib_copy_qp_attr_to_user(&resp, &qp_attr);
ib_copy_qp_attr_to_user(ctx->cm_id->device, &resp, &qp_attr);
if (copy_to_user((void __user *)(unsigned long)cmd.response,
&resp, sizeof(resp)))

View File

@ -248,14 +248,15 @@ static void ucma_copy_conn_event(struct rdma_ucm_conn_param *dst,
dst->qp_num = src->qp_num;
}
static void ucma_copy_ud_event(struct rdma_ucm_ud_param *dst,
static void ucma_copy_ud_event(struct ib_device *device,
struct rdma_ucm_ud_param *dst,
struct rdma_ud_param *src)
{
if (src->private_data_len)
memcpy(dst->private_data, src->private_data,
src->private_data_len);
dst->private_data_len = src->private_data_len;
ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr);
ib_copy_ah_attr_to_user(device, &dst->ah_attr, &src->ah_attr);
dst->qp_num = src->qp_num;
dst->qkey = src->qkey;
}
@ -335,7 +336,8 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
uevent->resp.event = event->event;
uevent->resp.status = event->status;
if (cm_id->qp_type == IB_QPT_UD)
ucma_copy_ud_event(&uevent->resp.param.ud, &event->param.ud);
ucma_copy_ud_event(cm_id->device, &uevent->resp.param.ud,
&event->param.ud);
else
ucma_copy_conn_event(&uevent->resp.param.conn,
&event->param.conn);
@ -1157,7 +1159,7 @@ static ssize_t ucma_init_qp_attr(struct ucma_file *file,
if (ret)
goto out;
ib_copy_qp_attr_to_user(&resp, &qp_attr);
ib_copy_qp_attr_to_user(ctx->cm_id->device, &resp, &qp_attr);
if (copy_to_user((void __user *)(unsigned long)cmd.response,
&resp, sizeof(resp)))
ret = -EFAULT;

View File

@ -229,7 +229,7 @@ static void recv_handler(struct ib_mad_agent *agent,
packet->mad.hdr.status = 0;
packet->mad.hdr.length = hdr_size(file) + mad_recv_wc->mad_len;
packet->mad.hdr.qpn = cpu_to_be32(mad_recv_wc->wc->src_qp);
packet->mad.hdr.lid = cpu_to_be16(mad_recv_wc->wc->slid);
packet->mad.hdr.lid = ib_lid_be16(mad_recv_wc->wc->slid);
packet->mad.hdr.sl = mad_recv_wc->wc->sl;
packet->mad.hdr.path_bits = mad_recv_wc->wc->dlid_path_bits;
packet->mad.hdr.pkey_index = mad_recv_wc->wc->pkey_index;

View File

@ -100,6 +100,7 @@ struct ib_uverbs_device {
struct mutex lists_mutex; /* protect lists */
struct list_head uverbs_file_list;
struct list_head uverbs_events_file_list;
struct uverbs_root_spec *specs_root;
};
struct ib_uverbs_event_queue {
@ -218,6 +219,8 @@ int uverbs_dealloc_mw(struct ib_mw *mw);
void ib_uverbs_detach_umcast(struct ib_qp *qp,
struct ib_uqp_object *uobj);
long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
struct ib_uverbs_flow_spec {
union {
union {

View File

@ -91,9 +91,10 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
goto err;
}
INIT_UDATA(&udata, buf + sizeof cmd,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd, out_len - sizeof resp);
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long) cmd.response + sizeof(resp),
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
ret = ib_rdmacg_try_charge(&cg_obj, ib_dev, RDMACG_RESOURCE_HCA_HANDLE);
if (ret)
@ -275,8 +276,14 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
resp.bad_pkey_cntr = attr.bad_pkey_cntr;
resp.qkey_viol_cntr = attr.qkey_viol_cntr;
resp.pkey_tbl_len = attr.pkey_tbl_len;
resp.lid = attr.lid;
resp.sm_lid = attr.sm_lid;
if (rdma_cap_opa_ah(ib_dev, cmd.port_num)) {
resp.lid = OPA_TO_IB_UCAST_LID(attr.lid);
resp.sm_lid = OPA_TO_IB_UCAST_LID(attr.sm_lid);
} else {
resp.lid = ib_lid_cpu16(attr.lid);
resp.sm_lid = ib_lid_cpu16(attr.sm_lid);
}
resp.lmc = attr.lmc;
resp.max_vl_num = attr.max_vl_num;
resp.sm_sl = attr.sm_sl;
@ -313,9 +320,10 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
INIT_UDATA(&udata, buf + sizeof cmd,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd, out_len - sizeof resp);
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long) cmd.response + sizeof(resp),
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
uobj = uobj_alloc(uobj_get_type(pd), file->ucontext);
if (IS_ERR(uobj))
@ -482,9 +490,10 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
INIT_UDATA(&udata, buf + sizeof cmd,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd, out_len - sizeof resp);
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long) cmd.response + sizeof(resp),
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
mutex_lock(&file->device->xrcd_tree_mutex);
@ -646,9 +655,10 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
INIT_UDATA(&udata, buf + sizeof cmd,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd, out_len - sizeof resp);
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long) cmd.response + sizeof(resp),
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
if ((cmd.start & ~PAGE_MASK) != (cmd.hca_va & ~PAGE_MASK))
return -EINVAL;
@ -740,7 +750,8 @@ ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long) cmd.response + sizeof(resp),
in_len - sizeof(cmd), out_len - sizeof(resp));
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
if (cmd.flags & ~IB_MR_REREG_SUPPORTED || !cmd.flags)
return -EINVAL;
@ -1080,7 +1091,8 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
INIT_UDATA(&uhw, buf + sizeof(cmd),
(unsigned long)cmd.response + sizeof(resp),
in_len - sizeof(cmd), out_len - sizeof(resp));
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
memset(&cmd_ex, 0, sizeof(cmd_ex));
cmd_ex.user_handle = cmd.user_handle;
@ -1161,9 +1173,10 @@ ssize_t ib_uverbs_resize_cq(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
INIT_UDATA(&udata, buf + sizeof cmd,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd, out_len - sizeof resp);
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long) cmd.response + sizeof(resp),
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
cq = uobj_get_obj_read(cq, cmd.cq_handle, file->ucontext);
if (!cq)
@ -1185,7 +1198,8 @@ out:
return ret ? ret : in_len;
}
static int copy_wc_to_user(void __user *dest, struct ib_wc *wc)
static int copy_wc_to_user(struct ib_device *ib_dev, void __user *dest,
struct ib_wc *wc)
{
struct ib_uverbs_wc tmp;
@ -1199,7 +1213,10 @@ static int copy_wc_to_user(void __user *dest, struct ib_wc *wc)
tmp.src_qp = wc->src_qp;
tmp.wc_flags = wc->wc_flags;
tmp.pkey_index = wc->pkey_index;
tmp.slid = wc->slid;
if (rdma_cap_opa_ah(ib_dev, wc->port_num))
tmp.slid = OPA_TO_IB_UCAST_LID(wc->slid);
else
tmp.slid = ib_lid_cpu16(wc->slid);
tmp.sl = wc->sl;
tmp.dlid_path_bits = wc->dlid_path_bits;
tmp.port_num = wc->port_num;
@ -1243,7 +1260,7 @@ ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
if (!ret)
break;
ret = copy_wc_to_user(data_ptr, &wc);
ret = copy_wc_to_user(ib_dev, data_ptr, &wc);
if (ret)
goto out_put;
@ -1383,8 +1400,9 @@ static int create_qp(struct ib_uverbs_file *file,
attr.rwq_ind_tbl = ind_tbl;
}
if ((cmd_sz >= offsetof(typeof(*cmd), reserved1) +
sizeof(cmd->reserved1)) && cmd->reserved1) {
if (cmd_sz > sizeof(*cmd) &&
!ib_is_udata_cleared(ucore, sizeof(*cmd),
cmd_sz - sizeof(*cmd))) {
ret = -EOPNOTSUPP;
goto err_put;
}
@ -1420,7 +1438,7 @@ static int create_qp(struct ib_uverbs_file *file,
if (cmd->is_srq) {
srq = uobj_get_obj_read(srq, cmd->srq_handle,
file->ucontext);
if (!srq || srq->srq_type != IB_SRQT_BASIC) {
if (!srq || srq->srq_type == IB_SRQT_XRC) {
ret = -EINVAL;
goto err_put;
}
@ -1482,11 +1500,21 @@ static int create_qp(struct ib_uverbs_file *file,
IB_QP_CREATE_MANAGED_SEND |
IB_QP_CREATE_MANAGED_RECV |
IB_QP_CREATE_SCATTER_FCS |
IB_QP_CREATE_CVLAN_STRIPPING)) {
IB_QP_CREATE_CVLAN_STRIPPING |
IB_QP_CREATE_SOURCE_QPN)) {
ret = -EINVAL;
goto err_put;
}
if (attr.create_flags & IB_QP_CREATE_SOURCE_QPN) {
if (!capable(CAP_NET_RAW)) {
ret = -EPERM;
goto err_put;
}
attr.source_qpn = cmd->source_qpn;
}
buf = (void *)cmd + sizeof(*cmd);
if (cmd_sz > sizeof(*cmd))
if (!(buf[0] == 0 && !memcmp(buf, buf + 1,
@ -1722,9 +1750,10 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
INIT_UDATA(&udata, buf + sizeof cmd,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd, out_len - sizeof resp);
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long) cmd.response + sizeof(resp),
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
obj = (struct ib_uqp_object *)uobj_alloc(uobj_get_type(qp),
file->ucontext);
@ -1791,6 +1820,28 @@ err_put:
return ret;
}
static void copy_ah_attr_to_uverbs(struct ib_uverbs_qp_dest *uverb_attr,
struct rdma_ah_attr *rdma_attr)
{
const struct ib_global_route *grh;
uverb_attr->dlid = rdma_ah_get_dlid(rdma_attr);
uverb_attr->sl = rdma_ah_get_sl(rdma_attr);
uverb_attr->src_path_bits = rdma_ah_get_path_bits(rdma_attr);
uverb_attr->static_rate = rdma_ah_get_static_rate(rdma_attr);
uverb_attr->is_global = !!(rdma_ah_get_ah_flags(rdma_attr) &
IB_AH_GRH);
if (uverb_attr->is_global) {
grh = rdma_ah_read_grh(rdma_attr);
memcpy(uverb_attr->dgid, grh->dgid.raw, 16);
uverb_attr->flow_label = grh->flow_label;
uverb_attr->sgid_index = grh->sgid_index;
uverb_attr->hop_limit = grh->hop_limit;
uverb_attr->traffic_class = grh->traffic_class;
}
uverb_attr->port_num = rdma_ah_get_port_num(rdma_attr);
}
ssize_t ib_uverbs_query_qp(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
const char __user *buf, int in_len,
@ -1801,7 +1852,6 @@ ssize_t ib_uverbs_query_qp(struct ib_uverbs_file *file,
struct ib_qp *qp;
struct ib_qp_attr *attr;
struct ib_qp_init_attr *init_attr;
const struct ib_global_route *grh;
int ret;
if (copy_from_user(&cmd, buf, sizeof cmd))
@ -1851,39 +1901,8 @@ ssize_t ib_uverbs_query_qp(struct ib_uverbs_file *file,
resp.alt_port_num = attr->alt_port_num;
resp.alt_timeout = attr->alt_timeout;
resp.dest.dlid = rdma_ah_get_dlid(&attr->ah_attr);
resp.dest.sl = rdma_ah_get_sl(&attr->ah_attr);
resp.dest.src_path_bits = rdma_ah_get_path_bits(&attr->ah_attr);
resp.dest.static_rate = rdma_ah_get_static_rate(&attr->ah_attr);
resp.dest.is_global = !!(rdma_ah_get_ah_flags(&attr->ah_attr) &
IB_AH_GRH);
if (resp.dest.is_global) {
grh = rdma_ah_read_grh(&attr->ah_attr);
memcpy(resp.dest.dgid, grh->dgid.raw, 16);
resp.dest.flow_label = grh->flow_label;
resp.dest.sgid_index = grh->sgid_index;
resp.dest.hop_limit = grh->hop_limit;
resp.dest.traffic_class = grh->traffic_class;
}
resp.dest.port_num = rdma_ah_get_port_num(&attr->ah_attr);
resp.alt_dest.dlid = rdma_ah_get_dlid(&attr->alt_ah_attr);
resp.alt_dest.sl = rdma_ah_get_sl(&attr->alt_ah_attr);
resp.alt_dest.src_path_bits = rdma_ah_get_path_bits(&attr->alt_ah_attr);
resp.alt_dest.static_rate
= rdma_ah_get_static_rate(&attr->alt_ah_attr);
resp.alt_dest.is_global
= !!(rdma_ah_get_ah_flags(&attr->alt_ah_attr) &
IB_AH_GRH);
if (resp.alt_dest.is_global) {
grh = rdma_ah_read_grh(&attr->alt_ah_attr);
memcpy(resp.alt_dest.dgid, grh->dgid.raw, 16);
resp.alt_dest.flow_label = grh->flow_label;
resp.alt_dest.sgid_index = grh->sgid_index;
resp.alt_dest.hop_limit = grh->hop_limit;
resp.alt_dest.traffic_class = grh->traffic_class;
}
resp.alt_dest.port_num = rdma_ah_get_port_num(&attr->alt_ah_attr);
copy_ah_attr_to_uverbs(&resp.dest, &attr->ah_attr);
copy_ah_attr_to_uverbs(&resp.alt_dest, &attr->alt_ah_attr);
resp.max_send_wr = init_attr->cap.max_send_wr;
resp.max_recv_wr = init_attr->cap.max_recv_wr;
@ -1917,6 +1936,29 @@ static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
}
}
static void copy_ah_attr_from_uverbs(struct ib_device *dev,
struct rdma_ah_attr *rdma_attr,
struct ib_uverbs_qp_dest *uverb_attr)
{
rdma_attr->type = rdma_ah_find_type(dev, uverb_attr->port_num);
if (uverb_attr->is_global) {
rdma_ah_set_grh(rdma_attr, NULL,
uverb_attr->flow_label,
uverb_attr->sgid_index,
uverb_attr->hop_limit,
uverb_attr->traffic_class);
rdma_ah_set_dgid_raw(rdma_attr, uverb_attr->dgid);
} else {
rdma_ah_set_ah_flags(rdma_attr, 0);
}
rdma_ah_set_dlid(rdma_attr, uverb_attr->dlid);
rdma_ah_set_sl(rdma_attr, uverb_attr->sl);
rdma_ah_set_path_bits(rdma_attr, uverb_attr->src_path_bits);
rdma_ah_set_static_rate(rdma_attr, uverb_attr->static_rate);
rdma_ah_set_port_num(rdma_attr, uverb_attr->port_num);
rdma_ah_set_make_grd(rdma_attr, false);
}
static int modify_qp(struct ib_uverbs_file *file,
struct ib_uverbs_ex_modify_qp *cmd, struct ib_udata *udata)
{
@ -1964,48 +2006,12 @@ static int modify_qp(struct ib_uverbs_file *file,
attr->rate_limit = cmd->rate_limit;
if (cmd->base.attr_mask & IB_QP_AV)
attr->ah_attr.type = rdma_ah_find_type(qp->device,
cmd->base.dest.port_num);
if (cmd->base.dest.is_global) {
rdma_ah_set_grh(&attr->ah_attr, NULL,
cmd->base.dest.flow_label,
cmd->base.dest.sgid_index,
cmd->base.dest.hop_limit,
cmd->base.dest.traffic_class);
rdma_ah_set_dgid_raw(&attr->ah_attr, cmd->base.dest.dgid);
} else {
rdma_ah_set_ah_flags(&attr->ah_attr, 0);
}
rdma_ah_set_dlid(&attr->ah_attr, cmd->base.dest.dlid);
rdma_ah_set_sl(&attr->ah_attr, cmd->base.dest.sl);
rdma_ah_set_path_bits(&attr->ah_attr, cmd->base.dest.src_path_bits);
rdma_ah_set_static_rate(&attr->ah_attr, cmd->base.dest.static_rate);
rdma_ah_set_port_num(&attr->ah_attr,
cmd->base.dest.port_num);
copy_ah_attr_from_uverbs(qp->device, &attr->ah_attr,
&cmd->base.dest);
if (cmd->base.attr_mask & IB_QP_ALT_PATH)
attr->alt_ah_attr.type =
rdma_ah_find_type(qp->device, cmd->base.dest.port_num);
if (cmd->base.alt_dest.is_global) {
rdma_ah_set_grh(&attr->alt_ah_attr, NULL,
cmd->base.alt_dest.flow_label,
cmd->base.alt_dest.sgid_index,
cmd->base.alt_dest.hop_limit,
cmd->base.alt_dest.traffic_class);
rdma_ah_set_dgid_raw(&attr->alt_ah_attr,
cmd->base.alt_dest.dgid);
} else {
rdma_ah_set_ah_flags(&attr->alt_ah_attr, 0);
}
rdma_ah_set_dlid(&attr->alt_ah_attr, cmd->base.alt_dest.dlid);
rdma_ah_set_sl(&attr->alt_ah_attr, cmd->base.alt_dest.sl);
rdma_ah_set_path_bits(&attr->alt_ah_attr,
cmd->base.alt_dest.src_path_bits);
rdma_ah_set_static_rate(&attr->alt_ah_attr,
cmd->base.alt_dest.static_rate);
rdma_ah_set_port_num(&attr->alt_ah_attr,
cmd->base.alt_dest.port_num);
copy_ah_attr_from_uverbs(qp->device, &attr->alt_ah_attr,
&cmd->base.alt_dest);
ret = ib_modify_qp_with_udata(qp, attr,
modify_qp_mask(qp->qp_type,
@ -2037,7 +2043,8 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
return -EOPNOTSUPP;
INIT_UDATA(&udata, buf + sizeof(cmd.base), NULL,
in_len - sizeof(cmd.base), out_len);
in_len - sizeof(cmd.base) - sizeof(struct ib_uverbs_cmd_hdr),
out_len);
ret = modify_qp(file, &cmd, &udata);
if (ret)
@ -2543,7 +2550,8 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long)cmd.response + sizeof(resp),
in_len - sizeof(cmd), out_len - sizeof(resp));
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
uobj = uobj_alloc(uobj_get_type(ah), file->ucontext);
if (IS_ERR(uobj))
@ -2556,6 +2564,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
}
attr.type = rdma_ah_find_type(ib_dev, cmd.attr.port_num);
rdma_ah_set_make_grd(&attr, false);
rdma_ah_set_dlid(&attr, cmd.attr.dlid);
rdma_ah_set_sl(&attr, cmd.attr.sl);
rdma_ah_set_path_bits(&attr, cmd.attr.src_path_bits);
@ -3472,6 +3481,9 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
if (IS_ERR(obj))
return PTR_ERR(obj);
if (cmd->srq_type == IB_SRQT_TM)
attr.ext.tag_matching.max_num_tags = cmd->max_num_tags;
if (cmd->srq_type == IB_SRQT_XRC) {
xrcd_uobj = uobj_get_read(uobj_get_type(xrcd), cmd->xrcd_handle,
file->ucontext);
@ -3488,10 +3500,12 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
obj->uxrcd = container_of(xrcd_uobj, struct ib_uxrcd_object, uobject);
atomic_inc(&obj->uxrcd->refcnt);
}
attr.ext.xrc.cq = uobj_get_obj_read(cq, cmd->cq_handle,
file->ucontext);
if (!attr.ext.xrc.cq) {
if (ib_srq_has_cq(cmd->srq_type)) {
attr.ext.cq = uobj_get_obj_read(cq, cmd->cq_handle,
file->ucontext);
if (!attr.ext.cq) {
ret = -EINVAL;
goto err_put_xrcd;
}
@ -3526,10 +3540,13 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
srq->event_handler = attr.event_handler;
srq->srq_context = attr.srq_context;
if (ib_srq_has_cq(cmd->srq_type)) {
srq->ext.cq = attr.ext.cq;
atomic_inc(&attr.ext.cq->usecnt);
}
if (cmd->srq_type == IB_SRQT_XRC) {
srq->ext.xrc.cq = attr.ext.xrc.cq;
srq->ext.xrc.xrcd = attr.ext.xrc.xrcd;
atomic_inc(&attr.ext.xrc.cq->usecnt);
atomic_inc(&attr.ext.xrc.xrcd->usecnt);
}
@ -3552,10 +3569,12 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
goto err_copy;
}
if (cmd->srq_type == IB_SRQT_XRC) {
if (cmd->srq_type == IB_SRQT_XRC)
uobj_put_read(xrcd_uobj);
uobj_put_obj_read(attr.ext.xrc.cq);
}
if (ib_srq_has_cq(cmd->srq_type))
uobj_put_obj_read(attr.ext.cq);
uobj_put_obj_read(pd);
uobj_alloc_commit(&obj->uevent.uobject);
@ -3568,8 +3587,8 @@ err_put:
uobj_put_obj_read(pd);
err_put_cq:
if (cmd->srq_type == IB_SRQT_XRC)
uobj_put_obj_read(attr.ext.xrc.cq);
if (ib_srq_has_cq(cmd->srq_type))
uobj_put_obj_read(attr.ext.cq);
err_put_xrcd:
if (cmd->srq_type == IB_SRQT_XRC) {
@ -3599,6 +3618,7 @@ ssize_t ib_uverbs_create_srq(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
memset(&xcmd, 0, sizeof(xcmd));
xcmd.response = cmd.response;
xcmd.user_handle = cmd.user_handle;
xcmd.srq_type = IB_SRQT_BASIC;
@ -3607,10 +3627,10 @@ ssize_t ib_uverbs_create_srq(struct ib_uverbs_file *file,
xcmd.max_sge = cmd.max_sge;
xcmd.srq_limit = cmd.srq_limit;
INIT_UDATA(&udata, buf + sizeof cmd,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof resp);
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long) cmd.response + sizeof(resp),
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
ret = __uverbs_create_xsrq(file, ib_dev, &xcmd, &udata);
if (ret)
@ -3634,10 +3654,10 @@ ssize_t ib_uverbs_create_xsrq(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
INIT_UDATA(&udata, buf + sizeof cmd,
(unsigned long) cmd.response + sizeof resp,
in_len - sizeof cmd - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof resp);
INIT_UDATA(&udata, buf + sizeof(cmd),
(unsigned long) cmd.response + sizeof(resp),
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
ret = __uverbs_create_xsrq(file, ib_dev, &cmd, &udata);
if (ret)
@ -3848,6 +3868,16 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
resp.raw_packet_caps = attr.raw_packet_caps;
resp.response_length += sizeof(resp.raw_packet_caps);
if (ucore->outlen < resp.response_length + sizeof(resp.xrq_caps))
goto end;
resp.xrq_caps.max_rndv_hdr_size = attr.xrq_caps.max_rndv_hdr_size;
resp.xrq_caps.max_num_tags = attr.xrq_caps.max_num_tags;
resp.xrq_caps.max_ops = attr.xrq_caps.max_ops;
resp.xrq_caps.max_sge = attr.xrq_caps.max_sge;
resp.xrq_caps.flags = attr.xrq_caps.flags;
resp.response_length += sizeof(resp.xrq_caps);
end:
err = ib_copy_to_udata(ucore, &resp, resp.response_length);
return err;

View File

@ -0,0 +1,364 @@
/*
* Copyright (c) 2017, Mellanox Technologies inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include <rdma/rdma_user_ioctl.h>
#include <rdma/uverbs_ioctl.h>
#include "rdma_core.h"
#include "uverbs.h"
static int uverbs_process_attr(struct ib_device *ibdev,
struct ib_ucontext *ucontext,
const struct ib_uverbs_attr *uattr,
u16 attr_id,
const struct uverbs_attr_spec_hash *attr_spec_bucket,
struct uverbs_attr_bundle_hash *attr_bundle_h,
struct ib_uverbs_attr __user *uattr_ptr)
{
const struct uverbs_attr_spec *spec;
struct uverbs_attr *e;
const struct uverbs_object_spec *object;
struct uverbs_obj_attr *o_attr;
struct uverbs_attr *elements = attr_bundle_h->attrs;
if (uattr->reserved)
return -EINVAL;
if (attr_id >= attr_spec_bucket->num_attrs) {
if (uattr->flags & UVERBS_ATTR_F_MANDATORY)
return -EINVAL;
else
return 0;
}
spec = &attr_spec_bucket->attrs[attr_id];
e = &elements[attr_id];
e->uattr = uattr_ptr;
switch (spec->type) {
case UVERBS_ATTR_TYPE_PTR_IN:
case UVERBS_ATTR_TYPE_PTR_OUT:
if (uattr->len < spec->len ||
(!(spec->flags & UVERBS_ATTR_SPEC_F_MIN_SZ) &&
uattr->len > spec->len))
return -EINVAL;
e->ptr_attr.data = uattr->data;
e->ptr_attr.len = uattr->len;
e->ptr_attr.flags = uattr->flags;
break;
case UVERBS_ATTR_TYPE_IDR:
if (uattr->data >> 32)
return -EINVAL;
/* fall through */
case UVERBS_ATTR_TYPE_FD:
if (uattr->len != 0 || !ucontext || uattr->data > INT_MAX)
return -EINVAL;
o_attr = &e->obj_attr;
object = uverbs_get_object(ibdev, spec->obj.obj_type);
if (!object)
return -EINVAL;
o_attr->type = object->type_attrs;
o_attr->id = (int)uattr->data;
o_attr->uobject = uverbs_get_uobject_from_context(
o_attr->type,
ucontext,
spec->obj.access,
o_attr->id);
if (IS_ERR(o_attr->uobject))
return PTR_ERR(o_attr->uobject);
if (spec->obj.access == UVERBS_ACCESS_NEW) {
u64 id = o_attr->uobject->id;
/* Copy the allocated id to the user-space */
if (put_user(id, &e->uattr->data)) {
uverbs_finalize_object(o_attr->uobject,
UVERBS_ACCESS_NEW,
false);
return -EFAULT;
}
}
break;
default:
return -EOPNOTSUPP;
}
set_bit(attr_id, attr_bundle_h->valid_bitmap);
return 0;
}
static int uverbs_uattrs_process(struct ib_device *ibdev,
struct ib_ucontext *ucontext,
const struct ib_uverbs_attr *uattrs,
size_t num_uattrs,
const struct uverbs_method_spec *method,
struct uverbs_attr_bundle *attr_bundle,
struct ib_uverbs_attr __user *uattr_ptr)
{
size_t i;
int ret = 0;
int num_given_buckets = 0;
for (i = 0; i < num_uattrs; i++) {
const struct ib_uverbs_attr *uattr = &uattrs[i];
u16 attr_id = uattr->attr_id;
struct uverbs_attr_spec_hash *attr_spec_bucket;
ret = uverbs_ns_idx(&attr_id, method->num_buckets);
if (ret < 0) {
if (uattr->flags & UVERBS_ATTR_F_MANDATORY) {
uverbs_finalize_objects(attr_bundle,
method->attr_buckets,
num_given_buckets,
false);
return ret;
}
continue;
}
/*
* ret is the found ns, so increase num_given_buckets if
* necessary.
*/
if (ret >= num_given_buckets)
num_given_buckets = ret + 1;
attr_spec_bucket = method->attr_buckets[ret];
ret = uverbs_process_attr(ibdev, ucontext, uattr, attr_id,
attr_spec_bucket, &attr_bundle->hash[ret],
uattr_ptr++);
if (ret) {
uverbs_finalize_objects(attr_bundle,
method->attr_buckets,
num_given_buckets,
false);
return ret;
}
}
return num_given_buckets;
}
static int uverbs_validate_kernel_mandatory(const struct uverbs_method_spec *method_spec,
struct uverbs_attr_bundle *attr_bundle)
{
unsigned int i;
for (i = 0; i < attr_bundle->num_buckets; i++) {
struct uverbs_attr_spec_hash *attr_spec_bucket =
method_spec->attr_buckets[i];
if (!bitmap_subset(attr_spec_bucket->mandatory_attrs_bitmask,
attr_bundle->hash[i].valid_bitmap,
attr_spec_bucket->num_attrs))
return -EINVAL;
}
return 0;
}
static int uverbs_handle_method(struct ib_uverbs_attr __user *uattr_ptr,
const struct ib_uverbs_attr *uattrs,
size_t num_uattrs,
struct ib_device *ibdev,
struct ib_uverbs_file *ufile,
const struct uverbs_method_spec *method_spec,
struct uverbs_attr_bundle *attr_bundle)
{
int ret;
int finalize_ret;
int num_given_buckets;
num_given_buckets = uverbs_uattrs_process(ibdev, ufile->ucontext, uattrs,
num_uattrs, method_spec,
attr_bundle, uattr_ptr);
if (num_given_buckets <= 0)
return -EINVAL;
attr_bundle->num_buckets = num_given_buckets;
ret = uverbs_validate_kernel_mandatory(method_spec, attr_bundle);
if (ret)
goto cleanup;
ret = method_spec->handler(ibdev, ufile, attr_bundle);
cleanup:
finalize_ret = uverbs_finalize_objects(attr_bundle,
method_spec->attr_buckets,
attr_bundle->num_buckets,
!ret);
return ret ? ret : finalize_ret;
}
#define UVERBS_OPTIMIZE_USING_STACK_SZ 256
static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
struct ib_uverbs_file *file,
struct ib_uverbs_ioctl_hdr *hdr,
void __user *buf)
{
const struct uverbs_object_spec *object_spec;
const struct uverbs_method_spec *method_spec;
long err = 0;
unsigned int i;
struct {
struct ib_uverbs_attr *uattrs;
struct uverbs_attr_bundle *uverbs_attr_bundle;
} *ctx = NULL;
struct uverbs_attr *curr_attr;
unsigned long *curr_bitmap;
size_t ctx_size;
#ifdef UVERBS_OPTIMIZE_USING_STACK_SZ
uintptr_t data[UVERBS_OPTIMIZE_USING_STACK_SZ / sizeof(uintptr_t)];
#endif
if (hdr->reserved)
return -EINVAL;
object_spec = uverbs_get_object(ib_dev, hdr->object_id);
if (!object_spec)
return -EOPNOTSUPP;
method_spec = uverbs_get_method(object_spec, hdr->method_id);
if (!method_spec)
return -EOPNOTSUPP;
if ((method_spec->flags & UVERBS_ACTION_FLAG_CREATE_ROOT) ^ !file->ucontext)
return -EINVAL;
ctx_size = sizeof(*ctx) +
sizeof(struct uverbs_attr_bundle) +
sizeof(struct uverbs_attr_bundle_hash) * method_spec->num_buckets +
sizeof(*ctx->uattrs) * hdr->num_attrs +
sizeof(*ctx->uverbs_attr_bundle->hash[0].attrs) *
method_spec->num_child_attrs +
sizeof(*ctx->uverbs_attr_bundle->hash[0].valid_bitmap) *
(method_spec->num_child_attrs / BITS_PER_LONG +
method_spec->num_buckets);
#ifdef UVERBS_OPTIMIZE_USING_STACK_SZ
if (ctx_size <= UVERBS_OPTIMIZE_USING_STACK_SZ)
ctx = (void *)data;
if (!ctx)
#endif
ctx = kmalloc(ctx_size, GFP_KERNEL);
if (!ctx)
return -ENOMEM;
ctx->uverbs_attr_bundle = (void *)ctx + sizeof(*ctx);
ctx->uattrs = (void *)(ctx->uverbs_attr_bundle + 1) +
(sizeof(ctx->uverbs_attr_bundle->hash[0]) *
method_spec->num_buckets);
curr_attr = (void *)(ctx->uattrs + hdr->num_attrs);
curr_bitmap = (void *)(curr_attr + method_spec->num_child_attrs);
/*
* We just fill the pointers and num_attrs here. The data itself will be
* filled at a later stage (uverbs_process_attr)
*/
for (i = 0; i < method_spec->num_buckets; i++) {
unsigned int curr_num_attrs = method_spec->attr_buckets[i]->num_attrs;
ctx->uverbs_attr_bundle->hash[i].attrs = curr_attr;
curr_attr += curr_num_attrs;
ctx->uverbs_attr_bundle->hash[i].num_attrs = curr_num_attrs;
ctx->uverbs_attr_bundle->hash[i].valid_bitmap = curr_bitmap;
bitmap_zero(curr_bitmap, curr_num_attrs);
curr_bitmap += BITS_TO_LONGS(curr_num_attrs);
}
err = copy_from_user(ctx->uattrs, buf,
sizeof(*ctx->uattrs) * hdr->num_attrs);
if (err) {
err = -EFAULT;
goto out;
}
err = uverbs_handle_method(buf, ctx->uattrs, hdr->num_attrs, ib_dev,
file, method_spec, ctx->uverbs_attr_bundle);
out:
#ifdef UVERBS_OPTIMIZE_USING_STACK_SZ
if (ctx_size > UVERBS_OPTIMIZE_USING_STACK_SZ)
#endif
kfree(ctx);
return err;
}
#define IB_UVERBS_MAX_CMD_SZ 4096
long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct ib_uverbs_file *file = filp->private_data;
struct ib_uverbs_ioctl_hdr __user *user_hdr =
(struct ib_uverbs_ioctl_hdr __user *)arg;
struct ib_uverbs_ioctl_hdr hdr;
struct ib_device *ib_dev;
int srcu_key;
long err;
srcu_key = srcu_read_lock(&file->device->disassociate_srcu);
ib_dev = srcu_dereference(file->device->ib_dev,
&file->device->disassociate_srcu);
if (!ib_dev) {
err = -EIO;
goto out;
}
if (cmd == RDMA_VERBS_IOCTL) {
err = copy_from_user(&hdr, user_hdr, sizeof(hdr));
if (err || hdr.length > IB_UVERBS_MAX_CMD_SZ ||
hdr.length != sizeof(hdr) + hdr.num_attrs * sizeof(struct ib_uverbs_attr)) {
err = -EINVAL;
goto out;
}
if (hdr.reserved) {
err = -EOPNOTSUPP;
goto out;
}
err = ib_uverbs_cmd_verbs(ib_dev, file, &hdr,
(__user void *)arg + sizeof(hdr));
} else {
err = -ENOIOCTLCMD;
}
out:
srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
return err;
}

View File

@ -0,0 +1,665 @@
/*
* Copyright (c) 2017, Mellanox Technologies inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include <rdma/uverbs_ioctl.h>
#include <rdma/rdma_user_ioctl.h>
#include <linux/bitops.h>
#include "uverbs.h"
#define UVERBS_NUM_NS (UVERBS_ID_NS_MASK >> UVERBS_ID_NS_SHIFT)
#define GET_NS_ID(idx) (((idx) & UVERBS_ID_NS_MASK) >> UVERBS_ID_NS_SHIFT)
#define GET_ID(idx) ((idx) & ~UVERBS_ID_NS_MASK)
#define _for_each_element(elem, tmpi, tmpj, hashes, num_buckets_offset, \
buckets_offset) \
for (tmpj = 0, \
elem = (*(const void ***)((hashes)[tmpi] + \
(buckets_offset)))[0]; \
tmpj < *(size_t *)((hashes)[tmpi] + (num_buckets_offset)); \
tmpj++) \
if ((elem = ((*(const void ***)(hashes[tmpi] + \
(buckets_offset)))[tmpj])))
/*
* Iterate all elements of a few @hashes. The number of given hashes is
* indicated by @num_hashes. The offset of the number of buckets in the hash is
* represented by @num_buckets_offset, while the offset of the buckets array in
* the hash structure is represented by @buckets_offset. tmpi and tmpj are two
* short (or int) based indices that are given by the user. tmpi iterates over
* the different hashes. @elem points the current element in the hashes[tmpi]
* bucket we are looping on. To be honest, @hashes representation isn't exactly
* a hash, but more a collection of elements. These elements' ids are treated
* in a hash like manner, where the first upper bits are the bucket number.
* These elements are later mapped into a perfect-hash.
*/
#define for_each_element(elem, tmpi, tmpj, hashes, num_hashes, \
num_buckets_offset, buckets_offset) \
for (tmpi = 0; tmpi < (num_hashes); tmpi++) \
_for_each_element(elem, tmpi, tmpj, hashes, num_buckets_offset,\
buckets_offset)
#define get_elements_iterators_entry_above(iters, num_elements, elements, \
num_objects_fld, objects_fld, bucket,\
min_id) \
get_elements_above_id((const void **)iters, num_elements, \
(const void **)(elements), \
offsetof(typeof(**elements), \
num_objects_fld), \
offsetof(typeof(**elements), objects_fld),\
offsetof(typeof(***(*elements)->objects_fld), id),\
bucket, min_id)
#define get_objects_above_id(iters, num_trees, trees, bucket, min_id) \
get_elements_iterators_entry_above(iters, num_trees, trees, \
num_objects, objects, bucket, min_id)
#define get_methods_above_id(method_iters, num_iters, iters, bucket, min_id)\
get_elements_iterators_entry_above(method_iters, num_iters, iters, \
num_methods, methods, bucket, min_id)
#define get_attrs_above_id(attrs_iters, num_iters, iters, bucket, min_id)\
get_elements_iterators_entry_above(attrs_iters, num_iters, iters, \
num_attrs, attrs, bucket, min_id)
/*
* get_elements_above_id get a few hashes represented by @elements and
* @num_elements. The hashes fields are described by @num_offset, @data_offset
* and @id_offset in the same way as required by for_each_element. The function
* returns an array of @iters, represents an array of elements in the hashes
* buckets, which their ids are the smallest ids in all hashes but are all
* larger than the id given by min_id. Elements are only added to the iters
* array if their id belongs to the bucket @bucket. The number of elements in
* the returned array is returned by the function. @min_id is also updated to
* reflect the new min_id of all elements in iters.
*/
static size_t get_elements_above_id(const void **iters,
unsigned int num_elements,
const void **elements,
size_t num_offset,
size_t data_offset,
size_t id_offset,
u16 bucket,
short *min_id)
{
size_t num_iters = 0;
short min = SHRT_MAX;
const void *elem;
int i, j, last_stored = -1;
for_each_element(elem, i, j, elements, num_elements, num_offset,
data_offset) {
u16 id = *(u16 *)(elem + id_offset);
if (GET_NS_ID(id) != bucket)
continue;
if (GET_ID(id) < *min_id ||
(min != SHRT_MAX && GET_ID(id) > min))
continue;
/*
* We first iterate all hashes represented by @elements. When
* we do, we try to find an element @elem in the bucket @bucket
* which its id is min. Since we can't ensure the user sorted
* the elements in increasing order, we override this hash's
* minimal id element we found, if a new element with a smaller
* id was just found.
*/
iters[last_stored == i ? num_iters - 1 : num_iters++] = elem;
last_stored = i;
min = GET_ID(id);
}
/*
* We only insert to our iters array an element, if its id is smaller
* than all previous ids. Therefore, the final iters array is sorted so
* that smaller ids are in the end of the array.
* Therefore, we need to clean the beginning of the array to make sure
* all ids of final elements are equal to min.
*/
for (i = num_iters - 1; i >= 0 &&
GET_ID(*(u16 *)(iters[i] + id_offset)) == min; i--)
;
num_iters -= i + 1;
memmove(iters, iters + i + 1, sizeof(*iters) * num_iters);
*min_id = min;
return num_iters;
}
#define find_max_element_entry_id(num_elements, elements, num_objects_fld, \
objects_fld, bucket) \
find_max_element_id(num_elements, (const void **)(elements), \
offsetof(typeof(**elements), num_objects_fld), \
offsetof(typeof(**elements), objects_fld), \
offsetof(typeof(***(*elements)->objects_fld), id),\
bucket)
static short find_max_element_ns_id(unsigned int num_elements,
const void **elements,
size_t num_offset,
size_t data_offset,
size_t id_offset)
{
short max_ns = SHRT_MIN;
const void *elem;
int i, j;
for_each_element(elem, i, j, elements, num_elements, num_offset,
data_offset) {
u16 id = *(u16 *)(elem + id_offset);
if (GET_NS_ID(id) > max_ns)
max_ns = GET_NS_ID(id);
}
return max_ns;
}
static short find_max_element_id(unsigned int num_elements,
const void **elements,
size_t num_offset,
size_t data_offset,
size_t id_offset,
u16 bucket)
{
short max_id = SHRT_MIN;
const void *elem;
int i, j;
for_each_element(elem, i, j, elements, num_elements, num_offset,
data_offset) {
u16 id = *(u16 *)(elem + id_offset);
if (GET_NS_ID(id) == bucket &&
GET_ID(id) > max_id)
max_id = GET_ID(id);
}
return max_id;
}
#define find_max_element_entry_id(num_elements, elements, num_objects_fld, \
objects_fld, bucket) \
find_max_element_id(num_elements, (const void **)(elements), \
offsetof(typeof(**elements), num_objects_fld), \
offsetof(typeof(**elements), objects_fld), \
offsetof(typeof(***(*elements)->objects_fld), id),\
bucket)
#define find_max_element_ns_entry_id(num_elements, elements, \
num_objects_fld, objects_fld) \
find_max_element_ns_id(num_elements, (const void **)(elements), \
offsetof(typeof(**elements), num_objects_fld),\
offsetof(typeof(**elements), objects_fld), \
offsetof(typeof(***(*elements)->objects_fld), id))
/*
* find_max_xxxx_ns_id gets a few elements. Each element is described by an id
* which its upper bits represents a namespace. It finds the max namespace. This
* could be used in order to know how many buckets do we need to allocate. If no
* elements exist, SHRT_MIN is returned. Namespace represents here different
* buckets. The common example is "common bucket" and "driver bucket".
*
* find_max_xxxx_id gets a few elements and a bucket. Each element is described
* by an id which its upper bits represent a namespace. It returns the max id
* which is contained in the same namespace defined in @bucket. This could be
* used in order to know how many elements do we need to allocate in the bucket.
* If no elements exist, SHRT_MIN is returned.
*/
#define find_max_object_id(num_trees, trees, bucket) \
find_max_element_entry_id(num_trees, trees, num_objects,\
objects, bucket)
#define find_max_object_ns_id(num_trees, trees) \
find_max_element_ns_entry_id(num_trees, trees, \
num_objects, objects)
#define find_max_method_id(num_iters, iters, bucket) \
find_max_element_entry_id(num_iters, iters, num_methods,\
methods, bucket)
#define find_max_method_ns_id(num_iters, iters) \
find_max_element_ns_entry_id(num_iters, iters, \
num_methods, methods)
#define find_max_attr_id(num_iters, iters, bucket) \
find_max_element_entry_id(num_iters, iters, num_attrs, \
attrs, bucket)
#define find_max_attr_ns_id(num_iters, iters) \
find_max_element_ns_entry_id(num_iters, iters, \
num_attrs, attrs)
static void free_method(struct uverbs_method_spec *method)
{
unsigned int i;
if (!method)
return;
for (i = 0; i < method->num_buckets; i++)
kfree(method->attr_buckets[i]);
kfree(method);
}
#define IS_ATTR_OBJECT(attr) ((attr)->type == UVERBS_ATTR_TYPE_IDR || \
(attr)->type == UVERBS_ATTR_TYPE_FD)
/*
* This function gets array of size @num_method_defs which contains pointers to
* method definitions @method_defs. The function allocates an
* uverbs_method_spec structure and initializes its number of buckets and the
* elements in buckets to the correct attributes. While doing that, it
* validates that there aren't conflicts between attributes of different
* method_defs.
*/
static struct uverbs_method_spec *build_method_with_attrs(const struct uverbs_method_def **method_defs,
size_t num_method_defs)
{
int bucket_idx;
int max_attr_buckets = 0;
size_t num_attr_buckets = 0;
int res = 0;
struct uverbs_method_spec *method = NULL;
const struct uverbs_attr_def **attr_defs;
unsigned int num_of_singularities = 0;
max_attr_buckets = find_max_attr_ns_id(num_method_defs, method_defs);
if (max_attr_buckets >= 0)
num_attr_buckets = max_attr_buckets + 1;
method = kzalloc(sizeof(*method) +
num_attr_buckets * sizeof(*method->attr_buckets),
GFP_KERNEL);
if (!method)
return ERR_PTR(-ENOMEM);
method->num_buckets = num_attr_buckets;
attr_defs = kcalloc(num_method_defs, sizeof(*attr_defs), GFP_KERNEL);
if (!attr_defs) {
res = -ENOMEM;
goto free_method;
}
for (bucket_idx = 0; bucket_idx < method->num_buckets; bucket_idx++) {
short min_id = SHRT_MIN;
int attr_max_bucket = 0;
struct uverbs_attr_spec_hash *hash = NULL;
attr_max_bucket = find_max_attr_id(num_method_defs, method_defs,
bucket_idx);
if (attr_max_bucket < 0)
continue;
hash = kzalloc(sizeof(*hash) +
ALIGN(sizeof(*hash->attrs) * (attr_max_bucket + 1),
sizeof(long)) +
BITS_TO_LONGS(attr_max_bucket) * sizeof(long),
GFP_KERNEL);
if (!hash) {
res = -ENOMEM;
goto free;
}
hash->num_attrs = attr_max_bucket + 1;
method->num_child_attrs += hash->num_attrs;
hash->mandatory_attrs_bitmask = (void *)(hash + 1) +
ALIGN(sizeof(*hash->attrs) *
(attr_max_bucket + 1),
sizeof(long));
method->attr_buckets[bucket_idx] = hash;
do {
size_t num_attr_defs;
struct uverbs_attr_spec *attr;
bool attr_obj_with_special_access;
num_attr_defs =
get_attrs_above_id(attr_defs,
num_method_defs,
method_defs,
bucket_idx,
&min_id);
/* Last attr in bucket */
if (!num_attr_defs)
break;
if (num_attr_defs > 1) {
/*
* We don't allow two attribute definitions for
* the same attribute. This is usually a
* programmer error. If required, it's better to
* just add a new attribute to capture the new
* semantics.
*/
res = -EEXIST;
goto free;
}
attr = &hash->attrs[min_id];
memcpy(attr, &attr_defs[0]->attr, sizeof(*attr));
attr_obj_with_special_access = IS_ATTR_OBJECT(attr) &&
(attr->obj.access == UVERBS_ACCESS_NEW ||
attr->obj.access == UVERBS_ACCESS_DESTROY);
num_of_singularities += !!attr_obj_with_special_access;
if (WARN(num_of_singularities > 1,
"ib_uverbs: Method contains more than one object attr (%d) with new/destroy access\n",
min_id) ||
WARN(attr_obj_with_special_access &&
!(attr->flags & UVERBS_ATTR_SPEC_F_MANDATORY),
"ib_uverbs: Tried to merge attr (%d) but it's an object with new/destroy aceess but isn't mandatory\n",
min_id) ||
WARN(IS_ATTR_OBJECT(attr) &&
attr->flags & UVERBS_ATTR_SPEC_F_MIN_SZ,
"ib_uverbs: Tried to merge attr (%d) but it's an object with min_sz flag\n",
min_id)) {
res = -EINVAL;
goto free;
}
if (attr->flags & UVERBS_ATTR_SPEC_F_MANDATORY)
set_bit(min_id, hash->mandatory_attrs_bitmask);
min_id++;
} while (1);
}
kfree(attr_defs);
return method;
free:
kfree(attr_defs);
free_method:
free_method(method);
return ERR_PTR(res);
}
static void free_object(struct uverbs_object_spec *object)
{
unsigned int i, j;
if (!object)
return;
for (i = 0; i < object->num_buckets; i++) {
struct uverbs_method_spec_hash *method_buckets =
object->method_buckets[i];
if (!method_buckets)
continue;
for (j = 0; j < method_buckets->num_methods; j++)
free_method(method_buckets->methods[j]);
kfree(method_buckets);
}
kfree(object);
}
/*
* This function gets array of size @num_object_defs which contains pointers to
* object definitions @object_defs. The function allocated an
* uverbs_object_spec structure and initialize its number of buckets and the
* elements in buckets to the correct methods. While doing that, it
* sorts out the correct relationship between conflicts in the same method.
*/
static struct uverbs_object_spec *build_object_with_methods(const struct uverbs_object_def **object_defs,
size_t num_object_defs)
{
u16 bucket_idx;
int max_method_buckets = 0;
u16 num_method_buckets = 0;
int res = 0;
struct uverbs_object_spec *object = NULL;
const struct uverbs_method_def **method_defs;
max_method_buckets = find_max_method_ns_id(num_object_defs, object_defs);
if (max_method_buckets >= 0)
num_method_buckets = max_method_buckets + 1;
object = kzalloc(sizeof(*object) +
num_method_buckets *
sizeof(*object->method_buckets), GFP_KERNEL);
if (!object)
return ERR_PTR(-ENOMEM);
object->num_buckets = num_method_buckets;
method_defs = kcalloc(num_object_defs, sizeof(*method_defs), GFP_KERNEL);
if (!method_defs) {
res = -ENOMEM;
goto free_object;
}
for (bucket_idx = 0; bucket_idx < object->num_buckets; bucket_idx++) {
short min_id = SHRT_MIN;
int methods_max_bucket = 0;
struct uverbs_method_spec_hash *hash = NULL;
methods_max_bucket = find_max_method_id(num_object_defs, object_defs,
bucket_idx);
if (methods_max_bucket < 0)
continue;
hash = kzalloc(sizeof(*hash) +
sizeof(*hash->methods) * (methods_max_bucket + 1),
GFP_KERNEL);
if (!hash) {
res = -ENOMEM;
goto free;
}
hash->num_methods = methods_max_bucket + 1;
object->method_buckets[bucket_idx] = hash;
do {
size_t num_method_defs;
struct uverbs_method_spec *method;
int i;
num_method_defs =
get_methods_above_id(method_defs,
num_object_defs,
object_defs,
bucket_idx,
&min_id);
/* Last method in bucket */
if (!num_method_defs)
break;
method = build_method_with_attrs(method_defs,
num_method_defs);
if (IS_ERR(method)) {
res = PTR_ERR(method);
goto free;
}
/*
* The last tree which is given as an argument to the
* merge overrides previous method handler.
* Therefore, we iterate backwards and search for the
* first handler which != NULL. This also defines the
* set of flags used for this handler.
*/
for (i = num_object_defs - 1;
i >= 0 && !method_defs[i]->handler; i--)
;
hash->methods[min_id++] = method;
/* NULL handler isn't allowed */
if (WARN(i < 0,
"ib_uverbs: tried to merge function id %d, but all handlers are NULL\n",
min_id)) {
res = -EINVAL;
goto free;
}
method->handler = method_defs[i]->handler;
method->flags = method_defs[i]->flags;
} while (1);
}
kfree(method_defs);
return object;
free:
kfree(method_defs);
free_object:
free_object(object);
return ERR_PTR(res);
}
void uverbs_free_spec_tree(struct uverbs_root_spec *root)
{
unsigned int i, j;
if (!root)
return;
for (i = 0; i < root->num_buckets; i++) {
struct uverbs_object_spec_hash *object_hash =
root->object_buckets[i];
if (!object_hash)
continue;
for (j = 0; j < object_hash->num_objects; j++)
free_object(object_hash->objects[j]);
kfree(object_hash);
}
kfree(root);
}
EXPORT_SYMBOL(uverbs_free_spec_tree);
struct uverbs_root_spec *uverbs_alloc_spec_tree(unsigned int num_trees,
const struct uverbs_object_tree_def **trees)
{
u16 bucket_idx;
short max_object_buckets = 0;
size_t num_objects_buckets = 0;
struct uverbs_root_spec *root_spec = NULL;
const struct uverbs_object_def **object_defs;
int i;
int res = 0;
max_object_buckets = find_max_object_ns_id(num_trees, trees);
/*
* Devices which don't want to support ib_uverbs, should just allocate
* an empty parsing tree. Every user-space command won't hit any valid
* entry in the parsing tree and thus will fail.
*/
if (max_object_buckets >= 0)
num_objects_buckets = max_object_buckets + 1;
root_spec = kzalloc(sizeof(*root_spec) +
num_objects_buckets * sizeof(*root_spec->object_buckets),
GFP_KERNEL);
if (!root_spec)
return ERR_PTR(-ENOMEM);
root_spec->num_buckets = num_objects_buckets;
object_defs = kcalloc(num_trees, sizeof(*object_defs),
GFP_KERNEL);
if (!object_defs) {
res = -ENOMEM;
goto free_root;
}
for (bucket_idx = 0; bucket_idx < root_spec->num_buckets; bucket_idx++) {
short min_id = SHRT_MIN;
short objects_max_bucket;
struct uverbs_object_spec_hash *hash = NULL;
objects_max_bucket = find_max_object_id(num_trees, trees,
bucket_idx);
if (objects_max_bucket < 0)
continue;
hash = kzalloc(sizeof(*hash) +
sizeof(*hash->objects) * (objects_max_bucket + 1),
GFP_KERNEL);
if (!hash) {
res = -ENOMEM;
goto free;
}
hash->num_objects = objects_max_bucket + 1;
root_spec->object_buckets[bucket_idx] = hash;
do {
size_t num_object_defs;
struct uverbs_object_spec *object;
num_object_defs = get_objects_above_id(object_defs,
num_trees,
trees,
bucket_idx,
&min_id);
/* Last object in bucket */
if (!num_object_defs)
break;
object = build_object_with_methods(object_defs,
num_object_defs);
if (IS_ERR(object)) {
res = PTR_ERR(object);
goto free;
}
/*
* The last tree which is given as an argument to the
* merge overrides previous object's type_attrs.
* Therefore, we iterate backwards and search for the
* first type_attrs which != NULL.
*/
for (i = num_object_defs - 1;
i >= 0 && !object_defs[i]->type_attrs; i--)
;
/*
* NULL is a valid type_attrs. It means an object we
* can't instantiate (like DEVICE).
*/
object->type_attrs = i < 0 ? NULL :
object_defs[i]->type_attrs;
hash->objects[min_id++] = object;
} while (1);
}
kfree(object_defs);
return root_spec;
free:
kfree(object_defs);
free_root:
uverbs_free_spec_tree(root_spec);
return ERR_PTR(res);
}
EXPORT_SYMBOL(uverbs_alloc_spec_tree);

View File

@ -49,6 +49,7 @@
#include <linux/uaccess.h>
#include <rdma/ib.h>
#include <rdma/uverbs_std_types.h>
#include "uverbs.h"
#include "core_priv.h"
@ -595,7 +596,6 @@ struct file *ib_uverbs_alloc_async_event_file(struct ib_uverbs_file *uverbs_file
{
struct ib_uverbs_async_event_file *ev_file;
struct file *filp;
int ret;
ev_file = kzalloc(sizeof(*ev_file), GFP_KERNEL);
if (!ev_file)
@ -621,21 +621,11 @@ struct file *ib_uverbs_alloc_async_event_file(struct ib_uverbs_file *uverbs_file
INIT_IB_EVENT_HANDLER(&uverbs_file->event_handler,
ib_dev,
ib_uverbs_event_handler);
ret = ib_register_event_handler(&uverbs_file->event_handler);
if (ret)
goto err_put_file;
ib_register_event_handler(&uverbs_file->event_handler);
/* At that point async file stuff was fully set */
return filp;
err_put_file:
fput(filp);
kref_put(&uverbs_file->async_file->ref,
ib_uverbs_release_async_event_file);
uverbs_file->async_file = NULL;
return ERR_PTR(ret);
err_put_refs:
kref_put(&ev_file->uverbs_file->ref, ib_uverbs_release_file);
kref_put(&ev_file->ref, ib_uverbs_release_async_event_file);
@ -949,6 +939,9 @@ static const struct file_operations uverbs_fops = {
.open = ib_uverbs_open,
.release = ib_uverbs_close,
.llseek = no_llseek,
#if IS_ENABLED(CONFIG_INFINIBAND_EXP_USER_ACCESS)
.unlocked_ioctl = ib_uverbs_ioctl,
#endif
};
static const struct file_operations uverbs_mmap_fops = {
@ -958,6 +951,9 @@ static const struct file_operations uverbs_mmap_fops = {
.open = ib_uverbs_open,
.release = ib_uverbs_close,
.llseek = no_llseek,
#if IS_ENABLED(CONFIG_INFINIBAND_EXP_USER_ACCESS)
.unlocked_ioctl = ib_uverbs_ioctl,
#endif
};
static struct ib_client uverbs_client = {
@ -1108,6 +1104,18 @@ static void ib_uverbs_add_one(struct ib_device *device)
if (device_create_file(uverbs_dev->dev, &dev_attr_abi_version))
goto err_class;
if (!device->specs_root) {
const struct uverbs_object_tree_def *default_root[] = {
uverbs_default_get_objects()};
uverbs_dev->specs_root = uverbs_alloc_spec_tree(1,
default_root);
if (IS_ERR(uverbs_dev->specs_root))
goto err_class;
device->specs_root = uverbs_dev->specs_root;
}
ib_set_client_data(device, &uverbs_client, uverbs_dev);
return;
@ -1239,6 +1247,11 @@ static void ib_uverbs_remove_one(struct ib_device *device, void *client_data)
ib_uverbs_comp_dev(uverbs_dev);
if (wait_clients)
wait_for_completion(&uverbs_dev->comp);
if (uverbs_dev->specs_root) {
uverbs_free_spec_tree(uverbs_dev->specs_root);
device->specs_root = NULL;
}
kobject_put(&uverbs_dev->kobj);
}

View File

@ -33,10 +33,47 @@
#include <linux/export.h>
#include <rdma/ib_marshall.h>
void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
struct rdma_ah_attr *src)
#define OPA_DEFAULT_GID_PREFIX cpu_to_be64(0xfe80000000000000ULL)
static int rdma_ah_conv_opa_to_ib(struct ib_device *dev,
struct rdma_ah_attr *ib,
struct rdma_ah_attr *opa)
{
struct ib_port_attr port_attr;
int ret = 0;
/* Do structure copy and the over-write fields */
*ib = *opa;
ib->type = RDMA_AH_ATTR_TYPE_IB;
rdma_ah_set_grh(ib, NULL, 0, 0, 1, 0);
if (ib_query_port(dev, opa->port_num, &port_attr)) {
/* Set to default subnet to indicate error */
rdma_ah_set_subnet_prefix(ib, OPA_DEFAULT_GID_PREFIX);
ret = -EINVAL;
} else {
rdma_ah_set_subnet_prefix(ib,
cpu_to_be64(port_attr.subnet_prefix));
}
rdma_ah_set_interface_id(ib, OPA_MAKE_ID(rdma_ah_get_dlid(opa)));
return ret;
}
void ib_copy_ah_attr_to_user(struct ib_device *device,
struct ib_uverbs_ah_attr *dst,
struct rdma_ah_attr *ah_attr)
{
struct rdma_ah_attr *src = ah_attr;
struct rdma_ah_attr conv_ah;
memset(&dst->grh.reserved, 0, sizeof(dst->grh.reserved));
if ((ah_attr->type == RDMA_AH_ATTR_TYPE_OPA) &&
(rdma_ah_get_dlid(ah_attr) >=
be16_to_cpu(IB_MULTICAST_LID_BASE)) &&
(!rdma_ah_conv_opa_to_ib(device, &conv_ah, ah_attr)))
src = &conv_ah;
dst->dlid = rdma_ah_get_dlid(src);
dst->sl = rdma_ah_get_sl(src);
dst->src_path_bits = rdma_ah_get_path_bits(src);
@ -57,7 +94,8 @@ void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
}
EXPORT_SYMBOL(ib_copy_ah_attr_to_user);
void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
void ib_copy_qp_attr_to_user(struct ib_device *device,
struct ib_uverbs_qp_attr *dst,
struct ib_qp_attr *src)
{
dst->qp_state = src->qp_state;
@ -76,8 +114,8 @@ void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
dst->max_recv_sge = src->cap.max_recv_sge;
dst->max_inline_data = src->cap.max_inline_data;
ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr);
ib_copy_ah_attr_to_user(&dst->alt_ah_attr, &src->alt_ah_attr);
ib_copy_ah_attr_to_user(device, &dst->ah_attr, &src->ah_attr);
ib_copy_ah_attr_to_user(device, &dst->alt_ah_attr, &src->alt_ah_attr);
dst->pkey_index = src->pkey_index;
dst->alt_pkey_index = src->alt_pkey_index;

View File

@ -209,67 +209,244 @@ static int uverbs_hot_unplug_completion_event_file(struct ib_uobject_file *uobj_
return 0;
};
const struct uverbs_obj_fd_type uverbs_type_attrs_comp_channel = {
.type = UVERBS_TYPE_ALLOC_FD(sizeof(struct ib_uverbs_completion_event_file), 0),
.context_closed = uverbs_hot_unplug_completion_event_file,
.fops = &uverbs_event_fops,
.name = "[infinibandevent]",
.flags = O_RDONLY,
/*
* This spec is used in order to pass information to the hardware driver in a
* legacy way. Every verb that could get driver specific data should get this
* spec.
*/
static const struct uverbs_attr_def uverbs_uhw_compat_in =
UVERBS_ATTR_PTR_IN_SZ(UVERBS_UHW_IN, 0, UA_FLAGS(UVERBS_ATTR_SPEC_F_MIN_SZ));
static const struct uverbs_attr_def uverbs_uhw_compat_out =
UVERBS_ATTR_PTR_OUT_SZ(UVERBS_UHW_OUT, 0, UA_FLAGS(UVERBS_ATTR_SPEC_F_MIN_SZ));
static void create_udata(struct uverbs_attr_bundle *ctx,
struct ib_udata *udata)
{
/*
* This is for ease of conversion. The purpose is to convert all drivers
* to use uverbs_attr_bundle instead of ib_udata.
* Assume attr == 0 is input and attr == 1 is output.
*/
void __user *inbuf;
size_t inbuf_len = 0;
void __user *outbuf;
size_t outbuf_len = 0;
const struct uverbs_attr *uhw_in =
uverbs_attr_get(ctx, UVERBS_UHW_IN);
const struct uverbs_attr *uhw_out =
uverbs_attr_get(ctx, UVERBS_UHW_OUT);
if (!IS_ERR(uhw_in)) {
inbuf = uhw_in->ptr_attr.ptr;
inbuf_len = uhw_in->ptr_attr.len;
}
if (!IS_ERR(uhw_out)) {
outbuf = uhw_out->ptr_attr.ptr;
outbuf_len = uhw_out->ptr_attr.len;
}
INIT_UDATA_BUF_OR_NULL(udata, inbuf, outbuf, inbuf_len, outbuf_len);
}
static int uverbs_create_cq_handler(struct ib_device *ib_dev,
struct ib_uverbs_file *file,
struct uverbs_attr_bundle *attrs)
{
struct ib_ucontext *ucontext = file->ucontext;
struct ib_ucq_object *obj;
struct ib_udata uhw;
int ret;
u64 user_handle;
struct ib_cq_init_attr attr = {};
struct ib_cq *cq;
struct ib_uverbs_completion_event_file *ev_file = NULL;
const struct uverbs_attr *ev_file_attr;
struct ib_uobject *ev_file_uobj;
if (!(ib_dev->uverbs_cmd_mask & 1ULL << IB_USER_VERBS_CMD_CREATE_CQ))
return -EOPNOTSUPP;
ret = uverbs_copy_from(&attr.comp_vector, attrs, CREATE_CQ_COMP_VECTOR);
if (!ret)
ret = uverbs_copy_from(&attr.cqe, attrs, CREATE_CQ_CQE);
if (!ret)
ret = uverbs_copy_from(&user_handle, attrs, CREATE_CQ_USER_HANDLE);
if (ret)
return ret;
/* Optional param, if it doesn't exist, we get -ENOENT and skip it */
if (uverbs_copy_from(&attr.flags, attrs, CREATE_CQ_FLAGS) == -EFAULT)
return -EFAULT;
ev_file_attr = uverbs_attr_get(attrs, CREATE_CQ_COMP_CHANNEL);
if (!IS_ERR(ev_file_attr)) {
ev_file_uobj = ev_file_attr->obj_attr.uobject;
ev_file = container_of(ev_file_uobj,
struct ib_uverbs_completion_event_file,
uobj_file.uobj);
uverbs_uobject_get(ev_file_uobj);
}
if (attr.comp_vector >= ucontext->ufile->device->num_comp_vectors) {
ret = -EINVAL;
goto err_event_file;
}
obj = container_of(uverbs_attr_get(attrs, CREATE_CQ_HANDLE)->obj_attr.uobject,
typeof(*obj), uobject);
obj->uverbs_file = ucontext->ufile;
obj->comp_events_reported = 0;
obj->async_events_reported = 0;
INIT_LIST_HEAD(&obj->comp_list);
INIT_LIST_HEAD(&obj->async_list);
/* Temporary, only until drivers get the new uverbs_attr_bundle */
create_udata(attrs, &uhw);
cq = ib_dev->create_cq(ib_dev, &attr, ucontext, &uhw);
if (IS_ERR(cq)) {
ret = PTR_ERR(cq);
goto err_event_file;
}
cq->device = ib_dev;
cq->uobject = &obj->uobject;
cq->comp_handler = ib_uverbs_comp_handler;
cq->event_handler = ib_uverbs_cq_event_handler;
cq->cq_context = &ev_file->ev_queue;
obj->uobject.object = cq;
obj->uobject.user_handle = user_handle;
atomic_set(&cq->usecnt, 0);
ret = uverbs_copy_to(attrs, CREATE_CQ_RESP_CQE, &cq->cqe);
if (ret)
goto err_cq;
return 0;
err_cq:
ib_destroy_cq(cq);
err_event_file:
if (ev_file)
uverbs_uobject_put(ev_file_uobj);
return ret;
};
const struct uverbs_obj_idr_type uverbs_type_attrs_cq = {
.type = UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_ucq_object), 0),
.destroy_object = uverbs_free_cq,
};
static DECLARE_UVERBS_METHOD(
uverbs_method_cq_create, UVERBS_CQ_CREATE, uverbs_create_cq_handler,
&UVERBS_ATTR_IDR(CREATE_CQ_HANDLE, UVERBS_OBJECT_CQ, UVERBS_ACCESS_NEW,
UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
&UVERBS_ATTR_PTR_IN(CREATE_CQ_CQE, u32,
UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
&UVERBS_ATTR_PTR_IN(CREATE_CQ_USER_HANDLE, u64,
UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
&UVERBS_ATTR_FD(CREATE_CQ_COMP_CHANNEL, UVERBS_OBJECT_COMP_CHANNEL,
UVERBS_ACCESS_READ),
&UVERBS_ATTR_PTR_IN(CREATE_CQ_COMP_VECTOR, u32,
UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
&UVERBS_ATTR_PTR_IN(CREATE_CQ_FLAGS, u32),
&UVERBS_ATTR_PTR_OUT(CREATE_CQ_RESP_CQE, u32,
UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
&uverbs_uhw_compat_in, &uverbs_uhw_compat_out);
const struct uverbs_obj_idr_type uverbs_type_attrs_qp = {
.type = UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uqp_object), 0),
.destroy_object = uverbs_free_qp,
};
static int uverbs_destroy_cq_handler(struct ib_device *ib_dev,
struct ib_uverbs_file *file,
struct uverbs_attr_bundle *attrs)
{
struct ib_uverbs_destroy_cq_resp resp;
struct ib_uobject *uobj =
uverbs_attr_get(attrs, DESTROY_CQ_HANDLE)->obj_attr.uobject;
struct ib_ucq_object *obj = container_of(uobj, struct ib_ucq_object,
uobject);
int ret;
const struct uverbs_obj_idr_type uverbs_type_attrs_mw = {
.type = UVERBS_TYPE_ALLOC_IDR(0),
.destroy_object = uverbs_free_mw,
};
if (!(ib_dev->uverbs_cmd_mask & 1ULL << IB_USER_VERBS_CMD_DESTROY_CQ))
return -EOPNOTSUPP;
const struct uverbs_obj_idr_type uverbs_type_attrs_mr = {
/* 1 is used in order to free the MR after all the MWs */
.type = UVERBS_TYPE_ALLOC_IDR(1),
.destroy_object = uverbs_free_mr,
};
ret = rdma_explicit_destroy(uobj);
if (ret)
return ret;
const struct uverbs_obj_idr_type uverbs_type_attrs_srq = {
.type = UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_usrq_object), 0),
.destroy_object = uverbs_free_srq,
};
resp.comp_events_reported = obj->comp_events_reported;
resp.async_events_reported = obj->async_events_reported;
const struct uverbs_obj_idr_type uverbs_type_attrs_ah = {
.type = UVERBS_TYPE_ALLOC_IDR(0),
.destroy_object = uverbs_free_ah,
};
return uverbs_copy_to(attrs, DESTROY_CQ_RESP, &resp);
}
const struct uverbs_obj_idr_type uverbs_type_attrs_flow = {
.type = UVERBS_TYPE_ALLOC_IDR(0),
.destroy_object = uverbs_free_flow,
};
static DECLARE_UVERBS_METHOD(
uverbs_method_cq_destroy, UVERBS_CQ_DESTROY, uverbs_destroy_cq_handler,
&UVERBS_ATTR_IDR(DESTROY_CQ_HANDLE, UVERBS_OBJECT_CQ,
UVERBS_ACCESS_DESTROY,
UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
&UVERBS_ATTR_PTR_OUT(DESTROY_CQ_RESP, struct ib_uverbs_destroy_cq_resp,
UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)));
const struct uverbs_obj_idr_type uverbs_type_attrs_wq = {
.type = UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uwq_object), 0),
.destroy_object = uverbs_free_wq,
};
DECLARE_UVERBS_OBJECT(uverbs_object_comp_channel,
UVERBS_OBJECT_COMP_CHANNEL,
&UVERBS_TYPE_ALLOC_FD(0,
sizeof(struct ib_uverbs_completion_event_file),
uverbs_hot_unplug_completion_event_file,
&uverbs_event_fops,
"[infinibandevent]", O_RDONLY));
const struct uverbs_obj_idr_type uverbs_type_attrs_rwq_ind_table = {
.type = UVERBS_TYPE_ALLOC_IDR(0),
.destroy_object = uverbs_free_rwq_ind_tbl,
};
DECLARE_UVERBS_OBJECT(uverbs_object_cq, UVERBS_OBJECT_CQ,
&UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_ucq_object), 0,
uverbs_free_cq),
&uverbs_method_cq_create,
&uverbs_method_cq_destroy);
const struct uverbs_obj_idr_type uverbs_type_attrs_xrcd = {
.type = UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uxrcd_object), 0),
.destroy_object = uverbs_free_xrcd,
};
DECLARE_UVERBS_OBJECT(uverbs_object_qp, UVERBS_OBJECT_QP,
&UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uqp_object), 0,
uverbs_free_qp));
const struct uverbs_obj_idr_type uverbs_type_attrs_pd = {
/* 2 is used in order to free the PD after MRs */
.type = UVERBS_TYPE_ALLOC_IDR(2),
.destroy_object = uverbs_free_pd,
};
DECLARE_UVERBS_OBJECT(uverbs_object_mw, UVERBS_OBJECT_MW,
&UVERBS_TYPE_ALLOC_IDR(0, uverbs_free_mw));
DECLARE_UVERBS_OBJECT(uverbs_object_mr, UVERBS_OBJECT_MR,
/* 1 is used in order to free the MR after all the MWs */
&UVERBS_TYPE_ALLOC_IDR(1, uverbs_free_mr));
DECLARE_UVERBS_OBJECT(uverbs_object_srq, UVERBS_OBJECT_SRQ,
&UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_usrq_object), 0,
uverbs_free_srq));
DECLARE_UVERBS_OBJECT(uverbs_object_ah, UVERBS_OBJECT_AH,
&UVERBS_TYPE_ALLOC_IDR(0, uverbs_free_ah));
DECLARE_UVERBS_OBJECT(uverbs_object_flow, UVERBS_OBJECT_FLOW,
&UVERBS_TYPE_ALLOC_IDR(0, uverbs_free_flow));
DECLARE_UVERBS_OBJECT(uverbs_object_wq, UVERBS_OBJECT_WQ,
&UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uwq_object), 0,
uverbs_free_wq));
DECLARE_UVERBS_OBJECT(uverbs_object_rwq_ind_table,
UVERBS_OBJECT_RWQ_IND_TBL,
&UVERBS_TYPE_ALLOC_IDR(0, uverbs_free_rwq_ind_tbl));
DECLARE_UVERBS_OBJECT(uverbs_object_xrcd, UVERBS_OBJECT_XRCD,
&UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uxrcd_object), 0,
uverbs_free_xrcd));
DECLARE_UVERBS_OBJECT(uverbs_object_pd, UVERBS_OBJECT_PD,
/* 2 is used in order to free the PD after MRs */
&UVERBS_TYPE_ALLOC_IDR(2, uverbs_free_pd));
DECLARE_UVERBS_OBJECT(uverbs_object_device, UVERBS_OBJECT_DEVICE, NULL);
DECLARE_UVERBS_OBJECT_TREE(uverbs_default_objects,
&uverbs_object_device,
&uverbs_object_pd,
&uverbs_object_mr,
&uverbs_object_comp_channel,
&uverbs_object_cq,
&uverbs_object_qp,
&uverbs_object_ah,
&uverbs_object_mw,
&uverbs_object_srq,
&uverbs_object_flow,
&uverbs_object_wq,
&uverbs_object_rwq_ind_table,
&uverbs_object_xrcd);

View File

@ -180,39 +180,29 @@ EXPORT_SYMBOL(ib_rate_to_mbps);
__attribute_const__ enum rdma_transport_type
rdma_node_get_transport(enum rdma_node_type node_type)
{
switch (node_type) {
case RDMA_NODE_IB_CA:
case RDMA_NODE_IB_SWITCH:
case RDMA_NODE_IB_ROUTER:
return RDMA_TRANSPORT_IB;
case RDMA_NODE_RNIC:
return RDMA_TRANSPORT_IWARP;
case RDMA_NODE_USNIC:
if (node_type == RDMA_NODE_USNIC)
return RDMA_TRANSPORT_USNIC;
case RDMA_NODE_USNIC_UDP:
if (node_type == RDMA_NODE_USNIC_UDP)
return RDMA_TRANSPORT_USNIC_UDP;
default:
BUG();
return 0;
}
if (node_type == RDMA_NODE_RNIC)
return RDMA_TRANSPORT_IWARP;
return RDMA_TRANSPORT_IB;
}
EXPORT_SYMBOL(rdma_node_get_transport);
enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_num)
{
enum rdma_transport_type lt;
if (device->get_link_layer)
return device->get_link_layer(device, port_num);
switch (rdma_node_get_transport(device->node_type)) {
case RDMA_TRANSPORT_IB:
lt = rdma_node_get_transport(device->node_type);
if (lt == RDMA_TRANSPORT_IB)
return IB_LINK_LAYER_INFINIBAND;
case RDMA_TRANSPORT_IWARP:
case RDMA_TRANSPORT_USNIC:
case RDMA_TRANSPORT_USNIC_UDP:
return IB_LINK_LAYER_ETHERNET;
default:
return IB_LINK_LAYER_UNSPECIFIED;
}
return IB_LINK_LAYER_ETHERNET;
}
EXPORT_SYMBOL(rdma_port_get_link_layer);
@ -478,6 +468,8 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
union ib_gid dgid;
union ib_gid sgid;
might_sleep();
memset(ah_attr, 0, sizeof *ah_attr);
ah_attr->type = rdma_ah_find_type(device, port_num);
if (rdma_cap_eth_ah(device, port_num)) {
@ -632,11 +624,13 @@ struct ib_srq *ib_create_srq(struct ib_pd *pd,
srq->event_handler = srq_init_attr->event_handler;
srq->srq_context = srq_init_attr->srq_context;
srq->srq_type = srq_init_attr->srq_type;
if (ib_srq_has_cq(srq->srq_type)) {
srq->ext.cq = srq_init_attr->ext.cq;
atomic_inc(&srq->ext.cq->usecnt);
}
if (srq->srq_type == IB_SRQT_XRC) {
srq->ext.xrc.xrcd = srq_init_attr->ext.xrc.xrcd;
srq->ext.xrc.cq = srq_init_attr->ext.xrc.cq;
atomic_inc(&srq->ext.xrc.xrcd->usecnt);
atomic_inc(&srq->ext.xrc.cq->usecnt);
}
atomic_inc(&pd->usecnt);
atomic_set(&srq->usecnt, 0);
@ -677,18 +671,18 @@ int ib_destroy_srq(struct ib_srq *srq)
pd = srq->pd;
srq_type = srq->srq_type;
if (srq_type == IB_SRQT_XRC) {
if (ib_srq_has_cq(srq_type))
cq = srq->ext.cq;
if (srq_type == IB_SRQT_XRC)
xrcd = srq->ext.xrc.xrcd;
cq = srq->ext.xrc.cq;
}
ret = srq->device->destroy_srq(srq);
if (!ret) {
atomic_dec(&pd->usecnt);
if (srq_type == IB_SRQT_XRC) {
if (srq_type == IB_SRQT_XRC)
atomic_dec(&xrcd->usecnt);
if (ib_srq_has_cq(srq_type))
atomic_dec(&cq->usecnt);
}
}
return ret;
@ -1244,6 +1238,18 @@ int ib_resolve_eth_dmac(struct ib_device *device,
if (rdma_link_local_addr((struct in6_addr *)grh->dgid.raw)) {
rdma_get_ll_mac((struct in6_addr *)grh->dgid.raw,
ah_attr->roce.dmac);
return 0;
}
if (rdma_is_multicast_addr((struct in6_addr *)ah_attr->grh.dgid.raw)) {
if (ipv6_addr_v4mapped((struct in6_addr *)ah_attr->grh.dgid.raw)) {
__be32 addr = 0;
memcpy(&addr, ah_attr->grh.dgid.raw + 12, 4);
ip_eth_mc_map(addr, (char *)ah_attr->roce.dmac);
} else {
ipv6_eth_mc_map((struct in6_addr *)ah_attr->grh.dgid.raw,
(char *)ah_attr->roce.dmac);
}
} else {
union ib_gid sgid;
struct ib_gid_attr sgid_attr;
@ -1306,6 +1312,61 @@ int ib_modify_qp_with_udata(struct ib_qp *qp, struct ib_qp_attr *attr,
}
EXPORT_SYMBOL(ib_modify_qp_with_udata);
int ib_get_eth_speed(struct ib_device *dev, u8 port_num, u8 *speed, u8 *width)
{
int rc;
u32 netdev_speed;
struct net_device *netdev;
struct ethtool_link_ksettings lksettings;
if (rdma_port_get_link_layer(dev, port_num) != IB_LINK_LAYER_ETHERNET)
return -EINVAL;
if (!dev->get_netdev)
return -EOPNOTSUPP;
netdev = dev->get_netdev(dev, port_num);
if (!netdev)
return -ENODEV;
rtnl_lock();
rc = __ethtool_get_link_ksettings(netdev, &lksettings);
rtnl_unlock();
dev_put(netdev);
if (!rc) {
netdev_speed = lksettings.base.speed;
} else {
netdev_speed = SPEED_1000;
pr_warn("%s speed is unknown, defaulting to %d\n", netdev->name,
netdev_speed);
}
if (netdev_speed <= SPEED_1000) {
*width = IB_WIDTH_1X;
*speed = IB_SPEED_SDR;
} else if (netdev_speed <= SPEED_10000) {
*width = IB_WIDTH_1X;
*speed = IB_SPEED_FDR10;
} else if (netdev_speed <= SPEED_20000) {
*width = IB_WIDTH_4X;
*speed = IB_SPEED_DDR;
} else if (netdev_speed <= SPEED_25000) {
*width = IB_WIDTH_1X;
*speed = IB_SPEED_EDR;
} else if (netdev_speed <= SPEED_40000) {
*width = IB_WIDTH_4X;
*speed = IB_SPEED_FDR10;
} else {
*width = IB_WIDTH_4X;
*speed = IB_SPEED_EDR;
}
return 0;
}
EXPORT_SYMBOL(ib_get_eth_speed);
int ib_modify_qp(struct ib_qp *qp,
struct ib_qp_attr *qp_attr,
int qp_attr_mask)
@ -1573,15 +1634,53 @@ EXPORT_SYMBOL(ib_dealloc_fmr);
/* Multicast groups */
static bool is_valid_mcast_lid(struct ib_qp *qp, u16 lid)
{
struct ib_qp_init_attr init_attr = {};
struct ib_qp_attr attr = {};
int num_eth_ports = 0;
int port;
/* If QP state >= init, it is assigned to a port and we can check this
* port only.
*/
if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
if (attr.qp_state >= IB_QPS_INIT) {
if (qp->device->get_link_layer(qp->device, attr.port_num) !=
IB_LINK_LAYER_INFINIBAND)
return true;
goto lid_check;
}
}
/* Can't get a quick answer, iterate over all ports */
for (port = 0; port < qp->device->phys_port_cnt; port++)
if (qp->device->get_link_layer(qp->device, port) !=
IB_LINK_LAYER_INFINIBAND)
num_eth_ports++;
/* If we have at lease one Ethernet port, RoCE annex declares that
* multicast LID should be ignored. We can't tell at this step if the
* QP belongs to an IB or Ethernet port.
*/
if (num_eth_ports)
return true;
/* If all the ports are IB, we can check according to IB spec. */
lid_check:
return !(lid < be16_to_cpu(IB_MULTICAST_LID_BASE) ||
lid == be16_to_cpu(IB_LID_PERMISSIVE));
}
int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
{
int ret;
if (!qp->device->attach_mcast)
return -ENOSYS;
if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD ||
lid < be16_to_cpu(IB_MULTICAST_LID_BASE) ||
lid == be16_to_cpu(IB_LID_PERMISSIVE))
if (!rdma_is_multicast_addr((struct in6_addr *)gid->raw) ||
qp->qp_type != IB_QPT_UD || !is_valid_mcast_lid(qp, lid))
return -EINVAL;
ret = qp->device->attach_mcast(qp, gid, lid);
@ -1597,9 +1696,9 @@ int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
if (!qp->device->detach_mcast)
return -ENOSYS;
if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD ||
lid < be16_to_cpu(IB_MULTICAST_LID_BASE) ||
lid == be16_to_cpu(IB_LID_PERMISSIVE))
if (!rdma_is_multicast_addr((struct in6_addr *)gid->raw) ||
qp->qp_type != IB_QPT_UD || !is_valid_mcast_lid(qp, lid))
return -EINVAL;
ret = qp->device->detach_mcast(qp, gid, lid);

View File

@ -3,4 +3,4 @@ ccflags-y := -Idrivers/net/ethernet/broadcom/bnxt
obj-$(CONFIG_INFINIBAND_BNXT_RE) += bnxt_re.o
bnxt_re-y := main.o ib_verbs.o \
qplib_res.o qplib_rcfw.o \
qplib_sp.o qplib_fp.o
qplib_sp.o qplib_fp.o hw_counters.o

View File

@ -85,7 +85,7 @@ struct bnxt_re_sqp_entries {
};
#define BNXT_RE_MIN_MSIX 2
#define BNXT_RE_MAX_MSIX 16
#define BNXT_RE_MAX_MSIX 9
#define BNXT_RE_AEQ_IDX 0
#define BNXT_RE_NQ_IDX 1
@ -116,7 +116,7 @@ struct bnxt_re_dev {
struct bnxt_qplib_rcfw rcfw;
/* NQ */
struct bnxt_qplib_nq nq;
struct bnxt_qplib_nq nq[BNXT_RE_MAX_MSIX];
/* Device Resources */
struct bnxt_qplib_dev_attr dev_attr;
@ -140,6 +140,7 @@ struct bnxt_re_dev {
struct bnxt_re_qp *qp1_sqp;
struct bnxt_re_ah *sqp_ah;
struct bnxt_re_sqp_entries sqp_tbl[1024];
atomic_t nq_alloc_cnt;
};
#define to_bnxt_re_dev(ptr, member) \

View File

@ -0,0 +1,114 @@
/*
* Broadcom NetXtreme-E RoCE driver.
*
* Copyright (c) 2016 - 2017, Broadcom. All rights reserved. The term
* Broadcom refers to Broadcom Limited and/or its subsidiaries.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* BSD license below:
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
* WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
* IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* Description: Statistics
*
*/
#include <linux/interrupt.h>
#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/pci.h>
#include <linux/prefetch.h>
#include <linux/delay.h>
#include <rdma/ib_addr.h>
#include "bnxt_ulp.h"
#include "roce_hsi.h"
#include "qplib_res.h"
#include "qplib_sp.h"
#include "qplib_fp.h"
#include "qplib_rcfw.h"
#include "bnxt_re.h"
#include "hw_counters.h"
static const char * const bnxt_re_stat_name[] = {
[BNXT_RE_ACTIVE_QP] = "active_qps",
[BNXT_RE_ACTIVE_SRQ] = "active_srqs",
[BNXT_RE_ACTIVE_CQ] = "active_cqs",
[BNXT_RE_ACTIVE_MR] = "active_mrs",
[BNXT_RE_ACTIVE_MW] = "active_mws",
[BNXT_RE_RX_PKTS] = "rx_pkts",
[BNXT_RE_RX_BYTES] = "rx_bytes",
[BNXT_RE_TX_PKTS] = "tx_pkts",
[BNXT_RE_TX_BYTES] = "tx_bytes",
[BNXT_RE_RECOVERABLE_ERRORS] = "recoverable_errors"
};
int bnxt_re_ib_get_hw_stats(struct ib_device *ibdev,
struct rdma_hw_stats *stats,
u8 port, int index)
{
struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
struct ctx_hw_stats *bnxt_re_stats = rdev->qplib_ctx.stats.dma;
if (!port || !stats)
return -EINVAL;
stats->value[BNXT_RE_ACTIVE_QP] = atomic_read(&rdev->qp_count);
stats->value[BNXT_RE_ACTIVE_SRQ] = atomic_read(&rdev->srq_count);
stats->value[BNXT_RE_ACTIVE_CQ] = atomic_read(&rdev->cq_count);
stats->value[BNXT_RE_ACTIVE_MR] = atomic_read(&rdev->mr_count);
stats->value[BNXT_RE_ACTIVE_MW] = atomic_read(&rdev->mw_count);
if (bnxt_re_stats) {
stats->value[BNXT_RE_RECOVERABLE_ERRORS] =
le64_to_cpu(bnxt_re_stats->tx_bcast_pkts);
stats->value[BNXT_RE_RX_PKTS] =
le64_to_cpu(bnxt_re_stats->rx_ucast_pkts);
stats->value[BNXT_RE_RX_BYTES] =
le64_to_cpu(bnxt_re_stats->rx_ucast_bytes);
stats->value[BNXT_RE_TX_PKTS] =
le64_to_cpu(bnxt_re_stats->tx_ucast_pkts);
stats->value[BNXT_RE_TX_BYTES] =
le64_to_cpu(bnxt_re_stats->tx_ucast_bytes);
}
return ARRAY_SIZE(bnxt_re_stat_name);
}
struct rdma_hw_stats *bnxt_re_ib_alloc_hw_stats(struct ib_device *ibdev,
u8 port_num)
{
BUILD_BUG_ON(ARRAY_SIZE(bnxt_re_stat_name) != BNXT_RE_NUM_COUNTERS);
/* We support only per port stats */
if (!port_num)
return NULL;
return rdma_alloc_hw_stats_struct(bnxt_re_stat_name,
ARRAY_SIZE(bnxt_re_stat_name),
RDMA_HW_STATS_DEFAULT_LIFESPAN);
}

View File

@ -0,0 +1,62 @@
/*
* Broadcom NetXtreme-E RoCE driver.
*
* Copyright (c) 2016 - 2017, Broadcom. All rights reserved. The term
* Broadcom refers to Broadcom Limited and/or its subsidiaries.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* BSD license below:
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
* WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
* IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* Description: Statistics (header)
*
*/
#ifndef __BNXT_RE_HW_STATS_H__
#define __BNXT_RE_HW_STATS_H__
enum bnxt_re_hw_stats {
BNXT_RE_ACTIVE_QP,
BNXT_RE_ACTIVE_SRQ,
BNXT_RE_ACTIVE_CQ,
BNXT_RE_ACTIVE_MR,
BNXT_RE_ACTIVE_MW,
BNXT_RE_RX_PKTS,
BNXT_RE_RX_BYTES,
BNXT_RE_TX_PKTS,
BNXT_RE_TX_BYTES,
BNXT_RE_RECOVERABLE_ERRORS,
BNXT_RE_NUM_COUNTERS
};
struct rdma_hw_stats *bnxt_re_ib_alloc_hw_stats(struct ib_device *ibdev,
u8 port_num);
int bnxt_re_ib_get_hw_stats(struct ib_device *ibdev,
struct rdma_hw_stats *stats,
u8 port, int index);
#endif /* __BNXT_RE_HW_STATS_H__ */

View File

@ -223,50 +223,6 @@ int bnxt_re_modify_device(struct ib_device *ibdev,
return 0;
}
static void __to_ib_speed_width(struct net_device *netdev, u8 *speed, u8 *width)
{
struct ethtool_link_ksettings lksettings;
u32 espeed;
if (netdev->ethtool_ops && netdev->ethtool_ops->get_link_ksettings) {
memset(&lksettings, 0, sizeof(lksettings));
rtnl_lock();
netdev->ethtool_ops->get_link_ksettings(netdev, &lksettings);
rtnl_unlock();
espeed = lksettings.base.speed;
} else {
espeed = SPEED_UNKNOWN;
}
switch (espeed) {
case SPEED_1000:
*speed = IB_SPEED_SDR;
*width = IB_WIDTH_1X;
break;
case SPEED_10000:
*speed = IB_SPEED_QDR;
*width = IB_WIDTH_1X;
break;
case SPEED_20000:
*speed = IB_SPEED_DDR;
*width = IB_WIDTH_4X;
break;
case SPEED_25000:
*speed = IB_SPEED_EDR;
*width = IB_WIDTH_1X;
break;
case SPEED_40000:
*speed = IB_SPEED_QDR;
*width = IB_WIDTH_4X;
break;
case SPEED_50000:
break;
default:
*speed = IB_SPEED_SDR;
*width = IB_WIDTH_1X;
break;
}
}
/* Port */
int bnxt_re_query_port(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr *port_attr)
@ -308,25 +264,9 @@ int bnxt_re_query_port(struct ib_device *ibdev, u8 port_num,
* IB stack to avoid race in the NETDEV_UNREG path
*/
if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
__to_ib_speed_width(rdev->netdev, &port_attr->active_speed,
&port_attr->active_width);
return 0;
}
int bnxt_re_modify_port(struct ib_device *ibdev, u8 port_num,
int port_modify_mask,
struct ib_port_modify *port_modify)
{
switch (port_modify_mask) {
case IB_PORT_SHUTDOWN:
break;
case IB_PORT_INIT_TYPE:
break;
case IB_PORT_RESET_QKEY_CNTR:
break;
default:
break;
}
if (ib_get_eth_speed(ibdev, port_num, &port_attr->active_speed,
&port_attr->active_width))
return -EINVAL;
return 0;
}
@ -846,6 +786,7 @@ int bnxt_re_destroy_qp(struct ib_qp *ib_qp)
struct bnxt_re_dev *rdev = qp->rdev;
int rc;
bnxt_qplib_del_flush_qp(&qp->qplib_qp);
rc = bnxt_qplib_destroy_qp(&rdev->qplib_res, &qp->qplib_qp);
if (rc) {
dev_err(rdev_to_dev(rdev), "Failed to destroy HW QP");
@ -860,6 +801,7 @@ int bnxt_re_destroy_qp(struct ib_qp *ib_qp)
return rc;
}
bnxt_qplib_del_flush_qp(&qp->qplib_qp);
rc = bnxt_qplib_destroy_qp(&rdev->qplib_res,
&rdev->qp1_sqp->qplib_qp);
if (rc) {
@ -969,7 +911,6 @@ static struct bnxt_re_ah *bnxt_re_create_shadow_qp_ah
if (!ah)
return NULL;
memset(ah, 0, sizeof(*ah));
ah->rdev = rdev;
ah->qplib_ah.pd = &pd->qplib_pd;
@ -1016,7 +957,6 @@ static struct bnxt_re_qp *bnxt_re_create_shadow_qp
if (!qp)
return NULL;
memset(qp, 0, sizeof(*qp));
qp->rdev = rdev;
/* Initialize the shadow QP structure from the QP1 values */
@ -1404,6 +1344,21 @@ int bnxt_re_modify_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
}
qp->qplib_qp.modify_flags |= CMDQ_MODIFY_QP_MODIFY_MASK_STATE;
qp->qplib_qp.state = __from_ib_qp_state(qp_attr->qp_state);
if (!qp->sumem &&
qp->qplib_qp.state == CMDQ_MODIFY_QP_NEW_STATE_ERR) {
dev_dbg(rdev_to_dev(rdev),
"Move QP = %p to flush list\n",
qp);
bnxt_qplib_add_flush_qp(&qp->qplib_qp);
}
if (!qp->sumem &&
qp->qplib_qp.state == CMDQ_MODIFY_QP_NEW_STATE_RESET) {
dev_dbg(rdev_to_dev(rdev),
"Move QP = %p out of flush list\n",
qp);
bnxt_qplib_del_flush_qp(&qp->qplib_qp);
}
}
if (qp_attr_mask & IB_QP_EN_SQD_ASYNC_NOTIFY) {
qp->qplib_qp.modify_flags |=
@ -2333,6 +2288,7 @@ int bnxt_re_destroy_cq(struct ib_cq *ib_cq)
struct bnxt_re_cq *cq = container_of(ib_cq, struct bnxt_re_cq, ib_cq);
struct bnxt_re_dev *rdev = cq->rdev;
int rc;
struct bnxt_qplib_nq *nq = cq->qplib_cq.nq;
rc = bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq);
if (rc) {
@ -2347,7 +2303,7 @@ int bnxt_re_destroy_cq(struct ib_cq *ib_cq)
kfree(cq);
}
atomic_dec(&rdev->cq_count);
rdev->nq.budget--;
nq->budget--;
return 0;
}
@ -2361,6 +2317,8 @@ struct ib_cq *bnxt_re_create_cq(struct ib_device *ibdev,
struct bnxt_re_cq *cq = NULL;
int rc, entries;
int cqe = attr->cqe;
struct bnxt_qplib_nq *nq = NULL;
unsigned int nq_alloc_cnt;
/* Validate CQ fields */
if (cqe < 1 || cqe > dev_attr->max_cq_wqes) {
@ -2412,8 +2370,15 @@ struct ib_cq *bnxt_re_create_cq(struct ib_device *ibdev,
cq->qplib_cq.sghead = NULL;
cq->qplib_cq.nmap = 0;
}
/*
* Allocating the NQ in a round robin fashion. nq_alloc_cnt is a
* used for getting the NQ index.
*/
nq_alloc_cnt = atomic_inc_return(&rdev->nq_alloc_cnt);
nq = &rdev->nq[nq_alloc_cnt % (rdev->num_msix - 1)];
cq->qplib_cq.max_wqe = entries;
cq->qplib_cq.cnq_hw_ring_id = rdev->nq.ring_id;
cq->qplib_cq.cnq_hw_ring_id = nq->ring_id;
cq->qplib_cq.nq = nq;
rc = bnxt_qplib_create_cq(&rdev->qplib_res, &cq->qplib_cq);
if (rc) {
@ -2423,7 +2388,7 @@ struct ib_cq *bnxt_re_create_cq(struct ib_device *ibdev,
cq->ib_cq.cqe = entries;
cq->cq_period = cq->qplib_cq.period;
rdev->nq.budget++;
nq->budget++;
atomic_inc(&rdev->cq_count);
@ -2921,6 +2886,10 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
sq->send_phantom = false;
}
}
if (ncqe < budget)
ncqe += bnxt_qplib_process_flush_list(&cq->qplib_cq,
cqe + ncqe,
budget - ncqe);
if (!ncqe)
break;
@ -3410,7 +3379,7 @@ int bnxt_re_dealloc_ucontext(struct ib_ucontext *ib_uctx)
&rdev->qplib_res.dpi_tbl,
&uctx->dpi);
if (rc)
dev_err(rdev_to_dev(rdev), "Deallocte HW DPI failed!");
dev_err(rdev_to_dev(rdev), "Deallocate HW DPI failed!");
/* Don't fail, continue*/
uctx->dpi.dbr = NULL;
}

View File

@ -141,9 +141,6 @@ int bnxt_re_modify_device(struct ib_device *ibdev,
struct ib_device_modify *device_modify);
int bnxt_re_query_port(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr *port_attr);
int bnxt_re_modify_port(struct ib_device *ibdev, u8 port_num,
int port_modify_mask,
struct ib_port_modify *port_modify);
int bnxt_re_get_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_immutable *immutable);
int bnxt_re_query_pkey(struct ib_device *ibdev, u8 port_num,

View File

@ -64,13 +64,14 @@
#include "ib_verbs.h"
#include <rdma/bnxt_re-abi.h>
#include "bnxt.h"
#include "hw_counters.h"
static char version[] =
BNXT_RE_DESC " v" ROCE_DRV_MODULE_VERSION "\n";
MODULE_AUTHOR("Eddie Wai <eddie.wai@broadcom.com>");
MODULE_DESCRIPTION(BNXT_RE_DESC " Driver");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(ROCE_DRV_MODULE_VERSION);
/* globals */
static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
@ -162,7 +163,7 @@ static int bnxt_re_free_msix(struct bnxt_re_dev *rdev, bool lock_wait)
static int bnxt_re_request_msix(struct bnxt_re_dev *rdev)
{
int rc = 0, num_msix_want = BNXT_RE_MIN_MSIX, num_msix_got;
int rc = 0, num_msix_want = BNXT_RE_MAX_MSIX, num_msix_got;
struct bnxt_en_dev *en_dev;
if (!rdev)
@ -170,6 +171,8 @@ static int bnxt_re_request_msix(struct bnxt_re_dev *rdev)
en_dev = rdev->en_dev;
num_msix_want = min_t(u32, BNXT_RE_MAX_MSIX, num_online_cpus());
rtnl_lock();
num_msix_got = en_dev->en_ops->bnxt_request_msix(en_dev, BNXT_ROCE_ULP,
rdev->msix_entries,
@ -474,7 +477,6 @@ static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
ibdev->modify_device = bnxt_re_modify_device;
ibdev->query_port = bnxt_re_query_port;
ibdev->modify_port = bnxt_re_modify_port;
ibdev->get_port_immutable = bnxt_re_get_port_immutable;
ibdev->query_pkey = bnxt_re_query_pkey;
ibdev->query_gid = bnxt_re_query_gid;
@ -513,6 +515,8 @@ static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
ibdev->alloc_ucontext = bnxt_re_alloc_ucontext;
ibdev->dealloc_ucontext = bnxt_re_dealloc_ucontext;
ibdev->mmap = bnxt_re_mmap;
ibdev->get_hw_stats = bnxt_re_ib_get_hw_stats;
ibdev->alloc_hw_stats = bnxt_re_ib_alloc_hw_stats;
return ib_register_device(ibdev, NULL);
}
@ -653,8 +657,12 @@ static int bnxt_re_cqn_handler(struct bnxt_qplib_nq *nq,
static void bnxt_re_cleanup_res(struct bnxt_re_dev *rdev)
{
if (rdev->nq.hwq.max_elements)
bnxt_qplib_disable_nq(&rdev->nq);
int i;
if (rdev->nq[0].hwq.max_elements) {
for (i = 1; i < rdev->num_msix; i++)
bnxt_qplib_disable_nq(&rdev->nq[i - 1]);
}
if (rdev->qplib_res.rcfw)
bnxt_qplib_cleanup_res(&rdev->qplib_res);
@ -662,31 +670,41 @@ static void bnxt_re_cleanup_res(struct bnxt_re_dev *rdev)
static int bnxt_re_init_res(struct bnxt_re_dev *rdev)
{
int rc = 0;
int rc = 0, i;
bnxt_qplib_init_res(&rdev->qplib_res);
if (rdev->msix_entries[BNXT_RE_NQ_IDX].vector <= 0)
return -EINVAL;
rc = bnxt_qplib_enable_nq(rdev->en_dev->pdev, &rdev->nq,
rdev->msix_entries[BNXT_RE_NQ_IDX].vector,
rdev->msix_entries[BNXT_RE_NQ_IDX].db_offset,
&bnxt_re_cqn_handler,
NULL);
if (rc)
dev_err(rdev_to_dev(rdev), "Failed to enable NQ: %#x", rc);
for (i = 1; i < rdev->num_msix ; i++) {
rc = bnxt_qplib_enable_nq(rdev->en_dev->pdev, &rdev->nq[i - 1],
i - 1, rdev->msix_entries[i].vector,
rdev->msix_entries[i].db_offset,
&bnxt_re_cqn_handler, NULL);
if (rc) {
dev_err(rdev_to_dev(rdev),
"Failed to enable NQ with rc = 0x%x", rc);
goto fail;
}
}
return 0;
fail:
return rc;
}
static void bnxt_re_free_nq_res(struct bnxt_re_dev *rdev, bool lock_wait)
{
int i;
for (i = 0; i < rdev->num_msix - 1; i++) {
bnxt_re_net_ring_free(rdev, rdev->nq[i].ring_id, lock_wait);
bnxt_qplib_free_nq(&rdev->nq[i]);
}
}
static void bnxt_re_free_res(struct bnxt_re_dev *rdev, bool lock_wait)
{
if (rdev->nq.hwq.max_elements) {
bnxt_re_net_ring_free(rdev, rdev->nq.ring_id, lock_wait);
bnxt_qplib_free_nq(&rdev->nq);
}
bnxt_re_free_nq_res(rdev, lock_wait);
if (rdev->qplib_res.dpi_tbl.max) {
bnxt_qplib_dealloc_dpi(&rdev->qplib_res,
&rdev->qplib_res.dpi_tbl,
@ -700,7 +718,7 @@ static void bnxt_re_free_res(struct bnxt_re_dev *rdev, bool lock_wait)
static int bnxt_re_alloc_res(struct bnxt_re_dev *rdev)
{
int rc = 0;
int rc = 0, i;
/* Configure and allocate resources for qplib */
rdev->qplib_res.rcfw = &rdev->rcfw;
@ -717,30 +735,42 @@ static int bnxt_re_alloc_res(struct bnxt_re_dev *rdev)
&rdev->dpi_privileged,
rdev);
if (rc)
goto fail;
goto dealloc_res;
rdev->nq.hwq.max_elements = BNXT_RE_MAX_CQ_COUNT +
BNXT_RE_MAX_SRQC_COUNT + 2;
rc = bnxt_qplib_alloc_nq(rdev->en_dev->pdev, &rdev->nq);
if (rc) {
dev_err(rdev_to_dev(rdev),
"Failed to allocate NQ memory: %#x", rc);
goto fail;
}
rc = bnxt_re_net_ring_alloc
(rdev, rdev->nq.hwq.pbl[PBL_LVL_0].pg_map_arr,
rdev->nq.hwq.pbl[rdev->nq.hwq.level].pg_count,
HWRM_RING_ALLOC_CMPL, BNXT_QPLIB_NQE_MAX_CNT - 1,
rdev->msix_entries[BNXT_RE_NQ_IDX].ring_idx,
&rdev->nq.ring_id);
if (rc) {
dev_err(rdev_to_dev(rdev),
"Failed to allocate NQ ring: %#x", rc);
goto free_nq;
for (i = 0; i < rdev->num_msix - 1; i++) {
rdev->nq[i].hwq.max_elements = BNXT_RE_MAX_CQ_COUNT +
BNXT_RE_MAX_SRQC_COUNT + 2;
rc = bnxt_qplib_alloc_nq(rdev->en_dev->pdev, &rdev->nq[i]);
if (rc) {
dev_err(rdev_to_dev(rdev), "Alloc Failed NQ%d rc:%#x",
i, rc);
goto dealloc_dpi;
}
rc = bnxt_re_net_ring_alloc
(rdev, rdev->nq[i].hwq.pbl[PBL_LVL_0].pg_map_arr,
rdev->nq[i].hwq.pbl[rdev->nq[i].hwq.level].pg_count,
HWRM_RING_ALLOC_CMPL,
BNXT_QPLIB_NQE_MAX_CNT - 1,
rdev->msix_entries[i + 1].ring_idx,
&rdev->nq[i].ring_id);
if (rc) {
dev_err(rdev_to_dev(rdev),
"Failed to allocate NQ fw id with rc = 0x%x",
rc);
goto free_nq;
}
}
return 0;
free_nq:
bnxt_qplib_free_nq(&rdev->nq);
for (i = 0; i < rdev->num_msix - 1; i++)
bnxt_qplib_free_nq(&rdev->nq[i]);
dealloc_dpi:
bnxt_qplib_dealloc_dpi(&rdev->qplib_res,
&rdev->qplib_res.dpi_tbl,
&rdev->dpi_privileged);
dealloc_res:
bnxt_qplib_free_res(&rdev->qplib_res);
fail:
rdev->qplib_res.rcfw = NULL;
return rc;
@ -835,6 +865,42 @@ static void bnxt_re_dev_stop(struct bnxt_re_dev *rdev)
mutex_unlock(&rdev->qp_lock);
}
static int bnxt_re_update_gid(struct bnxt_re_dev *rdev)
{
struct bnxt_qplib_sgid_tbl *sgid_tbl = &rdev->qplib_res.sgid_tbl;
struct bnxt_qplib_gid gid;
u16 gid_idx, index;
int rc = 0;
if (!test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
return 0;
if (!sgid_tbl) {
dev_err(rdev_to_dev(rdev), "QPLIB: SGID table not allocated");
return -EINVAL;
}
for (index = 0; index < sgid_tbl->active; index++) {
gid_idx = sgid_tbl->hw_id[index];
if (!memcmp(&sgid_tbl->tbl[index], &bnxt_qplib_gid_zero,
sizeof(bnxt_qplib_gid_zero)))
continue;
/* need to modify the VLAN enable setting of non VLAN GID only
* as setting is done for VLAN GID while adding GID
*/
if (sgid_tbl->vlan[index])
continue;
memcpy(&gid, &sgid_tbl->tbl[index], sizeof(gid));
rc = bnxt_qplib_update_sgid(sgid_tbl, &gid, gid_idx,
rdev->qplib_res.netdev->dev_addr);
}
return rc;
}
static u32 bnxt_re_get_priority_mask(struct bnxt_re_dev *rdev)
{
u32 prio_map = 0, tmp_map = 0;
@ -854,8 +920,6 @@ static u32 bnxt_re_get_priority_mask(struct bnxt_re_dev *rdev)
tmp_map = dcb_ieee_getapp_mask(netdev, &app);
prio_map |= tmp_map;
if (!prio_map)
prio_map = -EFAULT;
return prio_map;
}
@ -881,10 +945,7 @@ static int bnxt_re_setup_qos(struct bnxt_re_dev *rdev)
int rc;
/* Get priority for roce */
rc = bnxt_re_get_priority_mask(rdev);
if (rc < 0)
return rc;
prio_map = (u8)rc;
prio_map = bnxt_re_get_priority_mask(rdev);
if (prio_map == rdev->cur_prio_map)
return 0;
@ -906,6 +967,16 @@ static int bnxt_re_setup_qos(struct bnxt_re_dev *rdev)
return rc;
}
/* Actual priorities are not programmed as they are already
* done by L2 driver; just enable or disable priority vlan tagging
*/
if ((prio_map == 0 && rdev->qplib_res.prio) ||
(prio_map != 0 && !rdev->qplib_res.prio)) {
rdev->qplib_res.prio = prio_map ? true : false;
bnxt_re_update_gid(rdev);
}
return 0;
}
@ -998,7 +1069,8 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
/* Establish RCFW Communication Channel to initialize the context
* memory for the function and all child VFs
*/
rc = bnxt_qplib_alloc_rcfw_channel(rdev->en_dev->pdev, &rdev->rcfw);
rc = bnxt_qplib_alloc_rcfw_channel(rdev->en_dev->pdev, &rdev->rcfw,
BNXT_RE_MAX_QPC_COUNT);
if (rc)
goto fail;

View File

@ -51,6 +51,168 @@
#include "qplib_fp.h"
static void bnxt_qplib_arm_cq_enable(struct bnxt_qplib_cq *cq);
static void __clean_cq(struct bnxt_qplib_cq *cq, u64 qp);
static void bnxt_qplib_cancel_phantom_processing(struct bnxt_qplib_qp *qp)
{
qp->sq.condition = false;
qp->sq.send_phantom = false;
qp->sq.single = false;
}
/* Flush list */
static void __bnxt_qplib_add_flush_qp(struct bnxt_qplib_qp *qp)
{
struct bnxt_qplib_cq *scq, *rcq;
scq = qp->scq;
rcq = qp->rcq;
if (!qp->sq.flushed) {
dev_dbg(&scq->hwq.pdev->dev,
"QPLIB: FP: Adding to SQ Flush list = %p",
qp);
bnxt_qplib_cancel_phantom_processing(qp);
list_add_tail(&qp->sq_flush, &scq->sqf_head);
qp->sq.flushed = true;
}
if (!qp->srq) {
if (!qp->rq.flushed) {
dev_dbg(&rcq->hwq.pdev->dev,
"QPLIB: FP: Adding to RQ Flush list = %p",
qp);
list_add_tail(&qp->rq_flush, &rcq->rqf_head);
qp->rq.flushed = true;
}
}
}
void bnxt_qplib_acquire_cq_locks(struct bnxt_qplib_qp *qp,
unsigned long *flags)
__acquires(&qp->scq->hwq.lock) __acquires(&qp->rcq->hwq.lock)
{
spin_lock_irqsave(&qp->scq->hwq.lock, *flags);
if (qp->scq == qp->rcq)
__acquire(&qp->rcq->hwq.lock);
else
spin_lock(&qp->rcq->hwq.lock);
}
void bnxt_qplib_release_cq_locks(struct bnxt_qplib_qp *qp,
unsigned long *flags)
__releases(&qp->scq->hwq.lock) __releases(&qp->rcq->hwq.lock)
{
if (qp->scq == qp->rcq)
__release(&qp->rcq->hwq.lock);
else
spin_unlock(&qp->rcq->hwq.lock);
spin_unlock_irqrestore(&qp->scq->hwq.lock, *flags);
}
static struct bnxt_qplib_cq *bnxt_qplib_find_buddy_cq(struct bnxt_qplib_qp *qp,
struct bnxt_qplib_cq *cq)
{
struct bnxt_qplib_cq *buddy_cq = NULL;
if (qp->scq == qp->rcq)
buddy_cq = NULL;
else if (qp->scq == cq)
buddy_cq = qp->rcq;
else
buddy_cq = qp->scq;
return buddy_cq;
}
static void bnxt_qplib_lock_buddy_cq(struct bnxt_qplib_qp *qp,
struct bnxt_qplib_cq *cq)
__acquires(&buddy_cq->hwq.lock)
{
struct bnxt_qplib_cq *buddy_cq = NULL;
buddy_cq = bnxt_qplib_find_buddy_cq(qp, cq);
if (!buddy_cq)
__acquire(&cq->hwq.lock);
else
spin_lock(&buddy_cq->hwq.lock);
}
static void bnxt_qplib_unlock_buddy_cq(struct bnxt_qplib_qp *qp,
struct bnxt_qplib_cq *cq)
__releases(&buddy_cq->hwq.lock)
{
struct bnxt_qplib_cq *buddy_cq = NULL;
buddy_cq = bnxt_qplib_find_buddy_cq(qp, cq);
if (!buddy_cq)
__release(&cq->hwq.lock);
else
spin_unlock(&buddy_cq->hwq.lock);
}
void bnxt_qplib_add_flush_qp(struct bnxt_qplib_qp *qp)
{
unsigned long flags;
bnxt_qplib_acquire_cq_locks(qp, &flags);
__bnxt_qplib_add_flush_qp(qp);
bnxt_qplib_release_cq_locks(qp, &flags);
}
static void __bnxt_qplib_del_flush_qp(struct bnxt_qplib_qp *qp)
{
struct bnxt_qplib_cq *scq, *rcq;
scq = qp->scq;
rcq = qp->rcq;
if (qp->sq.flushed) {
qp->sq.flushed = false;
list_del(&qp->sq_flush);
}
if (!qp->srq) {
if (qp->rq.flushed) {
qp->rq.flushed = false;
list_del(&qp->rq_flush);
}
}
}
void bnxt_qplib_del_flush_qp(struct bnxt_qplib_qp *qp)
{
unsigned long flags;
bnxt_qplib_acquire_cq_locks(qp, &flags);
__clean_cq(qp->scq, (u64)(unsigned long)qp);
qp->sq.hwq.prod = 0;
qp->sq.hwq.cons = 0;
__clean_cq(qp->rcq, (u64)(unsigned long)qp);
qp->rq.hwq.prod = 0;
qp->rq.hwq.cons = 0;
__bnxt_qplib_del_flush_qp(qp);
bnxt_qplib_release_cq_locks(qp, &flags);
}
static void bnxt_qpn_cqn_sched_task(struct work_struct *work)
{
struct bnxt_qplib_nq_work *nq_work =
container_of(work, struct bnxt_qplib_nq_work, work);
struct bnxt_qplib_cq *cq = nq_work->cq;
struct bnxt_qplib_nq *nq = nq_work->nq;
if (cq && nq) {
spin_lock_bh(&cq->compl_lock);
if (atomic_read(&cq->arm_state) && nq->cqn_handler) {
dev_dbg(&nq->pdev->dev,
"%s:Trigger cq = %p event nq = %p\n",
__func__, cq, nq);
nq->cqn_handler(nq, cq);
}
spin_unlock_bh(&cq->compl_lock);
}
kfree(nq_work);
}
static void bnxt_qplib_free_qp_hdr_buf(struct bnxt_qplib_res *res,
struct bnxt_qplib_qp *qp)
@ -119,6 +281,7 @@ static void bnxt_qplib_service_nq(unsigned long data)
struct bnxt_qplib_nq *nq = (struct bnxt_qplib_nq *)data;
struct bnxt_qplib_hwq *hwq = &nq->hwq;
struct nq_base *nqe, **nq_ptr;
struct bnxt_qplib_cq *cq;
int num_cqne_processed = 0;
u32 sw_cons, raw_cons;
u16 type;
@ -143,15 +306,17 @@ static void bnxt_qplib_service_nq(unsigned long data)
q_handle = le32_to_cpu(nqcne->cq_handle_low);
q_handle |= (u64)le32_to_cpu(nqcne->cq_handle_high)
<< 32;
bnxt_qplib_arm_cq_enable((struct bnxt_qplib_cq *)
((unsigned long)q_handle));
if (!nq->cqn_handler(nq, (struct bnxt_qplib_cq *)
((unsigned long)q_handle)))
cq = (struct bnxt_qplib_cq *)(unsigned long)q_handle;
bnxt_qplib_arm_cq_enable(cq);
spin_lock_bh(&cq->compl_lock);
atomic_set(&cq->arm_state, 0);
if (!nq->cqn_handler(nq, (cq)))
num_cqne_processed++;
else
dev_warn(&nq->pdev->dev,
"QPLIB: cqn - type 0x%x not handled",
type);
spin_unlock_bh(&cq->compl_lock);
break;
}
case NQ_BASE_TYPE_DBQ_EVENT:
@ -190,12 +355,17 @@ static irqreturn_t bnxt_qplib_nq_irq(int irq, void *dev_instance)
void bnxt_qplib_disable_nq(struct bnxt_qplib_nq *nq)
{
if (nq->cqn_wq) {
destroy_workqueue(nq->cqn_wq);
nq->cqn_wq = NULL;
}
/* Make sure the HW is stopped! */
synchronize_irq(nq->vector);
tasklet_disable(&nq->worker);
tasklet_kill(&nq->worker);
if (nq->requested) {
irq_set_affinity_hint(nq->vector, NULL);
free_irq(nq->vector, nq);
nq->requested = false;
}
@ -209,14 +379,14 @@ void bnxt_qplib_disable_nq(struct bnxt_qplib_nq *nq)
}
int bnxt_qplib_enable_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq,
int msix_vector, int bar_reg_offset,
int nq_idx, int msix_vector, int bar_reg_offset,
int (*cqn_handler)(struct bnxt_qplib_nq *nq,
struct bnxt_qplib_cq *),
int (*srqn_handler)(struct bnxt_qplib_nq *nq,
void *, u8 event))
{
resource_size_t nq_base;
int rc;
int rc = -1;
nq->pdev = pdev;
nq->vector = msix_vector;
@ -227,14 +397,31 @@ int bnxt_qplib_enable_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq,
tasklet_init(&nq->worker, bnxt_qplib_service_nq, (unsigned long)nq);
/* Have a task to schedule CQ notifiers in post send case */
nq->cqn_wq = create_singlethread_workqueue("bnxt_qplib_nq");
if (!nq->cqn_wq)
goto fail;
nq->requested = false;
rc = request_irq(nq->vector, bnxt_qplib_nq_irq, 0, "bnxt_qplib_nq", nq);
memset(nq->name, 0, 32);
sprintf(nq->name, "bnxt_qplib_nq-%d", nq_idx);
rc = request_irq(nq->vector, bnxt_qplib_nq_irq, 0, nq->name, nq);
if (rc) {
dev_err(&nq->pdev->dev,
"Failed to request IRQ for NQ: %#x", rc);
bnxt_qplib_disable_nq(nq);
goto fail;
}
cpumask_clear(&nq->mask);
cpumask_set_cpu(nq_idx, &nq->mask);
rc = irq_set_affinity_hint(nq->vector, &nq->mask);
if (rc) {
dev_warn(&nq->pdev->dev,
"QPLIB: set affinity failed; vector: %d nq_idx: %d\n",
nq->vector, nq_idx);
}
nq->requested = true;
nq->bar_reg = NQ_CONS_PCI_BAR_REGION;
nq->bar_reg_off = bar_reg_offset;
@ -258,8 +445,10 @@ fail:
void bnxt_qplib_free_nq(struct bnxt_qplib_nq *nq)
{
if (nq->hwq.max_elements)
if (nq->hwq.max_elements) {
bnxt_qplib_free_hwq(nq->pdev, &nq->hwq);
nq->hwq.max_elements = 0;
}
}
int bnxt_qplib_alloc_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq)
@ -401,8 +590,8 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
qp->id = le32_to_cpu(resp.xid);
qp->cur_qp_state = CMDQ_MODIFY_QP_NEW_STATE_RESET;
sq->flush_in_progress = false;
rq->flush_in_progress = false;
rcfw->qp_tbl[qp->id].qp_id = qp->id;
rcfw->qp_tbl[qp->id].qp_handle = (void *)qp;
return 0;
@ -615,8 +804,10 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
qp->id = le32_to_cpu(resp.xid);
qp->cur_qp_state = CMDQ_MODIFY_QP_NEW_STATE_RESET;
sq->flush_in_progress = false;
rq->flush_in_progress = false;
INIT_LIST_HEAD(&qp->sq_flush);
INIT_LIST_HEAD(&qp->rq_flush);
rcfw->qp_tbl[qp->id].qp_id = qp->id;
rcfw->qp_tbl[qp->id].qp_handle = (void *)qp;
return 0;
@ -963,13 +1154,19 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res,
u16 cmd_flags = 0;
int rc;
rcfw->qp_tbl[qp->id].qp_id = BNXT_QPLIB_QP_ID_INVALID;
rcfw->qp_tbl[qp->id].qp_handle = NULL;
RCFW_CMD_PREP(req, DESTROY_QP, cmd_flags);
req.qp_cid = cpu_to_le32(qp->id);
rc = bnxt_qplib_rcfw_send_message(rcfw, (void *)&req,
(void *)&resp, NULL, 0);
if (rc)
if (rc) {
rcfw->qp_tbl[qp->id].qp_id = qp->id;
rcfw->qp_tbl[qp->id].qp_handle = qp;
return rc;
}
/* Must walk the associated CQs to nullified the QP ptr */
spin_lock_irqsave(&qp->scq->hwq.lock, flags);
@ -1074,14 +1271,21 @@ int bnxt_qplib_post_send(struct bnxt_qplib_qp *qp,
struct bnxt_qplib_swq *swq;
struct sq_send *hw_sq_send_hdr, **hw_sq_send_ptr;
struct sq_sge *hw_sge;
struct bnxt_qplib_nq_work *nq_work = NULL;
bool sch_handler = false;
u32 sw_prod;
u8 wqe_size16;
int i, rc = 0, data_len = 0, pkt_num = 0;
__le32 temp32;
if (qp->state != CMDQ_MODIFY_QP_NEW_STATE_RTS) {
rc = -EINVAL;
goto done;
if (qp->state == CMDQ_MODIFY_QP_NEW_STATE_ERR) {
sch_handler = true;
dev_dbg(&sq->hwq.pdev->dev,
"%s Error QP. Scheduling for poll_cq\n",
__func__);
goto queue_err;
}
}
if (bnxt_qplib_queue_full(sq)) {
@ -1301,12 +1505,35 @@ int bnxt_qplib_post_send(struct bnxt_qplib_qp *qp,
((swq->next_psn << SQ_PSN_SEARCH_NEXT_PSN_SFT) &
SQ_PSN_SEARCH_NEXT_PSN_MASK));
}
queue_err:
if (sch_handler) {
/* Store the ULP info in the software structures */
sw_prod = HWQ_CMP(sq->hwq.prod, &sq->hwq);
swq = &sq->swq[sw_prod];
swq->wr_id = wqe->wr_id;
swq->type = wqe->type;
swq->flags = wqe->flags;
if (qp->sig_type)
swq->flags |= SQ_SEND_FLAGS_SIGNAL_COMP;
swq->start_psn = sq->psn & BTH_PSN_MASK;
}
sq->hwq.prod++;
qp->wqe_cnt++;
done:
if (sch_handler) {
nq_work = kzalloc(sizeof(*nq_work), GFP_ATOMIC);
if (nq_work) {
nq_work->cq = qp->scq;
nq_work->nq = qp->scq->nq;
INIT_WORK(&nq_work->work, bnxt_qpn_cqn_sched_task);
queue_work(qp->scq->nq->cqn_wq, &nq_work->work);
} else {
dev_err(&sq->hwq.pdev->dev,
"QPLIB: FP: Failed to allocate SQ nq_work!");
rc = -ENOMEM;
}
}
return rc;
}
@ -1334,15 +1561,17 @@ int bnxt_qplib_post_recv(struct bnxt_qplib_qp *qp,
struct bnxt_qplib_q *rq = &qp->rq;
struct rq_wqe *rqe, **rqe_ptr;
struct sq_sge *hw_sge;
struct bnxt_qplib_nq_work *nq_work = NULL;
bool sch_handler = false;
u32 sw_prod;
int i, rc = 0;
if (qp->state == CMDQ_MODIFY_QP_NEW_STATE_ERR) {
dev_err(&rq->hwq.pdev->dev,
"QPLIB: FP: QP (0x%x) is in the 0x%x state",
qp->id, qp->state);
rc = -EINVAL;
goto done;
sch_handler = true;
dev_dbg(&rq->hwq.pdev->dev,
"%s Error QP. Scheduling for poll_cq\n",
__func__);
goto queue_err;
}
if (bnxt_qplib_queue_full(rq)) {
dev_err(&rq->hwq.pdev->dev,
@ -1378,7 +1607,27 @@ int bnxt_qplib_post_recv(struct bnxt_qplib_qp *qp,
/* Supply the rqe->wr_id index to the wr_id_tbl for now */
rqe->wr_id[0] = cpu_to_le32(sw_prod);
queue_err:
if (sch_handler) {
/* Store the ULP info in the software structures */
sw_prod = HWQ_CMP(rq->hwq.prod, &rq->hwq);
rq->swq[sw_prod].wr_id = wqe->wr_id;
}
rq->hwq.prod++;
if (sch_handler) {
nq_work = kzalloc(sizeof(*nq_work), GFP_ATOMIC);
if (nq_work) {
nq_work->cq = qp->rcq;
nq_work->nq = qp->rcq->nq;
INIT_WORK(&nq_work->work, bnxt_qpn_cqn_sched_task);
queue_work(qp->rcq->nq->cqn_wq, &nq_work->work);
} else {
dev_err(&rq->hwq.pdev->dev,
"QPLIB: FP: Failed to allocate RQ nq_work!");
rc = -ENOMEM;
}
}
done:
return rc;
}
@ -1471,6 +1720,9 @@ int bnxt_qplib_create_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
cq->dbr_base = res->dpi_tbl.dbr_bar_reg_iomem;
cq->period = BNXT_QPLIB_QUEUE_START_PERIOD;
init_waitqueue_head(&cq->waitq);
INIT_LIST_HEAD(&cq->sqf_head);
INIT_LIST_HEAD(&cq->rqf_head);
spin_lock_init(&cq->compl_lock);
bnxt_qplib_arm_cq_enable(cq);
return 0;
@ -1513,9 +1765,13 @@ static int __flush_sq(struct bnxt_qplib_q *sq, struct bnxt_qplib_qp *qp,
while (*budget) {
sw_cons = HWQ_CMP(sq->hwq.cons, &sq->hwq);
if (sw_cons == sw_prod) {
sq->flush_in_progress = false;
break;
}
/* Skip the FENCE WQE completions */
if (sq->swq[sw_cons].wr_id == BNXT_QPLIB_FENCE_WRID) {
bnxt_qplib_cancel_phantom_processing(qp);
goto skip_compl;
}
memset(cqe, 0, sizeof(*cqe));
cqe->status = CQ_REQ_STATUS_WORK_REQUEST_FLUSHED_ERR;
cqe->opcode = CQ_BASE_CQE_TYPE_REQ;
@ -1525,6 +1781,7 @@ static int __flush_sq(struct bnxt_qplib_q *sq, struct bnxt_qplib_qp *qp,
cqe->type = sq->swq[sw_cons].type;
cqe++;
(*budget)--;
skip_compl:
sq->hwq.cons++;
}
*pcqe = cqe;
@ -1536,11 +1793,24 @@ static int __flush_sq(struct bnxt_qplib_q *sq, struct bnxt_qplib_qp *qp,
}
static int __flush_rq(struct bnxt_qplib_q *rq, struct bnxt_qplib_qp *qp,
int opcode, struct bnxt_qplib_cqe **pcqe, int *budget)
struct bnxt_qplib_cqe **pcqe, int *budget)
{
struct bnxt_qplib_cqe *cqe;
u32 sw_prod, sw_cons;
int rc = 0;
int opcode = 0;
switch (qp->type) {
case CMDQ_CREATE_QP1_TYPE_GSI:
opcode = CQ_BASE_CQE_TYPE_RES_RAWETH_QP1;
break;
case CMDQ_CREATE_QP_TYPE_RC:
opcode = CQ_BASE_CQE_TYPE_RES_RC;
break;
case CMDQ_CREATE_QP_TYPE_UD:
opcode = CQ_BASE_CQE_TYPE_RES_UD;
break;
}
/* Flush the rest of the RQ */
sw_prod = HWQ_CMP(rq->hwq.prod, &rq->hwq);
@ -1567,6 +1837,21 @@ static int __flush_rq(struct bnxt_qplib_q *rq, struct bnxt_qplib_qp *qp,
return rc;
}
void bnxt_qplib_mark_qp_error(void *qp_handle)
{
struct bnxt_qplib_qp *qp = qp_handle;
if (!qp)
return;
/* Must block new posting of SQ and RQ */
qp->state = CMDQ_MODIFY_QP_NEW_STATE_ERR;
bnxt_qplib_cancel_phantom_processing(qp);
/* Add qp to flush list of the CQ */
__bnxt_qplib_add_flush_qp(qp);
}
/* Note: SQE is valid from sw_sq_cons up to cqe_sq_cons (exclusive)
* CQE is track from sw_cq_cons to max_element but valid only if VALID=1
*/
@ -1694,10 +1979,12 @@ static int bnxt_qplib_cq_process_req(struct bnxt_qplib_cq *cq,
cqe_sq_cons, sq->hwq.max_elements);
return -EINVAL;
}
/* If we were in the middle of flushing the SQ, continue */
if (sq->flush_in_progress)
goto flush;
if (qp->sq.flushed) {
dev_dbg(&cq->hwq.pdev->dev,
"%s: QPLIB: QP in Flush QP = %p\n", __func__, qp);
goto done;
}
/* Require to walk the sq's swq to fabricate CQEs for all previously
* signaled SWQEs due to CQE aggregation from the current sq cons
* to the cqe_sq_cons
@ -1733,11 +2020,9 @@ static int bnxt_qplib_cq_process_req(struct bnxt_qplib_cq *cq,
sw_sq_cons, cqe->wr_id, cqe->status);
cqe++;
(*budget)--;
sq->flush_in_progress = true;
/* Must block new posting of SQ and RQ */
qp->state = CMDQ_MODIFY_QP_NEW_STATE_ERR;
sq->condition = false;
sq->single = false;
bnxt_qplib_lock_buddy_cq(qp, cq);
bnxt_qplib_mark_qp_error(qp);
bnxt_qplib_unlock_buddy_cq(qp, cq);
} else {
if (swq->flags & SQ_SEND_FLAGS_SIGNAL_COMP) {
/* Before we complete, do WA 9060 */
@ -1768,15 +2053,6 @@ out:
* the WC for this CQE
*/
sq->single = false;
if (!sq->flush_in_progress)
goto done;
flush:
/* Require to walk the sq's swq to fabricate CQEs for all
* previously posted SWQEs due to the error CQE received
*/
rc = __flush_sq(sq, qp, pcqe, budget);
if (!rc)
sq->flush_in_progress = false;
done:
return rc;
}
@ -1798,6 +2074,12 @@ static int bnxt_qplib_cq_process_res_rc(struct bnxt_qplib_cq *cq,
dev_err(&cq->hwq.pdev->dev, "QPLIB: process_cq RC qp is NULL");
return -EINVAL;
}
if (qp->rq.flushed) {
dev_dbg(&cq->hwq.pdev->dev,
"%s: QPLIB: QP in Flush QP = %p\n", __func__, qp);
goto done;
}
cqe = *pcqe;
cqe->opcode = hwcqe->cqe_type_toggle & CQ_BASE_CQE_TYPE_MASK;
cqe->length = le32_to_cpu(hwcqe->length);
@ -1817,8 +2099,6 @@ static int bnxt_qplib_cq_process_res_rc(struct bnxt_qplib_cq *cq,
wr_id_idx, rq->hwq.max_elements);
return -EINVAL;
}
if (rq->flush_in_progress)
goto flush_rq;
cqe->wr_id = rq->swq[wr_id_idx].wr_id;
cqe++;
@ -1827,12 +2107,13 @@ static int bnxt_qplib_cq_process_res_rc(struct bnxt_qplib_cq *cq,
*pcqe = cqe;
if (hwcqe->status != CQ_RES_RC_STATUS_OK) {
rq->flush_in_progress = true;
flush_rq:
rc = __flush_rq(rq, qp, CQ_BASE_CQE_TYPE_RES_RC, pcqe, budget);
if (!rc)
rq->flush_in_progress = false;
/* Add qp to flush list of the CQ */
bnxt_qplib_lock_buddy_cq(qp, cq);
__bnxt_qplib_add_flush_qp(qp);
bnxt_qplib_unlock_buddy_cq(qp, cq);
}
done:
return rc;
}
@ -1853,6 +2134,11 @@ static int bnxt_qplib_cq_process_res_ud(struct bnxt_qplib_cq *cq,
dev_err(&cq->hwq.pdev->dev, "QPLIB: process_cq UD qp is NULL");
return -EINVAL;
}
if (qp->rq.flushed) {
dev_dbg(&cq->hwq.pdev->dev,
"%s: QPLIB: QP in Flush QP = %p\n", __func__, qp);
goto done;
}
cqe = *pcqe;
cqe->opcode = hwcqe->cqe_type_toggle & CQ_BASE_CQE_TYPE_MASK;
cqe->length = le32_to_cpu(hwcqe->length);
@ -1876,8 +2162,6 @@ static int bnxt_qplib_cq_process_res_ud(struct bnxt_qplib_cq *cq,
wr_id_idx, rq->hwq.max_elements);
return -EINVAL;
}
if (rq->flush_in_progress)
goto flush_rq;
cqe->wr_id = rq->swq[wr_id_idx].wr_id;
cqe++;
@ -1886,12 +2170,12 @@ static int bnxt_qplib_cq_process_res_ud(struct bnxt_qplib_cq *cq,
*pcqe = cqe;
if (hwcqe->status != CQ_RES_RC_STATUS_OK) {
rq->flush_in_progress = true;
flush_rq:
rc = __flush_rq(rq, qp, CQ_BASE_CQE_TYPE_RES_UD, pcqe, budget);
if (!rc)
rq->flush_in_progress = false;
/* Add qp to flush list of the CQ */
bnxt_qplib_lock_buddy_cq(qp, cq);
__bnxt_qplib_add_flush_qp(qp);
bnxt_qplib_unlock_buddy_cq(qp, cq);
}
done:
return rc;
}
@ -1932,6 +2216,11 @@ static int bnxt_qplib_cq_process_res_raweth_qp1(struct bnxt_qplib_cq *cq,
"QPLIB: process_cq Raw/QP1 qp is NULL");
return -EINVAL;
}
if (qp->rq.flushed) {
dev_dbg(&cq->hwq.pdev->dev,
"%s: QPLIB: QP in Flush QP = %p\n", __func__, qp);
goto done;
}
cqe = *pcqe;
cqe->opcode = hwcqe->cqe_type_toggle & CQ_BASE_CQE_TYPE_MASK;
cqe->flags = le16_to_cpu(hwcqe->flags);
@ -1960,8 +2249,6 @@ static int bnxt_qplib_cq_process_res_raweth_qp1(struct bnxt_qplib_cq *cq,
wr_id_idx, rq->hwq.max_elements);
return -EINVAL;
}
if (rq->flush_in_progress)
goto flush_rq;
cqe->wr_id = rq->swq[wr_id_idx].wr_id;
cqe++;
@ -1970,13 +2257,13 @@ static int bnxt_qplib_cq_process_res_raweth_qp1(struct bnxt_qplib_cq *cq,
*pcqe = cqe;
if (hwcqe->status != CQ_RES_RC_STATUS_OK) {
rq->flush_in_progress = true;
flush_rq:
rc = __flush_rq(rq, qp, CQ_BASE_CQE_TYPE_RES_RAWETH_QP1, pcqe,
budget);
if (!rc)
rq->flush_in_progress = false;
/* Add qp to flush list of the CQ */
bnxt_qplib_lock_buddy_cq(qp, cq);
__bnxt_qplib_add_flush_qp(qp);
bnxt_qplib_unlock_buddy_cq(qp, cq);
}
done:
return rc;
}
@ -1990,7 +2277,6 @@ static int bnxt_qplib_cq_process_terminal(struct bnxt_qplib_cq *cq,
struct bnxt_qplib_cqe *cqe;
u32 sw_cons = 0, cqe_cons;
int rc = 0;
u8 opcode = 0;
/* Check the Status */
if (hwcqe->status != CQ_TERMINAL_STATUS_OK)
@ -2005,6 +2291,7 @@ static int bnxt_qplib_cq_process_terminal(struct bnxt_qplib_cq *cq,
"QPLIB: FP: CQ Process terminal qp is NULL");
return -EINVAL;
}
/* Must block new posting of SQ and RQ */
qp->state = CMDQ_MODIFY_QP_NEW_STATE_ERR;
@ -2023,9 +2310,12 @@ static int bnxt_qplib_cq_process_terminal(struct bnxt_qplib_cq *cq,
cqe_cons, sq->hwq.max_elements);
goto do_rq;
}
/* If we were in the middle of flushing, continue */
if (sq->flush_in_progress)
goto flush_sq;
if (qp->sq.flushed) {
dev_dbg(&cq->hwq.pdev->dev,
"%s: QPLIB: QP in Flush QP = %p\n", __func__, qp);
goto sq_done;
}
/* Terminal CQE can also include aggregated successful CQEs prior.
* So we must complete all CQEs from the current sq's cons to the
@ -2055,11 +2345,6 @@ static int bnxt_qplib_cq_process_terminal(struct bnxt_qplib_cq *cq,
rc = -EAGAIN;
goto sq_done;
}
sq->flush_in_progress = true;
flush_sq:
rc = __flush_sq(sq, qp, pcqe, budget);
if (!rc)
sq->flush_in_progress = false;
sq_done:
if (rc)
return rc;
@ -2075,26 +2360,23 @@ do_rq:
cqe_cons, rq->hwq.max_elements);
goto done;
}
if (qp->rq.flushed) {
dev_dbg(&cq->hwq.pdev->dev,
"%s: QPLIB: QP in Flush QP = %p\n", __func__, qp);
rc = 0;
goto done;
}
/* Terminal CQE requires all posted RQEs to complete with FLUSHED_ERR
* from the current rq->cons to the rq->prod regardless what the
* rq->cons the terminal CQE indicates
*/
rq->flush_in_progress = true;
switch (qp->type) {
case CMDQ_CREATE_QP1_TYPE_GSI:
opcode = CQ_BASE_CQE_TYPE_RES_RAWETH_QP1;
break;
case CMDQ_CREATE_QP_TYPE_RC:
opcode = CQ_BASE_CQE_TYPE_RES_RC;
break;
case CMDQ_CREATE_QP_TYPE_UD:
opcode = CQ_BASE_CQE_TYPE_RES_UD;
break;
}
rc = __flush_rq(rq, qp, opcode, pcqe, budget);
if (!rc)
rq->flush_in_progress = false;
/* Add qp to flush list of the CQ */
bnxt_qplib_lock_buddy_cq(qp, cq);
__bnxt_qplib_add_flush_qp(qp);
bnxt_qplib_unlock_buddy_cq(qp, cq);
done:
return rc;
}
@ -2115,6 +2397,33 @@ static int bnxt_qplib_cq_process_cutoff(struct bnxt_qplib_cq *cq,
return 0;
}
int bnxt_qplib_process_flush_list(struct bnxt_qplib_cq *cq,
struct bnxt_qplib_cqe *cqe,
int num_cqes)
{
struct bnxt_qplib_qp *qp = NULL;
u32 budget = num_cqes;
unsigned long flags;
spin_lock_irqsave(&cq->hwq.lock, flags);
list_for_each_entry(qp, &cq->sqf_head, sq_flush) {
dev_dbg(&cq->hwq.pdev->dev,
"QPLIB: FP: Flushing SQ QP= %p",
qp);
__flush_sq(&qp->sq, qp, &cqe, &budget);
}
list_for_each_entry(qp, &cq->rqf_head, rq_flush) {
dev_dbg(&cq->hwq.pdev->dev,
"QPLIB: FP: Flushing RQ QP= %p",
qp);
__flush_rq(&qp->rq, qp, &cqe, &budget);
}
spin_unlock_irqrestore(&cq->hwq.lock, flags);
return num_cqes - budget;
}
int bnxt_qplib_poll_cq(struct bnxt_qplib_cq *cq, struct bnxt_qplib_cqe *cqe,
int num_cqes, struct bnxt_qplib_qp **lib_qp)
{
@ -2205,6 +2514,7 @@ void bnxt_qplib_req_notify_cq(struct bnxt_qplib_cq *cq, u32 arm_type)
spin_lock_irqsave(&cq->hwq.lock, flags);
if (arm_type)
bnxt_qplib_arm_cq(cq, arm_type);
/* Using cq->arm_state variable to track whether to issue cq handler */
atomic_set(&cq->arm_state, 1);
spin_unlock_irqrestore(&cq->hwq.lock, flags);
}

View File

@ -220,19 +220,20 @@ struct bnxt_qplib_q {
u16 q_full_delta;
u16 max_sge;
u32 psn;
bool flush_in_progress;
bool condition;
bool single;
bool send_phantom;
u32 phantom_wqe_cnt;
u32 phantom_cqe_cnt;
u32 next_cq_cons;
bool flushed;
};
struct bnxt_qplib_qp {
struct bnxt_qplib_pd *pd;
struct bnxt_qplib_dpi *dpi;
u64 qp_handle;
#define BNXT_QPLIB_QP_ID_INVALID 0xFFFFFFFF
u32 id;
u8 type;
u8 sig_type;
@ -296,6 +297,8 @@ struct bnxt_qplib_qp {
dma_addr_t sq_hdr_buf_map;
void *rq_hdr_buf;
dma_addr_t rq_hdr_buf_map;
struct list_head sq_flush;
struct list_head rq_flush;
};
#define BNXT_QPLIB_MAX_CQE_ENTRY_SIZE sizeof(struct cq_base)
@ -351,6 +354,7 @@ struct bnxt_qplib_cq {
u16 period;
struct bnxt_qplib_hwq hwq;
u32 cnq_hw_ring_id;
struct bnxt_qplib_nq *nq;
bool resize_in_progress;
struct scatterlist *sghead;
u32 nmap;
@ -360,6 +364,9 @@ struct bnxt_qplib_cq {
unsigned long flags;
#define CQ_FLAGS_RESIZE_IN_PROG 1
wait_queue_head_t waitq;
struct list_head sqf_head, rqf_head;
atomic_t arm_state;
spinlock_t compl_lock; /* synch CQ handlers */
};
#define BNXT_QPLIB_MAX_IRRQE_ENTRY_SIZE sizeof(struct xrrq_irrq)
@ -400,6 +407,7 @@ struct bnxt_qplib_nq {
struct pci_dev *pdev;
int vector;
cpumask_t mask;
int budget;
bool requested;
struct tasklet_struct worker;
@ -417,11 +425,19 @@ struct bnxt_qplib_nq {
(struct bnxt_qplib_nq *nq,
void *srq,
u8 event);
struct workqueue_struct *cqn_wq;
char name[32];
};
struct bnxt_qplib_nq_work {
struct work_struct work;
struct bnxt_qplib_nq *nq;
struct bnxt_qplib_cq *cq;
};
void bnxt_qplib_disable_nq(struct bnxt_qplib_nq *nq);
int bnxt_qplib_enable_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq,
int msix_vector, int bar_reg_offset,
int nq_idx, int msix_vector, int bar_reg_offset,
int (*cqn_handler)(struct bnxt_qplib_nq *nq,
struct bnxt_qplib_cq *cq),
int (*srqn_handler)(struct bnxt_qplib_nq *nq,
@ -453,4 +469,13 @@ bool bnxt_qplib_is_cq_empty(struct bnxt_qplib_cq *cq);
void bnxt_qplib_req_notify_cq(struct bnxt_qplib_cq *cq, u32 arm_type);
void bnxt_qplib_free_nq(struct bnxt_qplib_nq *nq);
int bnxt_qplib_alloc_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq);
void bnxt_qplib_add_flush_qp(struct bnxt_qplib_qp *qp);
void bnxt_qplib_del_flush_qp(struct bnxt_qplib_qp *qp);
void bnxt_qplib_acquire_cq_locks(struct bnxt_qplib_qp *qp,
unsigned long *flags);
void bnxt_qplib_release_cq_locks(struct bnxt_qplib_qp *qp,
unsigned long *flags);
int bnxt_qplib_process_flush_list(struct bnxt_qplib_cq *cq,
struct bnxt_qplib_cqe *cqe,
int num_cqes);
#endif /* __BNXT_QPLIB_FP_H__ */

View File

@ -44,6 +44,9 @@
#include "roce_hsi.h"
#include "qplib_res.h"
#include "qplib_rcfw.h"
#include "qplib_sp.h"
#include "qplib_fp.h"
static void bnxt_qplib_service_creq(unsigned long data);
/* Hardware communication channel */
@ -279,16 +282,29 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
struct creq_qp_event *qp_event)
{
struct bnxt_qplib_hwq *cmdq = &rcfw->cmdq;
struct creq_qp_error_notification *err_event;
struct bnxt_qplib_crsq *crsqe;
unsigned long flags;
struct bnxt_qplib_qp *qp;
u16 cbit, blocked = 0;
u16 cookie;
__le16 mcookie;
u32 qp_id;
switch (qp_event->event) {
case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
err_event = (struct creq_qp_error_notification *)qp_event;
qp_id = le32_to_cpu(err_event->xid);
qp = rcfw->qp_tbl[qp_id].qp_handle;
dev_dbg(&rcfw->pdev->dev,
"QPLIB: Received QP error notification");
dev_dbg(&rcfw->pdev->dev,
"QPLIB: qpid 0x%x, req_err=0x%x, resp_err=0x%x\n",
qp_id, err_event->req_err_state_reason,
err_event->res_err_state_reason);
bnxt_qplib_acquire_cq_locks(qp, &flags);
bnxt_qplib_mark_qp_error(qp);
bnxt_qplib_release_cq_locks(qp, &flags);
break;
default:
/* Command Response */
@ -507,6 +523,7 @@ skip_ctx_setup:
void bnxt_qplib_free_rcfw_channel(struct bnxt_qplib_rcfw *rcfw)
{
kfree(rcfw->qp_tbl);
kfree(rcfw->crsqe_tbl);
bnxt_qplib_free_hwq(rcfw->pdev, &rcfw->cmdq);
bnxt_qplib_free_hwq(rcfw->pdev, &rcfw->creq);
@ -514,7 +531,8 @@ void bnxt_qplib_free_rcfw_channel(struct bnxt_qplib_rcfw *rcfw)
}
int bnxt_qplib_alloc_rcfw_channel(struct pci_dev *pdev,
struct bnxt_qplib_rcfw *rcfw)
struct bnxt_qplib_rcfw *rcfw,
int qp_tbl_sz)
{
rcfw->pdev = pdev;
rcfw->creq.max_elements = BNXT_QPLIB_CREQE_MAX_CNT;
@ -541,6 +559,12 @@ int bnxt_qplib_alloc_rcfw_channel(struct pci_dev *pdev,
if (!rcfw->crsqe_tbl)
goto fail;
rcfw->qp_tbl_size = qp_tbl_sz;
rcfw->qp_tbl = kcalloc(qp_tbl_sz, sizeof(struct bnxt_qplib_qp_node),
GFP_KERNEL);
if (!rcfw->qp_tbl)
goto fail;
return 0;
fail:

View File

@ -148,6 +148,11 @@ struct bnxt_qplib_rcfw_sbuf {
u32 size;
};
struct bnxt_qplib_qp_node {
u32 qp_id; /* QP id */
void *qp_handle; /* ptr to qplib_qp */
};
/* RCFW Communication Channels */
struct bnxt_qplib_rcfw {
struct pci_dev *pdev;
@ -181,11 +186,13 @@ struct bnxt_qplib_rcfw {
/* Actual Cmd and Resp Queues */
struct bnxt_qplib_hwq cmdq;
struct bnxt_qplib_crsq *crsqe_tbl;
int qp_tbl_size;
struct bnxt_qplib_qp_node *qp_tbl;
};
void bnxt_qplib_free_rcfw_channel(struct bnxt_qplib_rcfw *rcfw);
int bnxt_qplib_alloc_rcfw_channel(struct pci_dev *pdev,
struct bnxt_qplib_rcfw *rcfw);
struct bnxt_qplib_rcfw *rcfw, int qp_tbl_sz);
void bnxt_qplib_disable_rcfw_channel(struct bnxt_qplib_rcfw *rcfw);
int bnxt_qplib_enable_rcfw_channel(struct pci_dev *pdev,
struct bnxt_qplib_rcfw *rcfw,
@ -207,4 +214,5 @@ int bnxt_qplib_rcfw_send_message(struct bnxt_qplib_rcfw *rcfw,
int bnxt_qplib_deinit_rcfw(struct bnxt_qplib_rcfw *rcfw);
int bnxt_qplib_init_rcfw(struct bnxt_qplib_rcfw *rcfw,
struct bnxt_qplib_ctx *ctx, int is_virtfn);
void bnxt_qplib_mark_qp_error(void *qp_handle);
#endif /* __BNXT_QPLIB_RCFW_H__ */

View File

@ -468,9 +468,11 @@ static void bnxt_qplib_free_sgid_tbl(struct bnxt_qplib_res *res,
kfree(sgid_tbl->tbl);
kfree(sgid_tbl->hw_id);
kfree(sgid_tbl->ctx);
kfree(sgid_tbl->vlan);
sgid_tbl->tbl = NULL;
sgid_tbl->hw_id = NULL;
sgid_tbl->ctx = NULL;
sgid_tbl->vlan = NULL;
sgid_tbl->max = 0;
sgid_tbl->active = 0;
}
@ -491,8 +493,15 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
if (!sgid_tbl->ctx)
goto out_free2;
sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
if (!sgid_tbl->vlan)
goto out_free3;
sgid_tbl->max = max;
return 0;
out_free3:
kfree(sgid_tbl->ctx);
sgid_tbl->ctx = NULL;
out_free2:
kfree(sgid_tbl->hw_id);
sgid_tbl->hw_id = NULL;
@ -514,6 +523,7 @@ static void bnxt_qplib_cleanup_sgid_tbl(struct bnxt_qplib_res *res,
}
memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid) * sgid_tbl->max);
memset(sgid_tbl->hw_id, -1, sizeof(u16) * sgid_tbl->max);
memset(sgid_tbl->vlan, 0, sizeof(u8) * sgid_tbl->max);
sgid_tbl->active = 0;
}

View File

@ -116,6 +116,7 @@ struct bnxt_qplib_sgid_tbl {
u16 max;
u16 active;
void *ctx;
u8 *vlan;
};
struct bnxt_qplib_pkey_tbl {
@ -188,6 +189,7 @@ struct bnxt_qplib_res {
struct bnxt_qplib_sgid_tbl sgid_tbl;
struct bnxt_qplib_pkey_tbl pkey_tbl;
struct bnxt_qplib_dpi_tbl dpi_tbl;
bool prio;
};
#define to_bnxt_qplib(ptr, type, member) \

View File

@ -213,6 +213,7 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
}
memcpy(&sgid_tbl->tbl[index], &bnxt_qplib_gid_zero,
sizeof(bnxt_qplib_gid_zero));
sgid_tbl->vlan[index] = 0;
sgid_tbl->active--;
dev_dbg(&res->pdev->dev,
"QPLIB: SGID deleted hw_id[0x%x] = 0x%x active = 0x%x",
@ -265,28 +266,32 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
struct cmdq_add_gid req;
struct creq_add_gid_resp resp;
u16 cmd_flags = 0;
u32 temp32[4];
u16 temp16[3];
int rc;
RCFW_CMD_PREP(req, ADD_GID, cmd_flags);
memcpy(temp32, gid->data, sizeof(struct bnxt_qplib_gid));
req.gid[0] = cpu_to_be32(temp32[3]);
req.gid[1] = cpu_to_be32(temp32[2]);
req.gid[2] = cpu_to_be32(temp32[1]);
req.gid[3] = cpu_to_be32(temp32[0]);
if (vlan_id != 0xFFFF)
req.vlan = cpu_to_le16((vlan_id &
CMDQ_ADD_GID_VLAN_VLAN_ID_MASK) |
CMDQ_ADD_GID_VLAN_TPID_TPID_8100 |
CMDQ_ADD_GID_VLAN_VLAN_EN);
req.gid[0] = cpu_to_be32(((u32 *)gid->data)[3]);
req.gid[1] = cpu_to_be32(((u32 *)gid->data)[2]);
req.gid[2] = cpu_to_be32(((u32 *)gid->data)[1]);
req.gid[3] = cpu_to_be32(((u32 *)gid->data)[0]);
/*
* driver should ensure that all RoCE traffic is always VLAN
* tagged if RoCE traffic is running on non-zero VLAN ID or
* RoCE traffic is running on non-zero Priority.
*/
if ((vlan_id != 0xFFFF) || res->prio) {
if (vlan_id != 0xFFFF)
req.vlan = cpu_to_le16
(vlan_id & CMDQ_ADD_GID_VLAN_VLAN_ID_MASK);
req.vlan |= cpu_to_le16
(CMDQ_ADD_GID_VLAN_TPID_TPID_8100 |
CMDQ_ADD_GID_VLAN_VLAN_EN);
}
/* MAC in network format */
memcpy(temp16, smac, 6);
req.src_mac[0] = cpu_to_be16(temp16[0]);
req.src_mac[1] = cpu_to_be16(temp16[1]);
req.src_mac[2] = cpu_to_be16(temp16[2]);
req.src_mac[0] = cpu_to_be16(((u16 *)smac)[0]);
req.src_mac[1] = cpu_to_be16(((u16 *)smac)[1]);
req.src_mac[2] = cpu_to_be16(((u16 *)smac)[2]);
rc = bnxt_qplib_rcfw_send_message(rcfw, (void *)&req,
(void *)&resp, NULL, 0);
@ -297,6 +302,9 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
/* Add GID to the sgid_tbl */
memcpy(&sgid_tbl->tbl[free_idx], gid, sizeof(*gid));
sgid_tbl->active++;
if (vlan_id != 0xFFFF)
sgid_tbl->vlan[free_idx] = 1;
dev_dbg(&res->pdev->dev,
"QPLIB: SGID added hw_id[0x%x] = 0x%x active = 0x%x",
free_idx, sgid_tbl->hw_id[free_idx], sgid_tbl->active);
@ -306,6 +314,43 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
return 0;
}
int bnxt_qplib_update_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
struct bnxt_qplib_gid *gid, u16 gid_idx,
u8 *smac)
{
struct bnxt_qplib_res *res = to_bnxt_qplib(sgid_tbl,
struct bnxt_qplib_res,
sgid_tbl);
struct bnxt_qplib_rcfw *rcfw = res->rcfw;
struct creq_modify_gid_resp resp;
struct cmdq_modify_gid req;
int rc;
u16 cmd_flags = 0;
RCFW_CMD_PREP(req, MODIFY_GID, cmd_flags);
req.gid[0] = cpu_to_be32(((u32 *)gid->data)[3]);
req.gid[1] = cpu_to_be32(((u32 *)gid->data)[2]);
req.gid[2] = cpu_to_be32(((u32 *)gid->data)[1]);
req.gid[3] = cpu_to_be32(((u32 *)gid->data)[0]);
if (res->prio) {
req.vlan |= cpu_to_le16
(CMDQ_ADD_GID_VLAN_TPID_TPID_8100 |
CMDQ_ADD_GID_VLAN_VLAN_EN);
}
/* MAC in network format */
req.src_mac[0] = cpu_to_be16(((u16 *)smac)[0]);
req.src_mac[1] = cpu_to_be16(((u16 *)smac)[1]);
req.src_mac[2] = cpu_to_be16(((u16 *)smac)[2]);
req.gid_index = cpu_to_le16(gid_idx);
rc = bnxt_qplib_rcfw_send_message(rcfw, (void *)&req,
(void *)&resp, NULL, 0);
return rc;
}
/* pkeys */
int bnxt_qplib_get_pkey(struct bnxt_qplib_res *res,
struct bnxt_qplib_pkey_tbl *pkey_tbl, u16 index,

View File

@ -135,6 +135,8 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
struct bnxt_qplib_gid *gid, u8 *mac, u16 vlan_id,
bool update, u32 *index);
int bnxt_qplib_update_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
struct bnxt_qplib_gid *gid, u16 gid_idx, u8 *smac);
int bnxt_qplib_get_pkey(struct bnxt_qplib_res *res,
struct bnxt_qplib_pkey_tbl *pkey_tbl, u16 index,
u16 *pkey);

View File

@ -1473,8 +1473,8 @@ struct cmdq_modify_gid {
u8 resp_size;
u8 reserved8;
__le64 resp_addr;
__le32 gid[4];
__le16 src_mac[3];
__be32 gid[4];
__be16 src_mac[3];
__le16 vlan;
#define CMDQ_MODIFY_GID_VLAN_VLAN_ID_MASK 0xfffUL
#define CMDQ_MODIFY_GID_VLAN_VLAN_ID_SFT 0

View File

@ -45,7 +45,6 @@
MODULE_AUTHOR("Boyd Faulkner, Steve Wise");
MODULE_DESCRIPTION("Chelsio T3 RDMA Driver");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(DRV_VERSION);
static void open_rnic_dev(struct t3cdev *);
static void close_rnic_dev(struct t3cdev *);

View File

@ -1336,8 +1336,7 @@ static int iwch_port_immutable(struct ib_device *ibdev, u8 port_num,
return 0;
}
static void get_dev_fw_ver_str(struct ib_device *ibdev, char *str,
size_t str_len)
static void get_dev_fw_ver_str(struct ib_device *ibdev, char *str)
{
struct iwch_dev *iwch_dev = to_iwch_dev(ibdev);
struct ethtool_drvinfo info;
@ -1345,7 +1344,7 @@ static void get_dev_fw_ver_str(struct ib_device *ibdev, char *str,
pr_debug("%s dev 0x%p\n", __func__, iwch_dev);
lldev->ethtool_ops->get_drvinfo(lldev, &info);
snprintf(str, str_len, "%s", info.fw_version);
snprintf(str, IB_FW_VERSION_NAME_MAX, "%s", info.fw_version);
}
int iwch_register_device(struct iwch_dev *dev)

View File

@ -2871,7 +2871,6 @@ static int close_con_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
return 0;
pr_debug("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
BUG_ON(!ep);
/* The cm_id may be null if we failed to connect */
mutex_lock(&ep->com.mutex);

View File

@ -44,7 +44,6 @@
MODULE_AUTHOR("Steve Wise");
MODULE_DESCRIPTION("Chelsio T4/T5 RDMA Driver");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(DRV_VERSION);
static int allow_db_fc_on_t5;
module_param(allow_db_fc_on_t5, int, 0644);

View File

@ -517,14 +517,13 @@ static int c4iw_port_immutable(struct ib_device *ibdev, u8 port_num,
return 0;
}
static void get_dev_fw_str(struct ib_device *dev, char *str,
size_t str_len)
static void get_dev_fw_str(struct ib_device *dev, char *str)
{
struct c4iw_dev *c4iw_dev = container_of(dev, struct c4iw_dev,
ibdev);
pr_debug("%s dev 0x%p\n", __func__, dev);
snprintf(str, str_len, "%u.%u.%u.%u",
snprintf(str, IB_FW_VERSION_NAME_MAX, "%u.%u.%u.%u",
FW_HDR_FW_VER_MAJOR_G(c4iw_dev->rdev.lldi.fw_vers),
FW_HDR_FW_VER_MINOR_G(c4iw_dev->rdev.lldi.fw_vers),
FW_HDR_FW_VER_MICRO_G(c4iw_dev->rdev.lldi.fw_vers),

View File

@ -13,13 +13,6 @@ config HFI1_DEBUG_SDMA_ORDER
---help---
This is a debug flag to test for out of order
sdma completions for unit testing
config HFI1_VERBS_31BIT_PSN
bool "HFI1 enable 31 bit PSN"
depends on INFINIBAND_HFI1
default y
---help---
Setting this enables 31 BIT PSN
For verbs RC/UC
config SDMA_VERBOSITY
bool "Config SDMA Verbosity"
depends on INFINIBAND_HFI1

View File

@ -8,7 +8,7 @@
obj-$(CONFIG_INFINIBAND_HFI1) += hfi1.o
hfi1-y := affinity.o chip.o device.o driver.o efivar.o \
eprom.o file_ops.o firmware.o \
eprom.o exp_rcv.o file_ops.o firmware.o \
init.o intr.o mad.o mmu_rb.o pcie.o pio.o pio_copy.o platform.o \
qp.o qsfp.o rc.o ruc.o sdma.o sysfs.o trace.o \
uc.o ud.o user_exp_rcv.o user_pages.o user_sdma.o verbs.o \

View File

@ -1,5 +1,5 @@
/*
* Copyright(c) 2015, 2016 Intel Corporation.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -335,10 +335,10 @@ static void hfi1_update_sdma_affinity(struct hfi1_msix_entry *msix, int cpu)
sde->cpu = cpu;
cpumask_clear(&msix->mask);
cpumask_set_cpu(cpu, &msix->mask);
dd_dev_dbg(dd, "IRQ vector: %u, type %s engine %u -> cpu: %d\n",
msix->msix.vector, irq_type_names[msix->type],
dd_dev_dbg(dd, "IRQ: %u, type %s engine %u -> cpu: %d\n",
msix->irq, irq_type_names[msix->type],
sde->this_idx, cpu);
irq_set_affinity_hint(msix->msix.vector, &msix->mask);
irq_set_affinity_hint(msix->irq, &msix->mask);
/*
* Set the new cpu in the hfi1_affinity_node and clean
@ -387,7 +387,7 @@ static void hfi1_setup_sdma_notifier(struct hfi1_msix_entry *msix)
{
struct irq_affinity_notify *notify = &msix->notify;
notify->irq = msix->msix.vector;
notify->irq = msix->irq;
notify->notify = hfi1_irq_notifier_notify;
notify->release = hfi1_irq_notifier_release;
@ -472,10 +472,10 @@ static int get_irq_affinity(struct hfi1_devdata *dd,
}
cpumask_set_cpu(cpu, &msix->mask);
dd_dev_info(dd, "IRQ vector: %u, type %s %s -> cpu: %d\n",
msix->msix.vector, irq_type_names[msix->type],
dd_dev_info(dd, "IRQ: %u, type %s %s -> cpu: %d\n",
msix->irq, irq_type_names[msix->type],
extra, cpu);
irq_set_affinity_hint(msix->msix.vector, &msix->mask);
irq_set_affinity_hint(msix->irq, &msix->mask);
if (msix->type == IRQ_SDMA) {
sde->cpu = cpu;
@ -533,7 +533,7 @@ void hfi1_put_irq_affinity(struct hfi1_devdata *dd,
}
}
irq_set_affinity_hint(msix->msix.vector, NULL);
irq_set_affinity_hint(msix->irq, NULL);
cpumask_clear(&msix->mask);
mutex_unlock(&node_affinity.lock);
}

View File

@ -1,5 +1,5 @@
/*
* Copyright(c) 2015, 2016 Intel Corporation.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -75,24 +75,26 @@ struct hfi1_msix_entry;
/* Initialize non-HT cpu cores mask */
void init_real_cpu_mask(void);
/* Initialize driver affinity data */
int hfi1_dev_affinity_init(struct hfi1_devdata *);
int hfi1_dev_affinity_init(struct hfi1_devdata *dd);
/*
* Set IRQ affinity to a CPU. The function will determine the
* CPU and set the affinity to it.
*/
int hfi1_get_irq_affinity(struct hfi1_devdata *, struct hfi1_msix_entry *);
int hfi1_get_irq_affinity(struct hfi1_devdata *dd,
struct hfi1_msix_entry *msix);
/*
* Remove the IRQ's CPU affinity. This function also updates
* any internal CPU tracking data
*/
void hfi1_put_irq_affinity(struct hfi1_devdata *, struct hfi1_msix_entry *);
void hfi1_put_irq_affinity(struct hfi1_devdata *dd,
struct hfi1_msix_entry *msix);
/*
* Determine a CPU affinity for a user process, if the process does not
* have an affinity set yet.
*/
int hfi1_get_proc_affinity(int);
int hfi1_get_proc_affinity(int node);
/* Release a CPU used by a user process. */
void hfi1_put_proc_affinity(int);
void hfi1_put_proc_affinity(int cpu);
struct hfi1_affinity_node {
int node;

View File

@ -237,14 +237,17 @@ static inline void aspm_disable_all(struct hfi1_devdata *dd)
{
struct hfi1_ctxtdata *rcd;
unsigned long flags;
unsigned i;
u16 i;
for (i = 0; i < dd->first_dyn_alloc_ctxt; i++) {
rcd = dd->rcd[i];
del_timer_sync(&rcd->aspm_timer);
spin_lock_irqsave(&rcd->aspm_lock, flags);
rcd->aspm_intr_enable = false;
spin_unlock_irqrestore(&rcd->aspm_lock, flags);
rcd = hfi1_rcd_get_by_index(dd, i);
if (rcd) {
del_timer_sync(&rcd->aspm_timer);
spin_lock_irqsave(&rcd->aspm_lock, flags);
rcd->aspm_intr_enable = false;
spin_unlock_irqrestore(&rcd->aspm_lock, flags);
hfi1_rcd_put(rcd);
}
}
aspm_disable(dd);
@ -256,7 +259,7 @@ static inline void aspm_enable_all(struct hfi1_devdata *dd)
{
struct hfi1_ctxtdata *rcd;
unsigned long flags;
unsigned i;
u16 i;
aspm_enable(dd);
@ -264,11 +267,14 @@ static inline void aspm_enable_all(struct hfi1_devdata *dd)
return;
for (i = 0; i < dd->first_dyn_alloc_ctxt; i++) {
rcd = dd->rcd[i];
spin_lock_irqsave(&rcd->aspm_lock, flags);
rcd->aspm_intr_enable = true;
rcd->aspm_enabled = true;
spin_unlock_irqrestore(&rcd->aspm_lock, flags);
rcd = hfi1_rcd_get_by_index(dd, i);
if (rcd) {
spin_lock_irqsave(&rcd->aspm_lock, flags);
rcd->aspm_intr_enable = true;
rcd->aspm_enabled = true;
spin_unlock_irqrestore(&rcd->aspm_lock, flags);
hfi1_rcd_put(rcd);
}
}
}
@ -284,13 +290,18 @@ static inline void aspm_ctx_init(struct hfi1_ctxtdata *rcd)
static inline void aspm_init(struct hfi1_devdata *dd)
{
unsigned i;
struct hfi1_ctxtdata *rcd;
u16 i;
spin_lock_init(&dd->aspm_lock);
dd->aspm_supported = aspm_hw_l1_supported(dd);
for (i = 0; i < dd->first_dyn_alloc_ctxt; i++)
aspm_ctx_init(dd->rcd[i]);
for (i = 0; i < dd->first_dyn_alloc_ctxt; i++) {
rcd = hfi1_rcd_get_by_index(dd, i);
if (rcd)
aspm_ctx_init(rcd);
hfi1_rcd_put(rcd);
}
/* Start with ASPM disabled */
aspm_hw_set_l1_ent_latency(dd);

File diff suppressed because it is too large Load Diff

View File

@ -384,6 +384,7 @@
#define VERIFY_CAP_LOCAL_FABRIC 0x08
#define VERIFY_CAP_LOCAL_LINK_WIDTH 0x09
#define LOCAL_DEVICE_ID 0x0a
#define RESERVED_REGISTERS 0x0b
#define LOCAL_LNI_INFO 0x0c
#define REMOTE_LNI_INFO 0x0d
#define MISC_STATUS 0x0e
@ -506,6 +507,9 @@
#define DOWN_REMOTE_REASON_SHIFT 16
#define DOWN_REMOTE_REASON_MASK 0xff
#define HOST_INTERFACE_VERSION_SHIFT 16
#define HOST_INTERFACE_VERSION_MASK 0xff
/* verify capability PHY power management bits */
#define PWRM_BER_CONTROL 0x1
#define PWRM_BANDWIDTH_CONTROL 0x2
@ -605,11 +609,11 @@ int read_lcb_csr(struct hfi1_devdata *dd, u32 offset, u64 *data);
int write_lcb_csr(struct hfi1_devdata *dd, u32 offset, u64 data);
void __iomem *get_csr_addr(
struct hfi1_devdata *dd,
const struct hfi1_devdata *dd,
u32 offset);
static inline void __iomem *get_kctxt_csr_addr(
struct hfi1_devdata *dd,
const struct hfi1_devdata *dd,
int ctxt,
u32 offset0)
{
@ -644,7 +648,6 @@ u64 create_pbc(struct hfi1_pportdata *ppd, u64 flags, int srate_mbs, u32 vl,
#define NUM_PCIE_SERDES 16 /* number of PCIe serdes on the SBus */
extern const u8 pcie_serdes_broadcast[];
extern const u8 pcie_pcs_addrs[2][NUM_PCIE_SERDES];
extern uint platform_config_load;
/* SBus commands */
#define RESET_SBUS_RECEIVER 0x20
@ -704,6 +707,7 @@ int read_8051_data(struct hfi1_devdata *dd, u32 addr, u32 len, u64 *result);
/* chip.c */
void read_misc_status(struct hfi1_devdata *dd, u8 *ver_major, u8 *ver_minor,
u8 *ver_patch);
int write_host_interface_version(struct hfi1_devdata *dd, u8 version);
void read_guid(struct hfi1_devdata *dd);
int wait_fm_ready(struct hfi1_devdata *dd, u32 mstimeout);
void set_link_down_reason(struct hfi1_pportdata *ppd, u8 lcl_reason,
@ -743,11 +747,10 @@ int is_ax(struct hfi1_devdata *dd);
int is_bx(struct hfi1_devdata *dd);
u32 read_physical_state(struct hfi1_devdata *dd);
u32 chip_to_opa_pstate(struct hfi1_devdata *dd, u32 chip_pstate);
u32 get_logical_state(struct hfi1_pportdata *ppd);
const char *opa_lstate_name(u32 lstate);
const char *opa_pstate_name(u32 pstate);
u32 driver_physical_state(struct hfi1_pportdata *ppd);
u32 driver_logical_state(struct hfi1_pportdata *ppd);
u32 driver_pstate(struct hfi1_pportdata *ppd);
u32 driver_lstate(struct hfi1_pportdata *ppd);
int acquire_lcb_access(struct hfi1_devdata *dd, int sleep_ok);
int release_lcb_access(struct hfi1_devdata *dd, int sleep_ok);
@ -1347,21 +1350,21 @@ enum {
u64 get_all_cpu_total(u64 __percpu *cntr);
void hfi1_start_cleanup(struct hfi1_devdata *dd);
void hfi1_clear_tids(struct hfi1_ctxtdata *rcd);
struct ib_header *hfi1_get_msgheader(
struct hfi1_devdata *dd, __le32 *rhf_addr);
void hfi1_init_ctxt(struct send_context *sc);
void hfi1_put_tid(struct hfi1_devdata *dd, u32 index,
u32 type, unsigned long pa, u16 order);
void hfi1_quiet_serdes(struct hfi1_pportdata *ppd);
void hfi1_rcvctrl(struct hfi1_devdata *dd, unsigned int op, int ctxt);
void hfi1_rcvctrl(struct hfi1_devdata *dd, unsigned int op,
struct hfi1_ctxtdata *rcd);
u32 hfi1_read_cntrs(struct hfi1_devdata *dd, char **namep, u64 **cntrp);
u32 hfi1_read_portcntrs(struct hfi1_pportdata *ppd, char **namep, u64 **cntrp);
u8 hfi1_ibphys_portstate(struct hfi1_pportdata *ppd);
int hfi1_get_ib_cfg(struct hfi1_pportdata *ppd, int which);
int hfi1_set_ib_cfg(struct hfi1_pportdata *ppd, int which, u32 val);
int hfi1_set_ctxt_jkey(struct hfi1_devdata *dd, unsigned ctxt, u16 jkey);
int hfi1_clear_ctxt_jkey(struct hfi1_devdata *dd, unsigned ctxt);
int hfi1_set_ctxt_pkey(struct hfi1_devdata *dd, unsigned ctxt, u16 pkey);
int hfi1_set_ctxt_jkey(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd,
u16 jkey);
int hfi1_clear_ctxt_jkey(struct hfi1_devdata *dd, struct hfi1_ctxtdata *ctxt);
int hfi1_set_ctxt_pkey(struct hfi1_devdata *dd, struct hfi1_ctxtdata *ctxt,
u16 pkey);
int hfi1_clear_ctxt_pkey(struct hfi1_devdata *dd, struct hfi1_ctxtdata *ctxt);
void hfi1_read_link_quality(struct hfi1_devdata *dd, u8 *link_quality);
void hfi1_init_vnic_rsm(struct hfi1_devdata *dd);

View File

@ -325,22 +325,15 @@ struct diag_pkt {
#define HFI1_LRH_BTH 0x0002 /* 1. word of IB LRH - next header: BTH */
/* misc. */
#define SC15_PACKET 0xF
#define SIZE_OF_CRC 1
#define SIZE_OF_LT 1
#define LIM_MGMT_P_KEY 0x7FFF
#define FULL_MGMT_P_KEY 0xFFFF
#define DEFAULT_P_KEY LIM_MGMT_P_KEY
/**
* 0xF8 - 4 bits of multicast range and 1 bit for collective range
* Example: For 24 bit LID space,
* Multicast range: 0xF00000 to 0xF7FFFF
* Collective range: 0xF80000 to 0xFFFFFE
*/
#define HFI1_MCAST_NR 0x4 /* Number of top bits set */
#define HFI1_COLLECTIVE_NR 0x1 /* Number of bits after MCAST_NR */
#define HFI1_PSM_IOC_BASE_SEQ 0x0
static inline __u64 rhf_to_cpu(const __le32 *rbuf)

View File

@ -1,4 +1,3 @@
#ifdef CONFIG_DEBUG_FS
/*
* Copyright(c) 2015-2017 Intel Corporation.
*
@ -173,12 +172,15 @@ static int _opcode_stats_seq_show(struct seq_file *s, void *v)
u64 n_packets = 0, n_bytes = 0;
struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
struct hfi1_devdata *dd = dd_from_dev(ibd);
struct hfi1_ctxtdata *rcd;
for (j = 0; j < dd->first_dyn_alloc_ctxt; j++) {
if (!dd->rcd[j])
continue;
n_packets += dd->rcd[j]->opstats->stats[i].n_packets;
n_bytes += dd->rcd[j]->opstats->stats[i].n_bytes;
rcd = hfi1_rcd_get_by_index(dd, j);
if (rcd) {
n_packets += rcd->opstats->stats[i].n_packets;
n_bytes += rcd->opstats->stats[i].n_bytes;
}
hfi1_rcd_put(rcd);
}
if (!n_packets && !n_bytes)
return SEQ_SKIP;
@ -231,6 +233,7 @@ static int _ctx_stats_seq_show(struct seq_file *s, void *v)
u64 n_packets = 0;
struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
struct hfi1_devdata *dd = dd_from_dev(ibd);
struct hfi1_ctxtdata *rcd;
if (v == SEQ_START_TOKEN) {
seq_puts(s, "Ctx:npkts\n");
@ -240,11 +243,14 @@ static int _ctx_stats_seq_show(struct seq_file *s, void *v)
spos = v;
i = *spos;
if (!dd->rcd[i])
rcd = hfi1_rcd_get_by_index(dd, i);
if (!rcd)
return SEQ_SKIP;
for (j = 0; j < ARRAY_SIZE(dd->rcd[i]->opstats->stats); j++)
n_packets += dd->rcd[i]->opstats->stats[j].n_packets;
for (j = 0; j < ARRAY_SIZE(rcd->opstats->stats); j++)
n_packets += rcd->opstats->stats[j].n_packets;
hfi1_rcd_put(rcd);
if (!n_packets)
return SEQ_SKIP;
@ -260,10 +266,10 @@ DEBUGFS_FILE_OPS(ctx_stats);
static void *_qp_stats_seq_start(struct seq_file *s, loff_t *pos)
__acquires(RCU)
{
struct qp_iter *iter;
struct rvt_qp_iter *iter;
loff_t n = *pos;
iter = qp_iter_init(s->private);
iter = rvt_qp_iter_init(s->private, 0, NULL);
/* stop calls rcu_read_unlock */
rcu_read_lock();
@ -272,7 +278,7 @@ static void *_qp_stats_seq_start(struct seq_file *s, loff_t *pos)
return NULL;
do {
if (qp_iter_next(iter)) {
if (rvt_qp_iter_next(iter)) {
kfree(iter);
return NULL;
}
@ -285,11 +291,11 @@ static void *_qp_stats_seq_next(struct seq_file *s, void *iter_ptr,
loff_t *pos)
__must_hold(RCU)
{
struct qp_iter *iter = iter_ptr;
struct rvt_qp_iter *iter = iter_ptr;
(*pos)++;
if (qp_iter_next(iter)) {
if (rvt_qp_iter_next(iter)) {
kfree(iter);
return NULL;
}
@ -305,7 +311,7 @@ static void _qp_stats_seq_stop(struct seq_file *s, void *iter_ptr)
static int _qp_stats_seq_show(struct seq_file *s, void *iter_ptr)
{
struct qp_iter *iter = iter_ptr;
struct rvt_qp_iter *iter = iter_ptr;
if (!iter)
return 0;
@ -361,6 +367,52 @@ DEBUGFS_SEQ_FILE_OPS(sdes);
DEBUGFS_SEQ_FILE_OPEN(sdes)
DEBUGFS_FILE_OPS(sdes);
static void *_rcds_seq_start(struct seq_file *s, loff_t *pos)
{
struct hfi1_ibdev *ibd;
struct hfi1_devdata *dd;
ibd = (struct hfi1_ibdev *)s->private;
dd = dd_from_dev(ibd);
if (!dd->rcd || *pos >= dd->n_krcv_queues)
return NULL;
return pos;
}
static void *_rcds_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
struct hfi1_devdata *dd = dd_from_dev(ibd);
++*pos;
if (!dd->rcd || *pos >= dd->n_krcv_queues)
return NULL;
return pos;
}
static void _rcds_seq_stop(struct seq_file *s, void *v)
{
}
static int _rcds_seq_show(struct seq_file *s, void *v)
{
struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
struct hfi1_devdata *dd = dd_from_dev(ibd);
struct hfi1_ctxtdata *rcd;
loff_t *spos = v;
loff_t i = *spos;
rcd = hfi1_rcd_get_by_index(dd, i);
if (rcd)
seqfile_dump_rcd(s, rcd);
hfi1_rcd_put(rcd);
return 0;
}
DEBUGFS_SEQ_FILE_OPS(rcds);
DEBUGFS_SEQ_FILE_OPEN(rcds)
DEBUGFS_FILE_OPS(rcds);
/* read the per-device counters */
static ssize_t dev_counters_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@ -1098,12 +1150,15 @@ static int _fault_stats_seq_show(struct seq_file *s, void *v)
u64 n_packets = 0, n_bytes = 0;
struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
struct hfi1_devdata *dd = dd_from_dev(ibd);
struct hfi1_ctxtdata *rcd;
for (j = 0; j < dd->first_dyn_alloc_ctxt; j++) {
if (!dd->rcd[j])
continue;
n_packets += dd->rcd[j]->opstats->stats[i].n_packets;
n_bytes += dd->rcd[j]->opstats->stats[i].n_bytes;
rcd = hfi1_rcd_get_by_index(dd, j);
if (rcd) {
n_packets += rcd->opstats->stats[i].n_packets;
n_bytes += rcd->opstats->stats[i].n_bytes;
}
hfi1_rcd_put(rcd);
}
if (!n_packets && !n_bytes)
return SEQ_SKIP;
@ -1311,6 +1366,7 @@ void hfi1_dbg_ibdev_init(struct hfi1_ibdev *ibd)
DEBUGFS_SEQ_FILE_CREATE(ctx_stats, ibd->hfi1_ibdev_dbg, ibd);
DEBUGFS_SEQ_FILE_CREATE(qp_stats, ibd->hfi1_ibdev_dbg, ibd);
DEBUGFS_SEQ_FILE_CREATE(sdes, ibd->hfi1_ibdev_dbg, ibd);
DEBUGFS_SEQ_FILE_CREATE(rcds, ibd->hfi1_ibdev_dbg, ibd);
DEBUGFS_SEQ_FILE_CREATE(sdma_cpu_list, ibd->hfi1_ibdev_dbg, ibd);
/* dev counter files */
for (i = 0; i < ARRAY_SIZE(cntr_ops); i++)
@ -1478,5 +1534,3 @@ void hfi1_dbg_exit(void)
debugfs_remove_recursive(hfi1_dbg_root);
hfi1_dbg_root = NULL;
}
#endif

View File

@ -96,7 +96,6 @@ MODULE_PARM_DESC(cap_mask, "Bit mask of enabled/disabled HW features");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_DESCRIPTION("Intel Omni-Path Architecture driver");
MODULE_VERSION(HFI1_DRIVER_VERSION);
/*
* MAX_PKT_RCV is the max # if packets processed per receive interrupt.
@ -196,7 +195,7 @@ int hfi1_count_active_units(void)
spin_lock_irqsave(&hfi1_devs_lock, flags);
list_for_each_entry(dd, &hfi1_dev_list, list) {
if (!(dd->flags & HFI1_PRESENT) || !dd->kregbase)
if (!(dd->flags & HFI1_PRESENT) || !dd->kregbase1)
continue;
for (pidx = 0; pidx < dd->num_pports; ++pidx) {
ppd = dd->pport + pidx;
@ -224,6 +223,27 @@ static inline void *get_egrbuf(const struct hfi1_ctxtdata *rcd, u64 rhf,
(offset * RCV_BUF_BLOCK_SIZE));
}
static inline void *hfi1_get_header(struct hfi1_devdata *dd,
__le32 *rhf_addr)
{
u32 offset = rhf_hdrq_offset(rhf_to_cpu(rhf_addr));
return (void *)(rhf_addr - dd->rhf_offset + offset);
}
static inline struct ib_header *hfi1_get_msgheader(struct hfi1_devdata *dd,
__le32 *rhf_addr)
{
return (struct ib_header *)hfi1_get_header(dd, rhf_addr);
}
static inline struct hfi1_16b_header
*hfi1_get_16B_header(struct hfi1_devdata *dd,
__le32 *rhf_addr)
{
return (struct hfi1_16b_header *)hfi1_get_header(dd, rhf_addr);
}
/*
* Validate and encode the a given RcvArray Buffer size.
* The function will check whether the given size falls within
@ -249,7 +269,7 @@ static void rcv_hdrerr(struct hfi1_ctxtdata *rcd, struct hfi1_pportdata *ppd,
{
struct ib_header *rhdr = packet->hdr;
u32 rte = rhf_rcv_type_err(packet->rhf);
int lnh = ib_get_lnh(rhdr);
u32 mlid_base;
struct hfi1_ibport *ibp = rcd_to_iport(rcd);
struct hfi1_devdata *dd = ppd->dd;
struct rvt_dev_info *rdi = &dd->verbs_dev.rdi;
@ -257,37 +277,47 @@ static void rcv_hdrerr(struct hfi1_ctxtdata *rcd, struct hfi1_pportdata *ppd,
if (packet->rhf & (RHF_VCRC_ERR | RHF_ICRC_ERR))
return;
if (packet->etype == RHF_RCV_TYPE_BYPASS) {
goto drop;
} else {
u8 lnh = ib_get_lnh(rhdr);
mlid_base = be16_to_cpu(IB_MULTICAST_LID_BASE);
if (lnh == HFI1_LRH_BTH) {
packet->ohdr = &rhdr->u.oth;
} else if (lnh == HFI1_LRH_GRH) {
packet->ohdr = &rhdr->u.l.oth;
packet->grh = &rhdr->u.l.grh;
} else {
goto drop;
}
}
if (packet->rhf & RHF_TID_ERR) {
/* For TIDERR and RC QPs preemptively schedule a NAK */
struct ib_other_headers *ohdr = NULL;
u32 tlen = rhf_pkt_len(packet->rhf); /* in bytes */
u16 lid = ib_get_dlid(rhdr);
u32 dlid = ib_get_dlid(rhdr);
u32 qp_num;
u32 rcv_flags = 0;
/* Sanity check packet */
if (tlen < 24)
goto drop;
/* Check for GRH */
if (lnh == HFI1_LRH_BTH) {
ohdr = &rhdr->u.oth;
} else if (lnh == HFI1_LRH_GRH) {
if (packet->grh) {
u32 vtf;
struct ib_grh *grh = packet->grh;
ohdr = &rhdr->u.l.oth;
if (rhdr->u.l.grh.next_hdr != IB_GRH_NEXT_HDR)
if (grh->next_hdr != IB_GRH_NEXT_HDR)
goto drop;
vtf = be32_to_cpu(rhdr->u.l.grh.version_tclass_flow);
vtf = be32_to_cpu(grh->version_tclass_flow);
if ((vtf >> IB_GRH_VERSION_SHIFT) != IB_GRH_VERSION)
goto drop;
rcv_flags |= HFI1_HAS_GRH;
} else {
goto drop;
}
/* Get the destination QP number. */
qp_num = be32_to_cpu(ohdr->bth[1]) & RVT_QPN_MASK;
if (lid < be16_to_cpu(IB_MULTICAST_LID_BASE)) {
qp_num = ib_bth_get_qpn(packet->ohdr);
if (dlid < mlid_base) {
struct rvt_qp *qp;
unsigned long flags;
@ -312,11 +342,7 @@ static void rcv_hdrerr(struct hfi1_ctxtdata *rcd, struct hfi1_pportdata *ppd,
switch (qp->ibqp.qp_type) {
case IB_QPT_RC:
hfi1_rc_hdrerr(
rcd,
rhdr,
rcv_flags,
qp);
hfi1_rc_hdrerr(rcd, packet, qp);
break;
default:
/* For now don't handle any other QP types */
@ -332,9 +358,8 @@ static void rcv_hdrerr(struct hfi1_ctxtdata *rcd, struct hfi1_pportdata *ppd,
switch (rte) {
case RHF_RTE_ERROR_OP_CODE_ERR:
{
u32 opcode;
void *ebuf = NULL;
__be32 *bth = NULL;
u8 opcode;
if (rhf_use_egr_bfr(packet->rhf))
ebuf = packet->ebuf;
@ -342,16 +367,7 @@ static void rcv_hdrerr(struct hfi1_ctxtdata *rcd, struct hfi1_pportdata *ppd,
if (!ebuf)
goto drop; /* this should never happen */
if (lnh == HFI1_LRH_BTH)
bth = (__be32 *)ebuf;
else if (lnh == HFI1_LRH_GRH)
bth = (__be32 *)((char *)ebuf + sizeof(struct ib_grh));
else
goto drop;
opcode = be32_to_cpu(bth[0]) >> 24;
opcode &= 0xff;
opcode = ib_bth_get_opcode(packet->ohdr);
if (opcode == IB_OPCODE_CNP) {
/*
* Only in pre-B0 h/w is the CNP_OPCODE handled
@ -365,7 +381,7 @@ static void rcv_hdrerr(struct hfi1_ctxtdata *rcd, struct hfi1_pportdata *ppd,
sc5 = hfi1_9B_get_sc5(rhdr, packet->rhf);
sl = ibp->sc_to_sl[sc5];
lqpn = be32_to_cpu(bth[1]) & RVT_QPN_MASK;
lqpn = ib_bth_get_qpn(packet->ohdr);
rcu_read_lock();
qp = rvt_lookup_qpn(rdi, &ibp->rvp, lqpn);
if (!qp) {
@ -415,33 +431,39 @@ static inline void init_packet(struct hfi1_ctxtdata *rcd,
packet->rhf = rhf_to_cpu(packet->rhf_addr);
packet->rhqoff = rcd->head;
packet->numpkt = 0;
packet->rcv_flags = 0;
}
void hfi1_process_ecn_slowpath(struct rvt_qp *qp, struct hfi1_packet *pkt,
bool do_cnp)
{
struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
struct ib_header *hdr = pkt->hdr;
struct ib_other_headers *ohdr = pkt->ohdr;
struct ib_grh *grh = NULL;
struct ib_grh *grh = pkt->grh;
u32 rqpn = 0, bth1;
u16 rlid, dlid = ib_get_dlid(hdr);
u8 sc, svc_type;
u16 pkey, rlid, dlid = ib_get_dlid(pkt->hdr);
u8 hdr_type, sc, svc_type;
bool is_mcast = false;
if (pkt->rcv_flags & HFI1_HAS_GRH)
grh = &hdr->u.l.grh;
if (pkt->etype == RHF_RCV_TYPE_BYPASS) {
is_mcast = hfi1_is_16B_mcast(dlid);
pkey = hfi1_16B_get_pkey(pkt->hdr);
sc = hfi1_16B_get_sc(pkt->hdr);
hdr_type = HFI1_PKT_TYPE_16B;
} else {
is_mcast = (dlid > be16_to_cpu(IB_MULTICAST_LID_BASE)) &&
(dlid != be16_to_cpu(IB_LID_PERMISSIVE));
pkey = ib_bth_get_pkey(ohdr);
sc = hfi1_9B_get_sc5(pkt->hdr, pkt->rhf);
hdr_type = HFI1_PKT_TYPE_9B;
}
switch (qp->ibqp.qp_type) {
case IB_QPT_SMI:
case IB_QPT_GSI:
case IB_QPT_UD:
rlid = ib_get_slid(hdr);
rqpn = be32_to_cpu(ohdr->u.ud.deth[1]) & RVT_QPN_MASK;
rlid = ib_get_slid(pkt->hdr);
rqpn = ib_get_sqpn(pkt->ohdr);
svc_type = IB_CC_SVCTYPE_UD;
is_mcast = (dlid > be16_to_cpu(IB_MULTICAST_LID_BASE)) &&
(dlid != be16_to_cpu(IB_LID_PERMISSIVE));
break;
case IB_QPT_UC:
rlid = rdma_ah_get_dlid(&qp->remote_ah_attr);
@ -457,14 +479,11 @@ void hfi1_process_ecn_slowpath(struct rvt_qp *qp, struct hfi1_packet *pkt,
return;
}
sc = hfi1_9B_get_sc5(hdr, pkt->rhf);
bth1 = be32_to_cpu(ohdr->bth[1]);
if (do_cnp && (bth1 & IB_FECN_SMASK)) {
u16 pkey = (u16)be32_to_cpu(ohdr->bth[0]);
return_cnp(ibp, qp, rqpn, pkey, dlid, rlid, sc, grh);
}
/* Call appropriate CNP handler */
if (do_cnp && (bth1 & IB_FECN_SMASK))
hfi1_handle_cnp_tbl[hdr_type](ibp, qp, rqpn, pkey,
dlid, rlid, sc, grh);
if (!is_mcast && (bth1 & IB_BECN_SMASK)) {
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
@ -591,9 +610,10 @@ static void __prescan_rxq(struct hfi1_packet *packet)
if (lnh == HFI1_LRH_BTH) {
packet->ohdr = &hdr->u.oth;
packet->grh = NULL;
} else if (lnh == HFI1_LRH_GRH) {
packet->ohdr = &hdr->u.l.oth;
packet->rcv_flags |= HFI1_HAS_GRH;
packet->grh = &hdr->u.l.grh;
} else {
goto next; /* just in case */
}
@ -698,10 +718,8 @@ static inline int process_rcv_packet(struct hfi1_packet *packet, int thread)
{
int ret;
packet->hdr = hfi1_get_msgheader(packet->rcd->dd,
packet->rhf_addr);
packet->hlen = (u8 *)packet->rhf_addr - (u8 *)packet->hdr;
packet->etype = rhf_rcv_type(packet->rhf);
/* total length */
packet->tlen = rhf_pkt_len(packet->rhf); /* in bytes */
/* retrieve eager buffer details */
@ -759,7 +777,7 @@ static inline void process_rcv_update(int last, struct hfi1_packet *packet)
packet->etail, 0, 0);
packet->updegr = 0;
}
packet->rcv_flags = 0;
packet->grh = NULL;
}
static inline void finish_packet(struct hfi1_packet *packet)
@ -837,9 +855,10 @@ bail:
return last;
}
static inline void set_nodma_rtail(struct hfi1_devdata *dd, u8 ctxt)
static inline void set_nodma_rtail(struct hfi1_devdata *dd, u16 ctxt)
{
int i;
struct hfi1_ctxtdata *rcd;
u16 i;
/*
* For dynamically allocated kernel contexts (like vnic) switch
@ -847,19 +866,28 @@ static inline void set_nodma_rtail(struct hfi1_devdata *dd, u8 ctxt)
* interrupt handler for all statically allocated kernel contexts.
*/
if (ctxt >= dd->first_dyn_alloc_ctxt) {
dd->rcd[ctxt]->do_interrupt =
&handle_receive_interrupt_nodma_rtail;
rcd = hfi1_rcd_get_by_index(dd, ctxt);
if (rcd) {
rcd->do_interrupt =
&handle_receive_interrupt_nodma_rtail;
hfi1_rcd_put(rcd);
}
return;
}
for (i = HFI1_CTRL_CTXT + 1; i < dd->first_dyn_alloc_ctxt; i++)
dd->rcd[i]->do_interrupt =
&handle_receive_interrupt_nodma_rtail;
for (i = HFI1_CTRL_CTXT + 1; i < dd->first_dyn_alloc_ctxt; i++) {
rcd = hfi1_rcd_get_by_index(dd, i);
if (rcd)
rcd->do_interrupt =
&handle_receive_interrupt_nodma_rtail;
hfi1_rcd_put(rcd);
}
}
static inline void set_dma_rtail(struct hfi1_devdata *dd, u8 ctxt)
static inline void set_dma_rtail(struct hfi1_devdata *dd, u16 ctxt)
{
int i;
struct hfi1_ctxtdata *rcd;
u16 i;
/*
* For dynamically allocated kernel contexts (like vnic) switch
@ -867,27 +895,39 @@ static inline void set_dma_rtail(struct hfi1_devdata *dd, u8 ctxt)
* interrupt handler for all statically allocated kernel contexts.
*/
if (ctxt >= dd->first_dyn_alloc_ctxt) {
dd->rcd[ctxt]->do_interrupt =
&handle_receive_interrupt_dma_rtail;
rcd = hfi1_rcd_get_by_index(dd, ctxt);
if (rcd) {
rcd->do_interrupt =
&handle_receive_interrupt_dma_rtail;
hfi1_rcd_put(rcd);
}
return;
}
for (i = HFI1_CTRL_CTXT + 1; i < dd->first_dyn_alloc_ctxt; i++)
dd->rcd[i]->do_interrupt =
&handle_receive_interrupt_dma_rtail;
for (i = HFI1_CTRL_CTXT + 1; i < dd->first_dyn_alloc_ctxt; i++) {
rcd = hfi1_rcd_get_by_index(dd, i);
if (rcd)
rcd->do_interrupt =
&handle_receive_interrupt_dma_rtail;
hfi1_rcd_put(rcd);
}
}
void set_all_slowpath(struct hfi1_devdata *dd)
{
int i;
struct hfi1_ctxtdata *rcd;
u16 i;
/* HFI1_CTRL_CTXT must always use the slow path interrupt handler */
for (i = HFI1_CTRL_CTXT + 1; i < dd->num_rcv_contexts; i++) {
struct hfi1_ctxtdata *rcd = dd->rcd[i];
rcd = hfi1_rcd_get_by_index(dd, i);
if (!rcd)
continue;
if ((i < dd->first_dyn_alloc_ctxt) ||
(rcd && rcd->sc && (rcd->sc->type == SC_KERNEL)))
(rcd->sc && (rcd->sc->type == SC_KERNEL))) {
rcd->do_interrupt = &handle_receive_interrupt;
}
hfi1_rcd_put(rcd);
}
}
@ -896,20 +936,30 @@ static inline int set_armed_to_active(struct hfi1_ctxtdata *rcd,
struct hfi1_devdata *dd)
{
struct work_struct *lsaw = &rcd->ppd->linkstate_active_work;
struct ib_header *hdr = hfi1_get_msgheader(packet->rcd->dd,
packet->rhf_addr);
u8 etype = rhf_rcv_type(packet->rhf);
u8 sc = SC15_PACKET;
if (etype == RHF_RCV_TYPE_IB &&
hfi1_9B_get_sc5(hdr, packet->rhf) != 0xf) {
int hwstate = read_logical_state(dd);
if (etype == RHF_RCV_TYPE_IB) {
struct ib_header *hdr = hfi1_get_msgheader(packet->rcd->dd,
packet->rhf_addr);
sc = hfi1_9B_get_sc5(hdr, packet->rhf);
} else if (etype == RHF_RCV_TYPE_BYPASS) {
struct hfi1_16b_header *hdr = hfi1_get_16B_header(
packet->rcd->dd,
packet->rhf_addr);
sc = hfi1_16B_get_sc(hdr);
}
if (sc != SC15_PACKET) {
int hwstate = driver_lstate(rcd->ppd);
if (hwstate != LSTATE_ACTIVE) {
dd_dev_info(dd, "Unexpected link state %d\n", hwstate);
if (hwstate != IB_PORT_ACTIVE) {
dd_dev_info(dd,
"Unexpected link state %s\n",
opa_lstate_name(hwstate));
return 0;
}
queue_work(rcd->ppd->hfi1_wq, lsaw);
queue_work(rcd->ppd->link_wq, lsaw);
return 1;
}
return 0;
@ -1063,7 +1113,8 @@ void receive_interrupt_work(struct work_struct *work)
struct hfi1_pportdata *ppd = container_of(work, struct hfi1_pportdata,
linkstate_active_work);
struct hfi1_devdata *dd = ppd->dd;
int i;
struct hfi1_ctxtdata *rcd;
u16 i;
/* Received non-SC15 packet implies neighbor_normal */
ppd->neighbor_normal = 1;
@ -1073,8 +1124,12 @@ void receive_interrupt_work(struct work_struct *work)
* Interrupt all statically allocated kernel contexts that could
* have had an interrupt during auto activation.
*/
for (i = HFI1_CTRL_CTXT; i < dd->first_dyn_alloc_ctxt; i++)
force_recv_intr(dd->rcd[i]);
for (i = HFI1_CTRL_CTXT; i < dd->first_dyn_alloc_ctxt; i++) {
rcd = hfi1_rcd_get_by_index(dd, i);
if (rcd)
force_recv_intr(rcd);
hfi1_rcd_put(rcd);
}
}
/*
@ -1264,10 +1319,9 @@ void hfi1_start_led_override(struct hfi1_pportdata *ppd, unsigned int timeon,
*/
int hfi1_reset_device(int unit)
{
int ret, i;
int ret;
struct hfi1_devdata *dd = hfi1_lookup(unit);
struct hfi1_pportdata *ppd;
unsigned long flags;
int pidx;
if (!dd) {
@ -1277,7 +1331,7 @@ int hfi1_reset_device(int unit)
dd_dev_info(dd, "Reset on unit %u requested\n", unit);
if (!dd->kregbase || !(dd->flags & HFI1_PRESENT)) {
if (!dd->kregbase1 || !(dd->flags & HFI1_PRESENT)) {
dd_dev_info(dd,
"Invalid unit number %u or not initialized or not present\n",
unit);
@ -1285,17 +1339,15 @@ int hfi1_reset_device(int unit)
goto bail;
}
spin_lock_irqsave(&dd->uctxt_lock, flags);
/* If there are any user/vnic contexts, we cannot reset */
mutex_lock(&hfi1_mutex);
if (dd->rcd)
for (i = dd->first_dyn_alloc_ctxt;
i < dd->num_rcv_contexts; i++) {
if (!dd->rcd[i])
continue;
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
if (hfi1_stats.sps_ctxts) {
mutex_unlock(&hfi1_mutex);
ret = -EBUSY;
goto bail;
}
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
mutex_unlock(&hfi1_mutex);
for (pidx = 0; pidx < dd->num_pports; ++pidx) {
ppd = dd->pport + pidx;
@ -1321,6 +1373,162 @@ bail:
return ret;
}
static inline void hfi1_setup_ib_header(struct hfi1_packet *packet)
{
packet->hdr = (struct hfi1_ib_message_header *)
hfi1_get_msgheader(packet->rcd->dd,
packet->rhf_addr);
packet->hlen = (u8 *)packet->rhf_addr - (u8 *)packet->hdr;
}
static int hfi1_bypass_ingress_pkt_check(struct hfi1_packet *packet)
{
struct hfi1_pportdata *ppd = packet->rcd->ppd;
/* slid and dlid cannot be 0 */
if ((!packet->slid) || (!packet->dlid))
return -EINVAL;
/* Compare port lid with incoming packet dlid */
if ((!(hfi1_is_16B_mcast(packet->dlid))) &&
(packet->dlid !=
opa_get_lid(be32_to_cpu(OPA_LID_PERMISSIVE), 16B))) {
if (packet->dlid != ppd->lid)
return -EINVAL;
}
/* No multicast packets with SC15 */
if ((hfi1_is_16B_mcast(packet->dlid)) && (packet->sc == 0xF))
return -EINVAL;
/* Packets with permissive DLID always on SC15 */
if ((packet->dlid == opa_get_lid(be32_to_cpu(OPA_LID_PERMISSIVE),
16B)) &&
(packet->sc != 0xF))
return -EINVAL;
return 0;
}
static int hfi1_setup_9B_packet(struct hfi1_packet *packet)
{
struct hfi1_ibport *ibp = rcd_to_iport(packet->rcd);
struct ib_header *hdr;
u8 lnh;
hfi1_setup_ib_header(packet);
hdr = packet->hdr;
lnh = ib_get_lnh(hdr);
if (lnh == HFI1_LRH_BTH) {
packet->ohdr = &hdr->u.oth;
packet->grh = NULL;
} else if (lnh == HFI1_LRH_GRH) {
u32 vtf;
packet->ohdr = &hdr->u.l.oth;
packet->grh = &hdr->u.l.grh;
if (packet->grh->next_hdr != IB_GRH_NEXT_HDR)
goto drop;
vtf = be32_to_cpu(packet->grh->version_tclass_flow);
if ((vtf >> IB_GRH_VERSION_SHIFT) != IB_GRH_VERSION)
goto drop;
} else {
goto drop;
}
/* Query commonly used fields from packet header */
packet->payload = packet->ebuf;
packet->opcode = ib_bth_get_opcode(packet->ohdr);
packet->slid = ib_get_slid(hdr);
packet->dlid = ib_get_dlid(hdr);
if (unlikely((packet->dlid >= be16_to_cpu(IB_MULTICAST_LID_BASE)) &&
(packet->dlid != be16_to_cpu(IB_LID_PERMISSIVE))))
packet->dlid += opa_get_mcast_base(OPA_MCAST_NR) -
be16_to_cpu(IB_MULTICAST_LID_BASE);
packet->sl = ib_get_sl(hdr);
packet->sc = hfi1_9B_get_sc5(hdr, packet->rhf);
packet->pad = ib_bth_get_pad(packet->ohdr);
packet->extra_byte = 0;
packet->fecn = ib_bth_get_fecn(packet->ohdr);
packet->becn = ib_bth_get_becn(packet->ohdr);
return 0;
drop:
ibp->rvp.n_pkt_drops++;
return -EINVAL;
}
static int hfi1_setup_bypass_packet(struct hfi1_packet *packet)
{
/*
* Bypass packets have a different header/payload split
* compared to an IB packet.
* Current split is set such that 16 bytes of the actual
* header is in the header buffer and the remining is in
* the eager buffer. We chose 16 since hfi1 driver only
* supports 16B bypass packets and we will be able to
* receive the entire LRH with such a split.
*/
struct hfi1_ctxtdata *rcd = packet->rcd;
struct hfi1_pportdata *ppd = rcd->ppd;
struct hfi1_ibport *ibp = &ppd->ibport_data;
u8 l4;
u8 grh_len;
packet->hdr = (struct hfi1_16b_header *)
hfi1_get_16B_header(packet->rcd->dd,
packet->rhf_addr);
packet->hlen = (u8 *)packet->rhf_addr - (u8 *)packet->hdr;
l4 = hfi1_16B_get_l4(packet->hdr);
if (l4 == OPA_16B_L4_IB_LOCAL) {
grh_len = 0;
packet->ohdr = packet->ebuf;
packet->grh = NULL;
} else if (l4 == OPA_16B_L4_IB_GLOBAL) {
u32 vtf;
grh_len = sizeof(struct ib_grh);
packet->ohdr = packet->ebuf + grh_len;
packet->grh = packet->ebuf;
if (packet->grh->next_hdr != IB_GRH_NEXT_HDR)
goto drop;
vtf = be32_to_cpu(packet->grh->version_tclass_flow);
if ((vtf >> IB_GRH_VERSION_SHIFT) != IB_GRH_VERSION)
goto drop;
} else {
goto drop;
}
/* Query commonly used fields from packet header */
packet->opcode = ib_bth_get_opcode(packet->ohdr);
packet->hlen = hdr_len_by_opcode[packet->opcode] + 8 + grh_len;
packet->payload = packet->ebuf + packet->hlen - (4 * sizeof(u32));
packet->slid = hfi1_16B_get_slid(packet->hdr);
packet->dlid = hfi1_16B_get_dlid(packet->hdr);
if (unlikely(hfi1_is_16B_mcast(packet->dlid)))
packet->dlid += opa_get_mcast_base(OPA_MCAST_NR) -
opa_get_lid(opa_get_mcast_base(OPA_MCAST_NR),
16B);
packet->sc = hfi1_16B_get_sc(packet->hdr);
packet->sl = ibp->sc_to_sl[packet->sc];
packet->pad = hfi1_16B_bth_get_pad(packet->ohdr);
packet->extra_byte = SIZE_OF_LT;
packet->fecn = hfi1_16B_get_fecn(packet->hdr);
packet->becn = hfi1_16B_get_becn(packet->hdr);
if (hfi1_bypass_ingress_pkt_check(packet))
goto drop;
return 0;
drop:
hfi1_cdbg(PKT, "%s: packet dropped\n", __func__);
ibp->rvp.n_pkt_drops++;
return -EINVAL;
}
void handle_eflags(struct hfi1_packet *packet)
{
struct hfi1_ctxtdata *rcd = packet->rcd;
@ -1351,6 +1559,9 @@ int process_receive_ib(struct hfi1_packet *packet)
if (unlikely(hfi1_dbg_fault_packet(packet)))
return RHF_RCV_CONTINUE;
if (hfi1_setup_9B_packet(packet))
return RHF_RCV_CONTINUE;
trace_hfi1_rcvhdr(packet->rcd->ppd->dd,
packet->rcd->ctxt,
rhf_err_flags(packet->rhf),
@ -1380,8 +1591,8 @@ static inline bool hfi1_is_vnic_packet(struct hfi1_packet *packet)
if (packet->rcd->is_vnic)
return true;
if ((HFI1_GET_L2_TYPE(packet->ebuf) == OPA_VNIC_L2_TYPE) &&
(HFI1_GET_L4_TYPE(packet->ebuf) == OPA_VNIC_L4_ETHR))
if ((hfi1_16B_get_l2(packet->ebuf) == OPA_16B_L2_TYPE) &&
(hfi1_16B_get_l4(packet->ebuf) == OPA_16B_L4_ETHR))
return true;
return false;
@ -1391,25 +1602,38 @@ int process_receive_bypass(struct hfi1_packet *packet)
{
struct hfi1_devdata *dd = packet->rcd->dd;
if (unlikely(rhf_err_flags(packet->rhf))) {
handle_eflags(packet);
} else if (hfi1_is_vnic_packet(packet)) {
if (hfi1_is_vnic_packet(packet)) {
hfi1_vnic_bypass_rcv(packet);
return RHF_RCV_CONTINUE;
}
dd_dev_err(dd, "Unsupported bypass packet. Dropping\n");
incr_cntr64(&dd->sw_rcv_bypass_packet_errors);
if (!(dd->err_info_rcvport.status_and_code & OPA_EI_STATUS_SMASK)) {
u64 *flits = packet->ebuf;
if (hfi1_setup_bypass_packet(packet))
return RHF_RCV_CONTINUE;
if (flits && !(packet->rhf & RHF_LEN_ERR)) {
dd->err_info_rcvport.packet_flit1 = flits[0];
dd->err_info_rcvport.packet_flit2 =
packet->tlen > sizeof(flits[0]) ? flits[1] : 0;
if (unlikely(rhf_err_flags(packet->rhf))) {
handle_eflags(packet);
return RHF_RCV_CONTINUE;
}
if (hfi1_16B_get_l2(packet->hdr) == 0x2) {
hfi1_16B_rcv(packet);
} else {
dd_dev_err(dd,
"Bypass packets other than 16B are not supported in normal operation. Dropping\n");
incr_cntr64(&dd->sw_rcv_bypass_packet_errors);
if (!(dd->err_info_rcvport.status_and_code &
OPA_EI_STATUS_SMASK)) {
u64 *flits = packet->ebuf;
if (flits && !(packet->rhf & RHF_LEN_ERR)) {
dd->err_info_rcvport.packet_flit1 = flits[0];
dd->err_info_rcvport.packet_flit2 =
packet->tlen > sizeof(flits[0]) ?
flits[1] : 0;
}
dd->err_info_rcvport.status_and_code |=
(OPA_EI_STATUS_SMASK | BAD_L2_ERR);
}
dd->err_info_rcvport.status_and_code |=
(OPA_EI_STATUS_SMASK | BAD_L2_ERR);
}
return RHF_RCV_CONTINUE;
}
@ -1422,6 +1646,7 @@ int process_receive_error(struct hfi1_packet *packet)
rhf_rcv_type_err(packet->rhf) == 3))
return RHF_RCV_CONTINUE;
hfi1_setup_ib_header(packet);
handle_eflags(packet);
if (unlikely(rhf_err_flags(packet->rhf)))
@ -1435,6 +1660,8 @@ int kdeth_process_expected(struct hfi1_packet *packet)
{
if (unlikely(hfi1_dbg_fault_packet(packet)))
return RHF_RCV_CONTINUE;
hfi1_setup_ib_header(packet);
if (unlikely(rhf_err_flags(packet->rhf)))
handle_eflags(packet);
@ -1445,6 +1672,7 @@ int kdeth_process_expected(struct hfi1_packet *packet)
int kdeth_process_eager(struct hfi1_packet *packet)
{
hfi1_setup_ib_header(packet);
if (unlikely(rhf_err_flags(packet->rhf)))
handle_eflags(packet);
if (unlikely(hfi1_dbg_fault_packet(packet)))
@ -1461,3 +1689,62 @@ int process_receive_invalid(struct hfi1_packet *packet)
rhf_rcv_type(packet->rhf));
return RHF_RCV_CONTINUE;
}
void seqfile_dump_rcd(struct seq_file *s, struct hfi1_ctxtdata *rcd)
{
struct hfi1_packet packet;
struct ps_mdata mdata;
seq_printf(s, "Rcd %u: RcvHdr cnt %u entsize %u %s head %llu tail %llu\n",
rcd->ctxt, rcd->rcvhdrq_cnt, rcd->rcvhdrqentsize,
HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL) ?
"dma_rtail" : "nodma_rtail",
read_uctxt_csr(rcd->dd, rcd->ctxt, RCV_HDR_HEAD) &
RCV_HDR_HEAD_HEAD_MASK,
read_uctxt_csr(rcd->dd, rcd->ctxt, RCV_HDR_TAIL));
init_packet(rcd, &packet);
init_ps_mdata(&mdata, &packet);
while (1) {
struct hfi1_devdata *dd = rcd->dd;
__le32 *rhf_addr = (__le32 *)rcd->rcvhdrq + mdata.ps_head +
dd->rhf_offset;
struct ib_header *hdr;
u64 rhf = rhf_to_cpu(rhf_addr);
u32 etype = rhf_rcv_type(rhf), qpn;
u8 opcode;
u32 psn;
u8 lnh;
if (ps_done(&mdata, rhf, rcd))
break;
if (ps_skip(&mdata, rhf, rcd))
goto next;
if (etype > RHF_RCV_TYPE_IB)
goto next;
packet.hdr = hfi1_get_msgheader(dd, rhf_addr);
hdr = packet.hdr;
lnh = be16_to_cpu(hdr->lrh[0]) & 3;
if (lnh == HFI1_LRH_BTH)
packet.ohdr = &hdr->u.oth;
else if (lnh == HFI1_LRH_GRH)
packet.ohdr = &hdr->u.l.oth;
else
goto next; /* just in case */
opcode = (be32_to_cpu(packet.ohdr->bth[0]) >> 24);
qpn = be32_to_cpu(packet.ohdr->bth[1]) & RVT_QPN_MASK;
psn = mask_psn(be32_to_cpu(packet.ohdr->bth[2]));
seq_printf(s, "\tEnt %u: opcode 0x%x, qpn 0x%x, psn 0x%x\n",
mdata.ps_head, opcode, qpn, psn);
next:
update_ps_mdata(&mdata, rcd);
}
}

View File

@ -250,7 +250,6 @@ static int read_partition_platform_config(struct hfi1_devdata *dd, void **data,
{
void *buffer;
void *p;
u32 length;
int ret;
buffer = kmalloc(P1_SIZE, GFP_KERNEL);
@ -265,13 +264,13 @@ static int read_partition_platform_config(struct hfi1_devdata *dd, void **data,
/* scan for image magic that may trail the actual data */
p = strnstr(buffer, IMAGE_TRAIL_MAGIC, P1_SIZE);
if (p)
length = p - buffer;
else
length = P1_SIZE;
if (!p) {
kfree(buffer);
return -ENOENT;
}
*data = buffer;
*size = length;
*size = p - buffer;
return 0;
}

View File

@ -0,0 +1,114 @@
/*
* Copyright(c) 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
*
* GPL LICENSE SUMMARY
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* BSD LICENSE
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* - Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* - Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/
#include "exp_rcv.h"
#include "trace.h"
/**
* exp_tid_group_init - initialize exp_tid_set
* @set - the set
*/
void hfi1_exp_tid_group_init(struct exp_tid_set *set)
{
INIT_LIST_HEAD(&set->list);
set->count = 0;
}
/**
* alloc_ctxt_rcv_groups - initialize expected receive groups
* @rcd - the context to add the groupings to
*/
int hfi1_alloc_ctxt_rcv_groups(struct hfi1_ctxtdata *rcd)
{
struct hfi1_devdata *dd = rcd->dd;
u32 tidbase;
struct tid_group *grp;
int i;
tidbase = rcd->expected_base;
for (i = 0; i < rcd->expected_count /
dd->rcv_entries.group_size; i++) {
grp = kzalloc(sizeof(*grp), GFP_KERNEL);
if (!grp)
goto bail;
grp->size = dd->rcv_entries.group_size;
grp->base = tidbase;
tid_group_add_tail(grp, &rcd->tid_group_list);
tidbase += dd->rcv_entries.group_size;
}
return 0;
bail:
hfi1_free_ctxt_rcv_groups(rcd);
return -ENOMEM;
}
/**
* free_ctxt_rcv_groups - free expected receive groups
* @rcd - the context to free
*
* The routine dismantles the expect receive linked
* list and clears any tids associated with the receive
* context.
*
* This should only be called for kernel contexts and the
* a base user context.
*/
void hfi1_free_ctxt_rcv_groups(struct hfi1_ctxtdata *rcd)
{
struct tid_group *grp, *gptr;
WARN_ON(!EXP_TID_SET_EMPTY(rcd->tid_full_list));
WARN_ON(!EXP_TID_SET_EMPTY(rcd->tid_used_list));
list_for_each_entry_safe(grp, gptr, &rcd->tid_group_list.list, list) {
tid_group_remove(grp, &rcd->tid_group_list);
kfree(grp);
}
hfi1_clear_tids(rcd);
}

View File

@ -0,0 +1,190 @@
#ifndef _HFI1_EXP_RCV_H
#define _HFI1_EXP_RCV_H
/*
* Copyright(c) 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
*
* GPL LICENSE SUMMARY
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* BSD LICENSE
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* - Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* - Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/
#include "hfi.h"
#define EXP_TID_SET_EMPTY(set) (set.count == 0 && list_empty(&set.list))
#define EXP_TID_TIDLEN_MASK 0x7FFULL
#define EXP_TID_TIDLEN_SHIFT 0
#define EXP_TID_TIDCTRL_MASK 0x3ULL
#define EXP_TID_TIDCTRL_SHIFT 20
#define EXP_TID_TIDIDX_MASK 0x3FFULL
#define EXP_TID_TIDIDX_SHIFT 22
#define EXP_TID_GET(tid, field) \
(((tid) >> EXP_TID_TID##field##_SHIFT) & EXP_TID_TID##field##_MASK)
#define EXP_TID_SET(field, value) \
(((value) & EXP_TID_TID##field##_MASK) << \
EXP_TID_TID##field##_SHIFT)
#define EXP_TID_CLEAR(tid, field) ({ \
(tid) &= ~(EXP_TID_TID##field##_MASK << \
EXP_TID_TID##field##_SHIFT); \
})
#define EXP_TID_RESET(tid, field, value) do { \
EXP_TID_CLEAR(tid, field); \
(tid) |= EXP_TID_SET(field, (value)); \
} while (0)
/*
* Define fields in the KDETH header so we can update the header
* template.
*/
#define KDETH_OFFSET_SHIFT 0
#define KDETH_OFFSET_MASK 0x7fff
#define KDETH_OM_SHIFT 15
#define KDETH_OM_MASK 0x1
#define KDETH_TID_SHIFT 16
#define KDETH_TID_MASK 0x3ff
#define KDETH_TIDCTRL_SHIFT 26
#define KDETH_TIDCTRL_MASK 0x3
#define KDETH_INTR_SHIFT 28
#define KDETH_INTR_MASK 0x1
#define KDETH_SH_SHIFT 29
#define KDETH_SH_MASK 0x1
#define KDETH_KVER_SHIFT 30
#define KDETH_KVER_MASK 0x3
#define KDETH_JKEY_SHIFT 0x0
#define KDETH_JKEY_MASK 0xff
#define KDETH_HCRC_UPPER_SHIFT 16
#define KDETH_HCRC_UPPER_MASK 0xff
#define KDETH_HCRC_LOWER_SHIFT 24
#define KDETH_HCRC_LOWER_MASK 0xff
#define KDETH_GET(val, field) \
(((le32_to_cpu((val))) >> KDETH_##field##_SHIFT) & KDETH_##field##_MASK)
#define KDETH_SET(dw, field, val) do { \
u32 dwval = le32_to_cpu(dw); \
dwval &= ~(KDETH_##field##_MASK << KDETH_##field##_SHIFT); \
dwval |= (((val) & KDETH_##field##_MASK) << \
KDETH_##field##_SHIFT); \
dw = cpu_to_le32(dwval); \
} while (0)
#define KDETH_RESET(dw, field, val) ({ dw = 0; KDETH_SET(dw, field, val); })
/* KDETH OM multipliers and switch over point */
#define KDETH_OM_SMALL 4
#define KDETH_OM_SMALL_SHIFT 2
#define KDETH_OM_LARGE 64
#define KDETH_OM_LARGE_SHIFT 6
#define KDETH_OM_MAX_SIZE (1 << ((KDETH_OM_LARGE / KDETH_OM_SMALL) + 1))
struct tid_group {
struct list_head list;
u32 base;
u8 size;
u8 used;
u8 map;
};
/*
* Write an "empty" RcvArray entry.
* This function exists so the TID registaration code can use it
* to write to unused/unneeded entries and still take advantage
* of the WC performance improvements. The HFI will ignore this
* write to the RcvArray entry.
*/
static inline void rcv_array_wc_fill(struct hfi1_devdata *dd, u32 index)
{
/*
* Doing the WC fill writes only makes sense if the device is
* present and the RcvArray has been mapped as WC memory.
*/
if ((dd->flags & HFI1_PRESENT) && dd->rcvarray_wc) {
writeq(0, dd->rcvarray_wc + (index * 8));
if ((index & 3) == 3)
flush_wc();
}
}
static inline void tid_group_add_tail(struct tid_group *grp,
struct exp_tid_set *set)
{
list_add_tail(&grp->list, &set->list);
set->count++;
}
static inline void tid_group_remove(struct tid_group *grp,
struct exp_tid_set *set)
{
list_del_init(&grp->list);
set->count--;
}
static inline void tid_group_move(struct tid_group *group,
struct exp_tid_set *s1,
struct exp_tid_set *s2)
{
tid_group_remove(group, s1);
tid_group_add_tail(group, s2);
}
static inline struct tid_group *tid_group_pop(struct exp_tid_set *set)
{
struct tid_group *grp =
list_first_entry(&set->list, struct tid_group, list);
list_del_init(&grp->list);
set->count--;
return grp;
}
static inline u32 rcventry2tidinfo(u32 rcventry)
{
u32 pair = rcventry & ~0x1;
return EXP_TID_SET(IDX, pair >> 1) |
EXP_TID_SET(CTRL, 1 << (rcventry - pair));
}
int hfi1_alloc_ctxt_rcv_groups(struct hfi1_ctxtdata *rcd);
void hfi1_free_ctxt_rcv_groups(struct hfi1_ctxtdata *rcd);
void hfi1_exp_tid_group_init(struct exp_tid_set *set);
#endif /* _HFI1_EXP_RCV_H */

View File

@ -58,10 +58,10 @@
#include "device.h"
#include "common.h"
#include "trace.h"
#include "mmu_rb.h"
#include "user_sdma.h"
#include "user_exp_rcv.h"
#include "aspm.h"
#include "mmu_rb.h"
#undef pr_fmt
#define pr_fmt(fmt) DRIVER_NAME ": " fmt
@ -79,21 +79,25 @@ static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma);
static u64 kvirt_to_phys(void *addr);
static int assign_ctxt(struct hfi1_filedata *fd, struct hfi1_user_info *uinfo);
static int init_subctxts(struct hfi1_ctxtdata *uctxt,
const struct hfi1_user_info *uinfo);
static int init_user_ctxt(struct hfi1_filedata *fd);
static void init_subctxts(struct hfi1_ctxtdata *uctxt,
const struct hfi1_user_info *uinfo);
static int init_user_ctxt(struct hfi1_filedata *fd,
struct hfi1_ctxtdata *uctxt);
static void user_init(struct hfi1_ctxtdata *uctxt);
static int get_ctxt_info(struct hfi1_filedata *fd, void __user *ubase,
__u32 len);
static int get_base_info(struct hfi1_filedata *fd, void __user *ubase,
__u32 len);
static int setup_base_ctxt(struct hfi1_filedata *fd);
static int setup_base_ctxt(struct hfi1_filedata *fd,
struct hfi1_ctxtdata *uctxt);
static int setup_subctxt(struct hfi1_ctxtdata *uctxt);
static int find_sub_ctxt(struct hfi1_filedata *fd,
const struct hfi1_user_info *uinfo);
static int allocate_ctxt(struct hfi1_filedata *fd, struct hfi1_devdata *dd,
struct hfi1_user_info *uinfo);
struct hfi1_user_info *uinfo,
struct hfi1_ctxtdata **cd);
static void deallocate_ctxt(struct hfi1_ctxtdata *uctxt);
static unsigned int poll_urgent(struct file *fp, struct poll_table_struct *pt);
static unsigned int poll_next(struct file *fp, struct poll_table_struct *pt);
static int user_event_ack(struct hfi1_ctxtdata *uctxt, u16 subctxt,
@ -116,7 +120,7 @@ static const struct file_operations hfi1_file_ops = {
.llseek = noop_llseek,
};
static struct vm_operations_struct vm_ops = {
static const struct vm_operations_struct vm_ops = {
.fault = vma_fault,
};
@ -181,7 +185,7 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
struct hfi1_devdata,
user_cdev);
if (!((dd->flags & HFI1_PRESENT) && dd->kregbase))
if (!((dd->flags & HFI1_PRESENT) && dd->kregbase1))
return -EINVAL;
if (!atomic_inc_not_zero(&dd->user_refcount))
@ -267,12 +271,14 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
/*
* Copy the number of tidlist entries we used
* and the length of the buffer we registered.
* These fields are adjacent in the structure so
* we can copy them at the same time.
*/
addr = arg + offsetof(struct hfi1_tid_info, tidcnt);
if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
sizeof(tinfo.tidcnt) +
sizeof(tinfo.tidcnt)))
return -EFAULT;
addr = arg + offsetof(struct hfi1_tid_info, length);
if (copy_to_user((void __user *)addr, &tinfo.length,
sizeof(tinfo.length)))
ret = -EFAULT;
}
@ -388,8 +394,7 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
sc_disable(sc);
ret = sc_enable(sc);
hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_ENB,
uctxt->ctxt);
hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_ENB, uctxt);
} else {
ret = sc_restart(sc);
}
@ -425,8 +430,7 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)
if (!iter_is_iovec(from) || !dim)
return -EINVAL;
hfi1_cdbg(SDMA, "SDMA request from %u:%u (%lu)",
fd->uctxt->ctxt, fd->subctxt, dim);
trace_hfi1_sdma_request(fd->dd, fd->uctxt->ctxt, fd->subctxt, dim);
if (atomic_read(&pq->n_reqs) == pq->n_max_reqs)
return -ENOSPC;
@ -752,12 +756,11 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
if (!uctxt)
goto done;
hfi1_cdbg(PROC, "freeing ctxt %u:%u", uctxt->ctxt, fdata->subctxt);
mutex_lock(&hfi1_mutex);
hfi1_cdbg(PROC, "closing ctxt %u:%u", uctxt->ctxt, fdata->subctxt);
flush_wc();
/* drain user sdma queue */
hfi1_user_sdma_free_queues(fdata);
hfi1_user_sdma_free_queues(fdata, uctxt);
/* release the cpu */
hfi1_put_proc_affinity(fdata->rec_cpu_num);
@ -765,6 +768,13 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
/* clean up rcv side */
hfi1_user_exp_rcv_free(fdata);
/*
* fdata->uctxt is used in the above cleanup. It is not ready to be
* removed until here.
*/
fdata->uctxt = NULL;
hfi1_rcd_put(uctxt);
/*
* Clear any left over, unhandled events so the next process that
* gets this context doesn't get confused.
@ -773,13 +783,14 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
HFI1_MAX_SHARED_CTXTS) + fdata->subctxt;
*ev = 0;
spin_lock_irqsave(&dd->uctxt_lock, flags);
__clear_bit(fdata->subctxt, uctxt->in_use_ctxts);
if (!bitmap_empty(uctxt->in_use_ctxts, HFI1_MAX_SHARED_CTXTS)) {
mutex_unlock(&hfi1_mutex);
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
goto done;
}
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
spin_lock_irqsave(&dd->uctxt_lock, flags);
/*
* Disable receive context and interrupt available, reset all
* RcvCtxtCtrl bits to default values.
@ -790,34 +801,24 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
HFI1_RCVCTRL_TAILUPD_DIS |
HFI1_RCVCTRL_ONE_PKT_EGR_DIS |
HFI1_RCVCTRL_NO_RHQ_DROP_DIS |
HFI1_RCVCTRL_NO_EGR_DROP_DIS, uctxt->ctxt);
HFI1_RCVCTRL_NO_EGR_DROP_DIS, uctxt);
/* Clear the context's J_KEY */
hfi1_clear_ctxt_jkey(dd, uctxt->ctxt);
hfi1_clear_ctxt_jkey(dd, uctxt);
/*
* Reset context integrity checks to default.
* (writes to CSRs probably belong in chip.c)
* If a send context is allocated, reset context integrity
* checks to default and disable the send context.
*/
write_kctxt_csr(dd, uctxt->sc->hw_context, SEND_CTXT_CHECK_ENABLE,
hfi1_pkt_default_send_ctxt_mask(dd, uctxt->sc->type));
sc_disable(uctxt->sc);
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
if (uctxt->sc) {
set_pio_integrity(uctxt->sc);
sc_disable(uctxt->sc);
}
dd->rcd[uctxt->ctxt] = NULL;
hfi1_user_exp_rcv_grp_free(uctxt);
hfi1_free_ctxt_rcv_groups(uctxt);
hfi1_clear_ctxt_pkey(dd, uctxt);
uctxt->rcvwait_to = 0;
uctxt->piowait_to = 0;
uctxt->rcvnowait = 0;
uctxt->pionowait = 0;
uctxt->event_flags = 0;
hfi1_stats.sps_ctxts--;
if (++dd->freectxts == dd->num_user_contexts)
aspm_enable_all(dd);
mutex_unlock(&hfi1_mutex);
hfi1_free_ctxtdata(dd, uctxt);
deallocate_ctxt(uctxt);
done:
mmdrop(fdata->mm);
kobject_put(&dd->kobj);
@ -845,135 +846,211 @@ static u64 kvirt_to_phys(void *addr)
return paddr;
}
static int assign_ctxt(struct hfi1_filedata *fd, struct hfi1_user_info *uinfo)
/**
* complete_subctxt
* @fd: valid filedata pointer
*
* Sub-context info can only be set up after the base context
* has been completed. This is indicated by the clearing of the
* HFI1_CTXT_BASE_UINIT bit.
*
* Wait for the bit to be cleared, and then complete the subcontext
* initialization.
*
*/
static int complete_subctxt(struct hfi1_filedata *fd)
{
int ret;
unsigned int swmajor, swminor;
unsigned long flags;
swmajor = uinfo->userversion >> 16;
if (swmajor != HFI1_USER_SWMAJOR)
return -ENODEV;
swminor = uinfo->userversion & 0xffff;
mutex_lock(&hfi1_mutex);
/*
* Get a sub context if necessary.
* ret < 0 error, 0 no context, 1 sub-context found
* sub-context info can only be set up after the base context
* has been completed.
*/
ret = 0;
if (uinfo->subctxt_cnt) {
ret = find_sub_ctxt(fd, uinfo);
if (ret > 0)
fd->rec_cpu_num =
hfi1_get_proc_affinity(fd->uctxt->numa_id);
ret = wait_event_interruptible(
fd->uctxt->wait,
!test_bit(HFI1_CTXT_BASE_UNINIT, &fd->uctxt->event_flags));
if (test_bit(HFI1_CTXT_BASE_FAILED, &fd->uctxt->event_flags))
ret = -ENOMEM;
/* Finish the sub-context init */
if (!ret) {
fd->rec_cpu_num = hfi1_get_proc_affinity(fd->uctxt->numa_id);
ret = init_user_ctxt(fd, fd->uctxt);
}
/*
* Allocate a base context if context sharing is not required or we
* couldn't find a sub context.
*/
if (!ret)
ret = allocate_ctxt(fd, fd->dd, uinfo);
mutex_unlock(&hfi1_mutex);
/* Depending on the context type, do the appropriate init */
if (ret > 0) {
/*
* sub-context info can only be set up after the base
* context has been completed.
*/
ret = wait_event_interruptible(fd->uctxt->wait, !test_bit(
HFI1_CTXT_BASE_UNINIT,
&fd->uctxt->event_flags));
if (test_bit(HFI1_CTXT_BASE_FAILED, &fd->uctxt->event_flags)) {
clear_bit(fd->subctxt, fd->uctxt->in_use_ctxts);
return -ENOMEM;
}
/* The only thing a sub context needs is the user_xxx stuff */
if (!ret)
ret = init_user_ctxt(fd);
if (ret)
clear_bit(fd->subctxt, fd->uctxt->in_use_ctxts);
} else if (!ret) {
ret = setup_base_ctxt(fd);
if (fd->uctxt->subctxt_cnt) {
/* If there is an error, set the failed bit. */
if (ret)
set_bit(HFI1_CTXT_BASE_FAILED,
&fd->uctxt->event_flags);
/*
* Base context is done, notify anybody using a
* sub-context that is waiting for this completion
*/
clear_bit(HFI1_CTXT_BASE_UNINIT,
&fd->uctxt->event_flags);
wake_up(&fd->uctxt->wait);
}
if (ret) {
hfi1_rcd_put(fd->uctxt);
fd->uctxt = NULL;
spin_lock_irqsave(&fd->dd->uctxt_lock, flags);
__clear_bit(fd->subctxt, fd->uctxt->in_use_ctxts);
spin_unlock_irqrestore(&fd->dd->uctxt_lock, flags);
}
return ret;
}
/*
static int assign_ctxt(struct hfi1_filedata *fd, struct hfi1_user_info *uinfo)
{
int ret;
unsigned int swmajor, swminor;
struct hfi1_ctxtdata *uctxt = NULL;
swmajor = uinfo->userversion >> 16;
if (swmajor != HFI1_USER_SWMAJOR)
return -ENODEV;
if (uinfo->subctxt_cnt > HFI1_MAX_SHARED_CTXTS)
return -EINVAL;
swminor = uinfo->userversion & 0xffff;
/*
* Acquire the mutex to protect against multiple creations of what
* could be a shared base context.
*/
mutex_lock(&hfi1_mutex);
/*
* Get a sub context if available (fd->uctxt will be set).
* ret < 0 error, 0 no context, 1 sub-context found
*/
ret = find_sub_ctxt(fd, uinfo);
/*
* Allocate a base context if context sharing is not required or a
* sub context wasn't found.
*/
if (!ret)
ret = allocate_ctxt(fd, fd->dd, uinfo, &uctxt);
mutex_unlock(&hfi1_mutex);
/* Depending on the context type, finish the appropriate init */
switch (ret) {
case 0:
ret = setup_base_ctxt(fd, uctxt);
if (uctxt->subctxt_cnt) {
/*
* Base context is done (successfully or not), notify
* anybody using a sub-context that is waiting for
* this completion.
*/
clear_bit(HFI1_CTXT_BASE_UNINIT, &uctxt->event_flags);
wake_up(&uctxt->wait);
}
break;
case 1:
ret = complete_subctxt(fd);
break;
default:
break;
}
return ret;
}
/**
* match_ctxt
* @fd: valid filedata pointer
* @uinfo: user info to compare base context with
* @uctxt: context to compare uinfo to.
*
* Compare the given context with the given information to see if it
* can be used for a sub context.
*/
static int match_ctxt(struct hfi1_filedata *fd,
const struct hfi1_user_info *uinfo,
struct hfi1_ctxtdata *uctxt)
{
struct hfi1_devdata *dd = fd->dd;
unsigned long flags;
u16 subctxt;
/* Skip dynamically allocated kernel contexts */
if (uctxt->sc && (uctxt->sc->type == SC_KERNEL))
return 0;
/* Skip ctxt if it doesn't match the requested one */
if (memcmp(uctxt->uuid, uinfo->uuid, sizeof(uctxt->uuid)) ||
uctxt->jkey != generate_jkey(current_uid()) ||
uctxt->subctxt_id != uinfo->subctxt_id ||
uctxt->subctxt_cnt != uinfo->subctxt_cnt)
return 0;
/* Verify the sharing process matches the base */
if (uctxt->userversion != uinfo->userversion)
return -EINVAL;
/* Find an unused sub context */
spin_lock_irqsave(&dd->uctxt_lock, flags);
if (bitmap_empty(uctxt->in_use_ctxts, HFI1_MAX_SHARED_CTXTS)) {
/* context is being closed, do not use */
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
return 0;
}
subctxt = find_first_zero_bit(uctxt->in_use_ctxts,
HFI1_MAX_SHARED_CTXTS);
if (subctxt >= uctxt->subctxt_cnt) {
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
return -EBUSY;
}
fd->subctxt = subctxt;
__set_bit(fd->subctxt, uctxt->in_use_ctxts);
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
fd->uctxt = uctxt;
hfi1_rcd_get(uctxt);
return 1;
}
/**
* find_sub_ctxt
* @fd: valid filedata pointer
* @uinfo: matching info to use to find a possible context to share.
*
* The hfi1_mutex must be held when this function is called. It is
* necessary to ensure serialized access to the bitmask in_use_ctxts.
* necessary to ensure serialized creation of shared contexts.
*
* Return:
* 0 No sub-context found
* 1 Subcontext found and allocated
* errno EINVAL (incorrect parameters)
* EBUSY (all sub contexts in use)
*/
static int find_sub_ctxt(struct hfi1_filedata *fd,
const struct hfi1_user_info *uinfo)
{
int i;
struct hfi1_ctxtdata *uctxt;
struct hfi1_devdata *dd = fd->dd;
u16 subctxt;
u16 i;
int ret;
if (!uinfo->subctxt_cnt)
return 0;
for (i = dd->first_dyn_alloc_ctxt; i < dd->num_rcv_contexts; i++) {
struct hfi1_ctxtdata *uctxt = dd->rcd[i];
/* Skip ctxts which are not yet open */
if (!uctxt ||
bitmap_empty(uctxt->in_use_ctxts,
HFI1_MAX_SHARED_CTXTS))
continue;
/* Skip dynamically allocted kernel contexts */
if (uctxt->sc && (uctxt->sc->type == SC_KERNEL))
continue;
/* Skip ctxt if it doesn't match the requested one */
if (memcmp(uctxt->uuid, uinfo->uuid,
sizeof(uctxt->uuid)) ||
uctxt->jkey != generate_jkey(current_uid()) ||
uctxt->subctxt_id != uinfo->subctxt_id ||
uctxt->subctxt_cnt != uinfo->subctxt_cnt)
continue;
/* Verify the sharing process matches the master */
if (uctxt->userversion != uinfo->userversion)
return -EINVAL;
/* Find an unused context */
subctxt = find_first_zero_bit(uctxt->in_use_ctxts,
HFI1_MAX_SHARED_CTXTS);
if (subctxt >= uctxt->subctxt_cnt)
return -EBUSY;
fd->uctxt = uctxt;
fd->subctxt = subctxt;
__set_bit(fd->subctxt, uctxt->in_use_ctxts);
return 1;
uctxt = hfi1_rcd_get_by_index(dd, i);
if (uctxt) {
ret = match_ctxt(fd, uinfo, uctxt);
hfi1_rcd_put(uctxt);
/* value of != 0 will return */
if (ret)
return ret;
}
}
return 0;
}
static int allocate_ctxt(struct hfi1_filedata *fd, struct hfi1_devdata *dd,
struct hfi1_user_info *uinfo)
struct hfi1_user_info *uinfo,
struct hfi1_ctxtdata **rcd)
{
struct hfi1_ctxtdata *uctxt;
unsigned int ctxt;
int ret, numa;
if (dd->flags & HFI1_FROZEN) {
@ -987,22 +1064,9 @@ static int allocate_ctxt(struct hfi1_filedata *fd, struct hfi1_devdata *dd,
return -EIO;
}
/*
* This check is sort of redundant to the next EBUSY error. It would
* also indicate an inconsistancy in the driver if this value was
* zero, but there were still contexts available.
*/
if (!dd->freectxts)
return -EBUSY;
for (ctxt = dd->first_dyn_alloc_ctxt;
ctxt < dd->num_rcv_contexts; ctxt++)
if (!dd->rcd[ctxt])
break;
if (ctxt == dd->num_rcv_contexts)
return -EBUSY;
/*
* If we don't have a NUMA node requested, preference is towards
* device NUMA node.
@ -1012,11 +1076,10 @@ static int allocate_ctxt(struct hfi1_filedata *fd, struct hfi1_devdata *dd,
numa = cpu_to_node(fd->rec_cpu_num);
else
numa = numa_node_id();
uctxt = hfi1_create_ctxtdata(dd->pport, ctxt, numa);
if (!uctxt) {
dd_dev_err(dd,
"Unable to allocate ctxtdata memory, failing open\n");
return -ENOMEM;
ret = hfi1_create_ctxtdata(dd->pport, numa, &uctxt);
if (ret < 0) {
dd_dev_err(dd, "user ctxtdata allocation failed\n");
return ret;
}
hfi1_cdbg(PROC, "[%u:%u] pid %u assigned to CPU %d (NUMA %u)",
uctxt->ctxt, fd->subctxt, current->pid, fd->rec_cpu_num,
@ -1025,8 +1088,7 @@ static int allocate_ctxt(struct hfi1_filedata *fd, struct hfi1_devdata *dd,
/*
* Allocate and enable a PIO send context.
*/
uctxt->sc = sc_alloc(dd, SC_USER, uctxt->rcvhdrqentsize,
uctxt->dd->node);
uctxt->sc = sc_alloc(dd, SC_USER, uctxt->rcvhdrqentsize, dd->node);
if (!uctxt->sc) {
ret = -ENOMEM;
goto ctxdata_free;
@ -1038,28 +1100,19 @@ static int allocate_ctxt(struct hfi1_filedata *fd, struct hfi1_devdata *dd,
goto ctxdata_free;
/*
* Setup sub context resources if the user-level has requested
* Setup sub context information if the user-level has requested
* sub contexts.
* This has to be done here so the rest of the sub-contexts find the
* proper master.
* proper base context.
*/
if (uinfo->subctxt_cnt) {
ret = init_subctxts(uctxt, uinfo);
/*
* On error, we don't need to disable and de-allocate the
* send context because it will be done during file close
*/
if (ret)
goto ctxdata_free;
}
if (uinfo->subctxt_cnt)
init_subctxts(uctxt, uinfo);
uctxt->userversion = uinfo->userversion;
uctxt->flags = hfi1_cap_mask; /* save current flag state */
init_waitqueue_head(&uctxt->wait);
strlcpy(uctxt->comm, current->comm, sizeof(uctxt->comm));
memcpy(uctxt->uuid, uinfo->uuid, sizeof(uctxt->uuid));
uctxt->jkey = generate_jkey(current_uid());
INIT_LIST_HEAD(&uctxt->sdma_queues);
spin_lock_init(&uctxt->sdma_qlock);
hfi1_stats.sps_ctxts++;
/*
* Disable ASPM when there are open user/PSM contexts to avoid
@ -1067,31 +1120,33 @@ static int allocate_ctxt(struct hfi1_filedata *fd, struct hfi1_devdata *dd,
*/
if (dd->freectxts-- == dd->num_user_contexts)
aspm_disable_all(dd);
fd->uctxt = uctxt;
*rcd = uctxt;
return 0;
ctxdata_free:
dd->rcd[ctxt] = NULL;
hfi1_free_ctxtdata(dd, uctxt);
hfi1_free_ctxt(uctxt);
return ret;
}
static int init_subctxts(struct hfi1_ctxtdata *uctxt,
const struct hfi1_user_info *uinfo)
static void deallocate_ctxt(struct hfi1_ctxtdata *uctxt)
{
u16 num_subctxts;
mutex_lock(&hfi1_mutex);
hfi1_stats.sps_ctxts--;
if (++uctxt->dd->freectxts == uctxt->dd->num_user_contexts)
aspm_enable_all(uctxt->dd);
mutex_unlock(&hfi1_mutex);
num_subctxts = uinfo->subctxt_cnt;
if (num_subctxts > HFI1_MAX_SHARED_CTXTS)
return -EINVAL;
hfi1_free_ctxt(uctxt);
}
static void init_subctxts(struct hfi1_ctxtdata *uctxt,
const struct hfi1_user_info *uinfo)
{
uctxt->subctxt_cnt = uinfo->subctxt_cnt;
uctxt->subctxt_id = uinfo->subctxt_id;
uctxt->redirect_seq_cnt = 1;
set_bit(HFI1_CTXT_BASE_UNINIT, &uctxt->event_flags);
return 0;
}
static int setup_subctxt(struct hfi1_ctxtdata *uctxt)
@ -1153,7 +1208,7 @@ static void user_init(struct hfi1_ctxtdata *uctxt)
clear_rcvhdrtail(uctxt);
/* Setup J_KEY before enabling the context */
hfi1_set_ctxt_jkey(uctxt->dd, uctxt->ctxt, uctxt->jkey);
hfi1_set_ctxt_jkey(uctxt->dd, uctxt, uctxt->jkey);
rcvctrl_ops = HFI1_RCVCTRL_CTXT_ENB;
if (HFI1_CAP_UGET_MASK(uctxt->flags, HDRSUPP))
@ -1179,7 +1234,7 @@ static void user_init(struct hfi1_ctxtdata *uctxt)
rcvctrl_ops |= HFI1_RCVCTRL_TAILUPD_ENB;
else
rcvctrl_ops |= HFI1_RCVCTRL_TAILUPD_DIS;
hfi1_rcvctrl(uctxt->dd, rcvctrl_ops, uctxt->ctxt);
hfi1_rcvctrl(uctxt->dd, rcvctrl_ops, uctxt);
}
static int get_ctxt_info(struct hfi1_filedata *fd, void __user *ubase,
@ -1223,23 +1278,25 @@ static int get_ctxt_info(struct hfi1_filedata *fd, void __user *ubase,
return ret;
}
static int init_user_ctxt(struct hfi1_filedata *fd)
static int init_user_ctxt(struct hfi1_filedata *fd,
struct hfi1_ctxtdata *uctxt)
{
struct hfi1_ctxtdata *uctxt = fd->uctxt;
int ret;
ret = hfi1_user_sdma_alloc_queues(uctxt, fd);
if (ret)
return ret;
ret = hfi1_user_exp_rcv_init(fd);
ret = hfi1_user_exp_rcv_init(fd, uctxt);
if (ret)
hfi1_user_sdma_free_queues(fd, uctxt);
return ret;
}
static int setup_base_ctxt(struct hfi1_filedata *fd)
static int setup_base_ctxt(struct hfi1_filedata *fd,
struct hfi1_ctxtdata *uctxt)
{
struct hfi1_ctxtdata *uctxt = fd->uctxt;
struct hfi1_devdata *dd = uctxt->dd;
int ret = 0;
@ -1260,20 +1317,27 @@ static int setup_base_ctxt(struct hfi1_filedata *fd)
if (ret)
goto setup_failed;
ret = hfi1_user_exp_rcv_grp_init(fd);
ret = hfi1_alloc_ctxt_rcv_groups(uctxt);
if (ret)
goto setup_failed;
ret = init_user_ctxt(fd);
ret = init_user_ctxt(fd, uctxt);
if (ret)
goto setup_failed;
user_init(uctxt);
/* Now that the context is set up, the fd can get a reference. */
fd->uctxt = uctxt;
hfi1_rcd_get(uctxt);
return 0;
setup_failed:
hfi1_free_ctxtdata(dd, uctxt);
/* Set the failed bit so sub-context init can do the right thing */
set_bit(HFI1_CTXT_BASE_FAILED, &uctxt->event_flags);
deallocate_ctxt(uctxt);
return ret;
}
@ -1390,7 +1454,7 @@ static unsigned int poll_next(struct file *fp,
spin_lock_irq(&dd->uctxt_lock);
if (hdrqempty(uctxt)) {
set_bit(HFI1_CTXT_WAITING_RCV, &uctxt->event_flags);
hfi1_rcvctrl(dd, HFI1_RCVCTRL_INTRAVAIL_ENB, uctxt->ctxt);
hfi1_rcvctrl(dd, HFI1_RCVCTRL_INTRAVAIL_ENB, uctxt);
pollflag = 0;
} else {
pollflag = POLLIN | POLLRDNORM;
@ -1409,19 +1473,14 @@ int hfi1_set_uevent_bits(struct hfi1_pportdata *ppd, const int evtbit)
{
struct hfi1_ctxtdata *uctxt;
struct hfi1_devdata *dd = ppd->dd;
unsigned ctxt;
int ret = 0;
unsigned long flags;
u16 ctxt;
if (!dd->events) {
ret = -EINVAL;
goto done;
}
if (!dd->events)
return -EINVAL;
spin_lock_irqsave(&dd->uctxt_lock, flags);
for (ctxt = dd->first_dyn_alloc_ctxt; ctxt < dd->num_rcv_contexts;
ctxt++) {
uctxt = dd->rcd[ctxt];
uctxt = hfi1_rcd_get_by_index(dd, ctxt);
if (uctxt) {
unsigned long *evs = dd->events +
(uctxt->ctxt - dd->first_dyn_alloc_ctxt) *
@ -1434,11 +1493,11 @@ int hfi1_set_uevent_bits(struct hfi1_pportdata *ppd, const int evtbit)
set_bit(evtbit, evs);
for (i = 1; i < uctxt->subctxt_cnt; i++)
set_bit(evtbit, evs + i);
hfi1_rcd_put(uctxt);
}
}
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
done:
return ret;
return 0;
}
/**
@ -1475,7 +1534,7 @@ static int manage_rcvq(struct hfi1_ctxtdata *uctxt, u16 subctxt,
} else {
rcvctrl_op = HFI1_RCVCTRL_CTXT_DIS;
}
hfi1_rcvctrl(dd, rcvctrl_op, uctxt->ctxt);
hfi1_rcvctrl(dd, rcvctrl_op, uctxt);
/* always; new head should be equal to new tail; see above */
bail:
return 0;
@ -1525,7 +1584,7 @@ static int set_ctxt_pkey(struct hfi1_ctxtdata *uctxt, u16 subctxt, u16 pkey)
}
if (intable)
ret = hfi1_set_ctxt_pkey(dd, uctxt->ctxt, pkey);
ret = hfi1_set_ctxt_pkey(dd, uctxt, pkey);
done:
return ret;
}

View File

@ -64,30 +64,22 @@
#define DEFAULT_FW_FABRIC_NAME "hfi1_fabric.fw"
#define DEFAULT_FW_SBUS_NAME "hfi1_sbus.fw"
#define DEFAULT_FW_PCIE_NAME "hfi1_pcie.fw"
#define DEFAULT_PLATFORM_CONFIG_NAME "hfi1_platform.dat"
#define ALT_FW_8051_NAME_ASIC "hfi1_dc8051_d.fw"
#define ALT_FW_FABRIC_NAME "hfi1_fabric_d.fw"
#define ALT_FW_SBUS_NAME "hfi1_sbus_d.fw"
#define ALT_FW_PCIE_NAME "hfi1_pcie_d.fw"
#define HOST_INTERFACE_VERSION 1
static uint fw_8051_load = 1;
static uint fw_fabric_serdes_load = 1;
static uint fw_pcie_serdes_load = 1;
static uint fw_sbus_load = 1;
/*
* Access required in platform.c
* Maintains state of whether the platform config was fetched via the
* fallback option
*/
uint platform_config_load;
/* Firmware file names get set in hfi1_firmware_init() based on the above */
static char *fw_8051_name;
static char *fw_fabric_serdes_name;
static char *fw_sbus_name;
static char *fw_pcie_serdes_name;
static char *platform_config_name;
#define SBUS_MAX_POLL_COUNT 100
#define SBUS_COUNTER(reg, name) \
@ -177,7 +169,6 @@ static struct firmware_details fw_8051;
static struct firmware_details fw_fabric;
static struct firmware_details fw_pcie;
static struct firmware_details fw_sbus;
static const struct firmware *platform_config;
/* flags for turn_off_spicos() */
#define SPICO_SBUS 0x1
@ -615,6 +606,14 @@ retry:
fw_fabric_serdes_name = ALT_FW_FABRIC_NAME;
fw_sbus_name = ALT_FW_SBUS_NAME;
fw_pcie_serdes_name = ALT_FW_PCIE_NAME;
/*
* Add a delay before obtaining and loading debug firmware.
* Authorization will fail if the delay between firmware
* authorization events is shorter than 50us. Add 100us to
* make a delay time safe.
*/
usleep_range(100, 120);
}
if (fw_sbus_load) {
@ -675,7 +674,6 @@ done:
static int obtain_firmware(struct hfi1_devdata *dd)
{
unsigned long timeout;
int err = 0;
mutex_lock(&fw_mutex);
@ -699,38 +697,11 @@ static int obtain_firmware(struct hfi1_devdata *dd)
}
/* not in FW_TRY state */
if (fw_state == FW_FINAL) {
if (platform_config) {
dd->platform_config.data = platform_config->data;
dd->platform_config.size = platform_config->size;
}
goto done; /* already acquired */
} else if (fw_state == FW_ERR) {
goto done; /* already tried and failed */
}
/* fw_state is FW_EMPTY */
/* set fw_state to FW_TRY, FW_FINAL, or FW_ERR, and fw_err */
__obtain_firmware(dd);
if (fw_state == FW_EMPTY)
__obtain_firmware(dd);
if (platform_config_load) {
platform_config = NULL;
err = request_firmware(&platform_config, platform_config_name,
&dd->pcidev->dev);
if (err) {
platform_config = NULL;
dd_dev_err(dd,
"%s: No default platform config file found\n",
__func__);
goto done;
}
dd->platform_config.data = platform_config->data;
dd->platform_config.size = platform_config->size;
}
done:
mutex_unlock(&fw_mutex);
return fw_err;
}
@ -752,9 +723,6 @@ void dispose_firmware(void)
dispose_one_firmware(&fw_pcie);
dispose_one_firmware(&fw_sbus);
release_firmware(platform_config);
platform_config = NULL;
/* retain the error state, otherwise revert to empty */
if (fw_state != FW_ERR)
fw_state = FW_EMPTY;
@ -1079,6 +1047,13 @@ static int load_8051_firmware(struct hfi1_devdata *dd,
dd_dev_info(dd, "8051 firmware version %d.%d.%d\n",
(int)ver_major, (int)ver_minor, (int)ver_patch);
dd->dc8051_ver = dc8051_ver(ver_major, ver_minor, ver_patch);
ret = write_host_interface_version(dd, HOST_INTERFACE_VERSION);
if (ret != HCMD_SUCCESS) {
dd_dev_err(dd,
"Failed to set host interface version, return 0x%x\n",
ret);
return -EIO;
}
return 0;
}
@ -1709,10 +1684,8 @@ int hfi1_firmware_init(struct hfi1_devdata *dd)
}
/* no 8051 or QSFP on simulator */
if (dd->icode == ICODE_FUNCTIONAL_SIMULATOR) {
if (dd->icode == ICODE_FUNCTIONAL_SIMULATOR)
fw_8051_load = 0;
platform_config_load = 0;
}
if (!fw_8051_name) {
if (dd->icode == ICODE_RTL_SILICON)
@ -1726,8 +1699,6 @@ int hfi1_firmware_init(struct hfi1_devdata *dd)
fw_sbus_name = DEFAULT_FW_SBUS_NAME;
if (!fw_pcie_serdes_name)
fw_pcie_serdes_name = DEFAULT_FW_PCIE_NAME;
if (!platform_config_name)
platform_config_name = DEFAULT_PLATFORM_CONFIG_NAME;
return obtain_firmware(dd);
}
@ -1773,6 +1744,7 @@ static int check_meta_version(struct hfi1_devdata *dd, u32 *system_table)
int parse_platform_config(struct hfi1_devdata *dd)
{
struct platform_config_cache *pcfgcache = &dd->pcfg_cache;
struct hfi1_pportdata *ppd = dd->pport;
u32 *ptr = NULL;
u32 header1 = 0, header2 = 0, magic_num = 0, crc = 0, file_length = 0;
u32 record_idx = 0, table_type = 0, table_length_dwords = 0;
@ -1784,7 +1756,7 @@ int parse_platform_config(struct hfi1_devdata *dd)
* scratch register bitmap, thus there is no platform config to parse.
* Skip parsing in these situations.
*/
if (is_integrated(dd) && !platform_config_load)
if (ppd->config_from_scratch)
return 0;
if (!dd->platform_config.data) {
@ -2073,13 +2045,14 @@ int get_platform_config_field(struct hfi1_devdata *dd,
int ret = 0, wlen = 0, seek = 0;
u32 field_len_bits = 0, field_start_bits = 0, *src_ptr = NULL;
struct platform_config_cache *pcfgcache = &dd->pcfg_cache;
struct hfi1_pportdata *ppd = dd->pport;
if (data)
memset(data, 0, len);
else
return -EINVAL;
if (is_integrated(dd) && !platform_config_load) {
if (ppd->config_from_scratch) {
/*
* Use saved configuration from ppd for integrated platforms
*/

View File

@ -66,9 +66,11 @@
#include <linux/i2c.h>
#include <linux/i2c-algo-bit.h>
#include <rdma/ib_hdrs.h>
#include <rdma/opa_addr.h>
#include <linux/rhashtable.h>
#include <linux/netdevice.h>
#include <rdma/rdma_vt.h>
#include <rdma/opa_addr.h>
#include "chip_registers.h"
#include "common.h"
@ -213,13 +215,11 @@ struct hfi1_ctxtdata {
/* dynamic receive available interrupt timeout */
u32 rcvavail_timeout;
/*
* number of opens (including slave sub-contexts) on this instance
* (ignoring forks, dup, etc. for now)
*/
int cnt;
/* Reference count the base context usage */
struct kref kref;
/* Device context index */
unsigned ctxt;
u16 ctxt;
/*
* non-zero if ctxt can be shared, and defines the maximum number of
* sub-contexts for this device context.
@ -245,24 +245,10 @@ struct hfi1_ctxtdata {
/* lock protecting all Expected TID data */
struct mutex exp_lock;
/* number of pio bufs for this ctxt (all procs, if shared) */
u32 piocnt;
/* first pio buffer for this ctxt */
u32 pio_base;
/* chip offset of PIO buffers for this ctxt */
u32 piobufs;
/* per-context configuration flags */
unsigned long flags;
/* per-context event flags for fileops/intr communication */
unsigned long event_flags;
/* WAIT_RCV that timed out, no interrupt */
u32 rcvwait_to;
/* WAIT_PIO that timed out, no interrupt */
u32 piowait_to;
/* WAIT_RCV already happened, no wait */
u32 rcvnowait;
/* WAIT_PIO already happened, no wait */
u32 pionowait;
/* total number of polled urgent packets */
u32 urgent;
/* saved total number of polled urgent packets for poll edge trigger */
@ -289,10 +275,8 @@ struct hfi1_ctxtdata {
u16 poll_type;
/* receive packet sequence counter */
u8 seq_cnt;
u8 redirect_seq_cnt;
/* ctxt rcvhdrq head offset */
u32 head;
u32 pkt_count;
/* QPs waiting for context processing */
struct list_head qp_wait_list;
/* interrupt handling */
@ -301,15 +285,6 @@ struct hfi1_ctxtdata {
unsigned numa_id; /* numa node of this context */
/* verbs stats per CTX */
struct hfi1_opcode_stats_perctx *opstats;
/*
* This is the kernel thread that will keep making
* progress on the user sdma requests behind the scenes.
* There is one per context (shared contexts use the master's).
*/
struct task_struct *progress;
struct list_head sdma_queues;
/* protect sdma queues */
spinlock_t sdma_qlock;
/* Is ASPM interrupt supported for this context */
bool aspm_intr_supported;
@ -352,23 +327,150 @@ struct hfi1_ctxtdata {
struct hfi1_packet {
void *ebuf;
void *hdr;
void *payload;
struct hfi1_ctxtdata *rcd;
__le32 *rhf_addr;
struct rvt_qp *qp;
struct ib_other_headers *ohdr;
struct ib_grh *grh;
u64 rhf;
u32 maxcnt;
u32 rhqoff;
u32 dlid;
u32 slid;
u16 tlen;
s16 etail;
u8 hlen;
u8 numpkt;
u8 rsize;
u8 updegr;
u8 rcv_flags;
u8 etype;
u8 extra_byte;
u8 pad;
u8 sc;
u8 sl;
u8 opcode;
bool becn;
bool fecn;
};
/* Packet types */
#define HFI1_PKT_TYPE_9B 0
#define HFI1_PKT_TYPE_16B 1
/*
* OPA 16B Header
*/
#define OPA_16B_L4_MASK 0xFFull
#define OPA_16B_SC_MASK 0x1F00000ull
#define OPA_16B_SC_SHIFT 20
#define OPA_16B_LID_MASK 0xFFFFFull
#define OPA_16B_DLID_MASK 0xF000ull
#define OPA_16B_DLID_SHIFT 20
#define OPA_16B_DLID_HIGH_SHIFT 12
#define OPA_16B_SLID_MASK 0xF00ull
#define OPA_16B_SLID_SHIFT 20
#define OPA_16B_SLID_HIGH_SHIFT 8
#define OPA_16B_BECN_MASK 0x80000000ull
#define OPA_16B_BECN_SHIFT 31
#define OPA_16B_FECN_MASK 0x10000000ull
#define OPA_16B_FECN_SHIFT 28
#define OPA_16B_L2_MASK 0x60000000ull
#define OPA_16B_L2_SHIFT 29
#define OPA_16B_PKEY_MASK 0xFFFF0000ull
#define OPA_16B_PKEY_SHIFT 16
#define OPA_16B_LEN_MASK 0x7FF00000ull
#define OPA_16B_LEN_SHIFT 20
#define OPA_16B_RC_MASK 0xE000000ull
#define OPA_16B_RC_SHIFT 25
#define OPA_16B_AGE_MASK 0xFF0000ull
#define OPA_16B_AGE_SHIFT 16
#define OPA_16B_ENTROPY_MASK 0xFFFFull
/*
* OPA 16B L2/L4 Encodings
*/
#define OPA_16B_L2_TYPE 0x02
#define OPA_16B_L4_IB_LOCAL 0x09
#define OPA_16B_L4_IB_GLOBAL 0x0A
#define OPA_16B_L4_ETHR OPA_VNIC_L4_ETHR
static inline u8 hfi1_16B_get_l4(struct hfi1_16b_header *hdr)
{
return (u8)(hdr->lrh[2] & OPA_16B_L4_MASK);
}
static inline u8 hfi1_16B_get_sc(struct hfi1_16b_header *hdr)
{
return (u8)((hdr->lrh[1] & OPA_16B_SC_MASK) >> OPA_16B_SC_SHIFT);
}
static inline u32 hfi1_16B_get_dlid(struct hfi1_16b_header *hdr)
{
return (u32)((hdr->lrh[1] & OPA_16B_LID_MASK) |
(((hdr->lrh[2] & OPA_16B_DLID_MASK) >>
OPA_16B_DLID_HIGH_SHIFT) << OPA_16B_DLID_SHIFT));
}
static inline u32 hfi1_16B_get_slid(struct hfi1_16b_header *hdr)
{
return (u32)((hdr->lrh[0] & OPA_16B_LID_MASK) |
(((hdr->lrh[2] & OPA_16B_SLID_MASK) >>
OPA_16B_SLID_HIGH_SHIFT) << OPA_16B_SLID_SHIFT));
}
static inline u8 hfi1_16B_get_becn(struct hfi1_16b_header *hdr)
{
return (u8)((hdr->lrh[0] & OPA_16B_BECN_MASK) >> OPA_16B_BECN_SHIFT);
}
static inline u8 hfi1_16B_get_fecn(struct hfi1_16b_header *hdr)
{
return (u8)((hdr->lrh[1] & OPA_16B_FECN_MASK) >> OPA_16B_FECN_SHIFT);
}
static inline u8 hfi1_16B_get_l2(struct hfi1_16b_header *hdr)
{
return (u8)((hdr->lrh[1] & OPA_16B_L2_MASK) >> OPA_16B_L2_SHIFT);
}
static inline u16 hfi1_16B_get_pkey(struct hfi1_16b_header *hdr)
{
return (u16)((hdr->lrh[2] & OPA_16B_PKEY_MASK) >> OPA_16B_PKEY_SHIFT);
}
static inline u8 hfi1_16B_get_rc(struct hfi1_16b_header *hdr)
{
return (u8)((hdr->lrh[1] & OPA_16B_RC_MASK) >> OPA_16B_RC_SHIFT);
}
static inline u8 hfi1_16B_get_age(struct hfi1_16b_header *hdr)
{
return (u8)((hdr->lrh[3] & OPA_16B_AGE_MASK) >> OPA_16B_AGE_SHIFT);
}
static inline u16 hfi1_16B_get_len(struct hfi1_16b_header *hdr)
{
return (u16)((hdr->lrh[0] & OPA_16B_LEN_MASK) >> OPA_16B_LEN_SHIFT);
}
static inline u16 hfi1_16B_get_entropy(struct hfi1_16b_header *hdr)
{
return (u16)(hdr->lrh[3] & OPA_16B_ENTROPY_MASK);
}
#define OPA_16B_MAKE_QW(low_dw, high_dw) (((u64)(high_dw) << 32) | (low_dw))
/*
* BTH
*/
#define OPA_16B_BTH_PAD_MASK 7
static inline u8 hfi1_16B_bth_get_pad(struct ib_other_headers *ohdr)
{
return (u8)((be32_to_cpu(ohdr->bth[0]) >> IB_BTH_PAD_SHIFT) &
OPA_16B_BTH_PAD_MASK);
}
struct rvt_sge_state;
/*
@ -512,7 +614,7 @@ static inline void incr_cntr32(u32 *cntr)
#define MAX_NAME_SIZE 64
struct hfi1_msix_entry {
enum irq_type type;
struct msix_entry msix;
int irq;
void *arg;
char name[MAX_NAME_SIZE];
cpumask_t mask;
@ -575,6 +677,9 @@ struct hfi1_pportdata {
u8 default_atten;
u8 max_power_class;
/* did we read platform config from scratch registers? */
bool config_from_scratch;
/* GUIDs for this interface, in host order, guids[0] is a port guid */
u64 guids[HFI1_GUIDS_PER_PORT];
@ -593,6 +698,7 @@ struct hfi1_pportdata {
/* SendDMA related entries */
struct workqueue_struct *hfi1_wq;
struct workqueue_struct *link_wq;
/* move out of interrupt context */
struct work_struct link_vc_work;
@ -607,8 +713,6 @@ struct hfi1_pportdata {
struct mutex hls_lock;
u32 host_link_state;
u32 lstate; /* logical link state */
/* these are the "32 bit" regs */
u32 ibmtu; /* The MTU programmed for this unit */
@ -619,7 +723,7 @@ struct hfi1_pportdata {
u32 ibmaxlen;
u32 current_egress_rate; /* units [10^6 bits/sec] */
/* LID programmed for this instance */
u16 lid;
u32 lid;
/* list of pkeys programmed; 0 if not set */
u16 pkeys[MAX_PKEY_VALUES];
u16 link_width_supported;
@ -654,12 +758,12 @@ struct hfi1_pportdata {
u8 link_enabled; /* link enabled? */
u8 linkinit_reason;
u8 local_tx_rate; /* rate given to 8051 firmware */
u8 last_pstate; /* info only */
u8 qsfp_retry_count;
/* placeholders for IB MAD packet settings */
u8 overrun_threshold;
u8 phy_error_threshold;
unsigned int is_link_down_queued;
/* Used to override LED behavior for things like maintenance beaconing*/
/*
@ -756,6 +860,10 @@ struct hfi1_pportdata {
typedef int (*rhf_rcv_function_ptr)(struct hfi1_packet *packet);
typedef void (*opcode_handler)(struct hfi1_packet *packet);
typedef void (*hfi1_make_req)(struct rvt_qp *qp,
struct hfi1_pkt_state *ps,
struct rvt_swqe *wqe);
/* return values for the RHF receive functions */
#define RHF_RCV_CONTINUE 0 /* keep going */
@ -860,12 +968,15 @@ struct hfi1_devdata {
struct device *diag_device;
struct device *ui_device;
/* mem-mapped pointer to base of chip regs */
u8 __iomem *kregbase;
/* end of mem-mapped chip space excluding sendbuf and user regs */
u8 __iomem *kregend;
/* physical address of chip for io_remap, etc. */
/* first mapping up to RcvArray */
u8 __iomem *kregbase1;
resource_size_t physaddr;
/* second uncached mapping from RcvArray to pio send buffers */
u8 __iomem *kregbase2;
/* for detecting offset above kregbase2 address */
u32 base2_start;
/* Per VL data. Enough for all VLs but not all elements are set/used. */
struct per_vl_data vld[PER_VL_SEND_CONTEXTS];
/* send context data */
@ -953,8 +1064,7 @@ struct hfi1_devdata {
u64 __iomem *egrtidbase;
spinlock_t sendctrl_lock; /* protect changes to SendCtrl */
spinlock_t rcvctrl_lock; /* protect changes to RcvCtrl */
/* around rcd and (user ctxts) ctxt_cnt use (intr vs free) */
spinlock_t uctxt_lock; /* rcd and user context changes */
spinlock_t uctxt_lock; /* protect rcd changes */
struct mutex dc8051_lock; /* exclusive access to 8051 */
struct workqueue_struct *update_cntr_wq;
struct work_struct update_cntr_work;
@ -1229,9 +1339,10 @@ static inline bool hfi1_vnic_is_rsm_full(struct hfi1_devdata *dd, int spare)
#define dc8051_ver_patch(a) ((a) & 0x0000ff)
/* f_put_tid types */
#define PT_EXPECTED 0
#define PT_EAGER 1
#define PT_INVALID 2
#define PT_EXPECTED 0
#define PT_EAGER 1
#define PT_INVALID_FLUSH 2
#define PT_INVALID 3
struct tid_rb_node;
struct mmu_rb_node;
@ -1276,13 +1387,16 @@ void handle_user_interrupt(struct hfi1_ctxtdata *rcd);
int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd);
int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd);
int hfi1_create_ctxts(struct hfi1_devdata *dd);
struct hfi1_ctxtdata *hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, u32 ctxt,
int numa);
int hfi1_create_kctxts(struct hfi1_devdata *dd);
int hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, int numa,
struct hfi1_ctxtdata **rcd);
void hfi1_free_ctxt(struct hfi1_ctxtdata *rcd);
void hfi1_init_pportdata(struct pci_dev *pdev, struct hfi1_pportdata *ppd,
struct hfi1_devdata *dd, u8 hw_pidx, u8 port);
void hfi1_free_ctxtdata(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd);
int hfi1_rcd_put(struct hfi1_ctxtdata *rcd);
void hfi1_rcd_get(struct hfi1_ctxtdata *rcd);
struct hfi1_ctxtdata *hfi1_rcd_get_by_index(struct hfi1_devdata *dd, u16 ctxt);
int handle_receive_interrupt(struct hfi1_ctxtdata *rcd, int thread);
int handle_receive_interrupt_nodma_rtail(struct hfi1_ctxtdata *rcd, int thread);
int handle_receive_interrupt_dma_rtail(struct hfi1_ctxtdata *rcd, int thread);
@ -1292,6 +1406,13 @@ void hfi1_set_vnic_msix_info(struct hfi1_ctxtdata *rcd);
void hfi1_reset_vnic_msix_info(struct hfi1_ctxtdata *rcd);
extern const struct pci_device_id hfi1_pci_tbl[];
void hfi1_make_ud_req_9B(struct rvt_qp *qp,
struct hfi1_pkt_state *ps,
struct rvt_swqe *wqe);
void hfi1_make_ud_req_16B(struct rvt_qp *qp,
struct hfi1_pkt_state *ps,
struct rvt_swqe *wqe);
/* receive packet handler dispositions */
#define RCV_PKT_OK 0x0 /* keep going */
@ -1306,21 +1427,6 @@ static inline __le32 *get_rhf_addr(struct hfi1_ctxtdata *rcd)
int hfi1_reset_device(int);
/* return the driver's idea of the logical OPA port state */
static inline u32 driver_lstate(struct hfi1_pportdata *ppd)
{
/*
* The driver does some processing from the time the logical
* link state is at INIT to the time the SM can be notified
* as such. Return IB_PORT_DOWN until the software state
* is ready.
*/
if (ppd->lstate == IB_PORT_INIT && !(ppd->host_link_state & HLS_UP))
return IB_PORT_DOWN;
else
return ppd->lstate;
}
void receive_interrupt_work(struct work_struct *work);
/* extract service channel from header and rhf */
@ -1413,13 +1519,25 @@ static inline u32 egress_cycles(u32 len, u32 rate)
}
void set_link_ipg(struct hfi1_pportdata *ppd);
void process_becn(struct hfi1_pportdata *ppd, u8 sl, u16 rlid, u32 lqpn,
void process_becn(struct hfi1_pportdata *ppd, u8 sl, u32 rlid, u32 lqpn,
u32 rqpn, u8 svc_type);
void return_cnp(struct hfi1_ibport *ibp, struct rvt_qp *qp, u32 remote_qpn,
u32 pkey, u32 slid, u32 dlid, u8 sc5,
const struct ib_grh *old_grh);
void return_cnp_16B(struct hfi1_ibport *ibp, struct rvt_qp *qp,
u32 remote_qpn, u32 pkey, u32 slid, u32 dlid,
u8 sc5, const struct ib_grh *old_grh);
typedef void (*hfi1_handle_cnp)(struct hfi1_ibport *ibp, struct rvt_qp *qp,
u32 remote_qpn, u32 pkey, u32 slid, u32 dlid,
u8 sc5, const struct ib_grh *old_grh);
/* We support only two types - 9B and 16B for now */
static const hfi1_handle_cnp hfi1_handle_cnp_tbl[2] = {
[HFI1_PKT_TYPE_9B] = &return_cnp,
[HFI1_PKT_TYPE_16B] = &return_cnp_16B
};
#define PKEY_CHECK_INVALID -1
int egress_pkey_check(struct hfi1_pportdata *ppd, __be16 *lrh, __be32 *bth,
int egress_pkey_check(struct hfi1_pportdata *ppd, u32 slid, u16 pkey,
u8 sc5, int8_t s_pkey_index);
#define PACKET_EGRESS_TIMEOUT 350
@ -1522,9 +1640,9 @@ static void ingress_pkey_table_fail(struct hfi1_pportdata *ppd, u16 pkey,
* by HW and rcv_pkey_check function should be called instead.
*/
static inline int ingress_pkey_check(struct hfi1_pportdata *ppd, u16 pkey,
u8 sc5, u8 idx, u16 slid)
u8 sc5, u8 idx, u32 slid, bool force)
{
if (!(ppd->part_enforce & HFI1_PART_ENFORCE_IN))
if (!(force) && !(ppd->part_enforce & HFI1_PART_ENFORCE_IN))
return 0;
/* If SC15, pkey[0:14] must be 0x7fff */
@ -1658,12 +1776,22 @@ static inline bool process_ecn(struct rvt_qp *qp, struct hfi1_packet *pkt,
bool do_cnp)
{
struct ib_other_headers *ohdr = pkt->ohdr;
u32 bth1;
bth1 = be32_to_cpu(ohdr->bth[1]);
if (unlikely(bth1 & (IB_BECN_SMASK | IB_FECN_SMASK))) {
u32 bth1;
bool becn = false;
bool fecn = false;
if (pkt->etype == RHF_RCV_TYPE_BYPASS) {
fecn = hfi1_16B_get_fecn(pkt->hdr);
becn = hfi1_16B_get_becn(pkt->hdr);
} else {
bth1 = be32_to_cpu(ohdr->bth[1]);
fecn = bth1 & IB_FECN_SMASK;
becn = bth1 & IB_BECN_SMASK;
}
if (unlikely(fecn || becn)) {
hfi1_process_ecn_slowpath(qp, pkt, do_cnp);
return !!(bth1 & IB_FECN_SMASK);
return fecn;
}
return false;
}
@ -1829,10 +1957,9 @@ void hfi1_pcie_cleanup(struct pci_dev *pdev);
int hfi1_pcie_ddinit(struct hfi1_devdata *dd, struct pci_dev *pdev);
void hfi1_pcie_ddcleanup(struct hfi1_devdata *);
int pcie_speeds(struct hfi1_devdata *dd);
void request_msix(struct hfi1_devdata *dd, u32 *nent,
struct hfi1_msix_entry *entry);
void hfi1_enable_intx(struct pci_dev *pdev);
void restore_pci_variables(struct hfi1_devdata *dd);
int request_msix(struct hfi1_devdata *dd, u32 msireq);
int restore_pci_variables(struct hfi1_devdata *dd);
int save_pci_variables(struct hfi1_devdata *dd);
int do_pcie_gen3_transition(struct hfi1_devdata *dd);
int parse_platform_config(struct hfi1_devdata *dd);
int get_platform_config_field(struct hfi1_devdata *dd,
@ -1860,6 +1987,7 @@ int process_receive_error(struct hfi1_packet *packet);
int kdeth_process_expected(struct hfi1_packet *packet);
int kdeth_process_eager(struct hfi1_packet *packet);
int process_receive_invalid(struct hfi1_packet *packet);
void seqfile_dump_rcd(struct seq_file *s, struct hfi1_ctxtdata *rcd);
/* global module parameter variables */
extern unsigned int hfi1_max_mtu;
@ -1991,9 +2119,15 @@ static inline u64 hfi1_pkt_base_sdma_integrity(struct hfi1_devdata *dd)
#define dd_dev_emerg(dd, fmt, ...) \
dev_emerg(&(dd)->pcidev->dev, "%s: " fmt, \
get_unit_name((dd)->unit), ##__VA_ARGS__)
#define dd_dev_err(dd, fmt, ...) \
dev_err(&(dd)->pcidev->dev, "%s: " fmt, \
get_unit_name((dd)->unit), ##__VA_ARGS__)
#define dd_dev_err_ratelimited(dd, fmt, ...) \
dev_err_ratelimited(&(dd)->pcidev->dev, "%s: " fmt, \
get_unit_name((dd)->unit), ##__VA_ARGS__)
#define dd_dev_warn(dd, fmt, ...) \
dev_warn(&(dd)->pcidev->dev, "%s: " fmt, \
get_unit_name((dd)->unit), ##__VA_ARGS__)
@ -2087,52 +2221,220 @@ int hfi1_tempsense_rd(struct hfi1_devdata *dd, struct hfi1_temp *temp);
#define DD_DEV_ENTRY(dd) __string(dev, dev_name(&(dd)->pcidev->dev))
#define DD_DEV_ASSIGN(dd) __assign_str(dev, dev_name(&(dd)->pcidev->dev))
#define packettype_name(etype) { RHF_RCV_TYPE_##etype, #etype }
#define show_packettype(etype) \
__print_symbolic(etype, \
packettype_name(EXPECTED), \
packettype_name(EAGER), \
packettype_name(IB), \
packettype_name(ERROR), \
packettype_name(BYPASS))
static inline void hfi1_update_ah_attr(struct ib_device *ibdev,
struct rdma_ah_attr *attr)
{
struct hfi1_pportdata *ppd;
struct hfi1_ibport *ibp;
u32 dlid = rdma_ah_get_dlid(attr);
#define ib_opcode_name(opcode) { IB_OPCODE_##opcode, #opcode }
#define show_ib_opcode(opcode) \
__print_symbolic(opcode, \
ib_opcode_name(RC_SEND_FIRST), \
ib_opcode_name(RC_SEND_MIDDLE), \
ib_opcode_name(RC_SEND_LAST), \
ib_opcode_name(RC_SEND_LAST_WITH_IMMEDIATE), \
ib_opcode_name(RC_SEND_ONLY), \
ib_opcode_name(RC_SEND_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(RC_RDMA_WRITE_FIRST), \
ib_opcode_name(RC_RDMA_WRITE_MIDDLE), \
ib_opcode_name(RC_RDMA_WRITE_LAST), \
ib_opcode_name(RC_RDMA_WRITE_LAST_WITH_IMMEDIATE), \
ib_opcode_name(RC_RDMA_WRITE_ONLY), \
ib_opcode_name(RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(RC_RDMA_READ_REQUEST), \
ib_opcode_name(RC_RDMA_READ_RESPONSE_FIRST), \
ib_opcode_name(RC_RDMA_READ_RESPONSE_MIDDLE), \
ib_opcode_name(RC_RDMA_READ_RESPONSE_LAST), \
ib_opcode_name(RC_RDMA_READ_RESPONSE_ONLY), \
ib_opcode_name(RC_ACKNOWLEDGE), \
ib_opcode_name(RC_ATOMIC_ACKNOWLEDGE), \
ib_opcode_name(RC_COMPARE_SWAP), \
ib_opcode_name(RC_FETCH_ADD), \
ib_opcode_name(UC_SEND_FIRST), \
ib_opcode_name(UC_SEND_MIDDLE), \
ib_opcode_name(UC_SEND_LAST), \
ib_opcode_name(UC_SEND_LAST_WITH_IMMEDIATE), \
ib_opcode_name(UC_SEND_ONLY), \
ib_opcode_name(UC_SEND_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(UC_RDMA_WRITE_FIRST), \
ib_opcode_name(UC_RDMA_WRITE_MIDDLE), \
ib_opcode_name(UC_RDMA_WRITE_LAST), \
ib_opcode_name(UC_RDMA_WRITE_LAST_WITH_IMMEDIATE), \
ib_opcode_name(UC_RDMA_WRITE_ONLY), \
ib_opcode_name(UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(UD_SEND_ONLY), \
ib_opcode_name(UD_SEND_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(CNP))
/*
* Kernel clients may not have setup GRH information
* Set that here.
*/
ibp = to_iport(ibdev, rdma_ah_get_port_num(attr));
ppd = ppd_from_ibp(ibp);
if ((((dlid >= be16_to_cpu(IB_MULTICAST_LID_BASE)) ||
(ppd->lid >= be16_to_cpu(IB_MULTICAST_LID_BASE))) &&
(dlid != be32_to_cpu(OPA_LID_PERMISSIVE)) &&
(dlid != be16_to_cpu(IB_LID_PERMISSIVE)) &&
(!(rdma_ah_get_ah_flags(attr) & IB_AH_GRH))) ||
(rdma_ah_get_make_grd(attr))) {
rdma_ah_set_ah_flags(attr, IB_AH_GRH);
rdma_ah_set_interface_id(attr, OPA_MAKE_ID(dlid));
rdma_ah_set_subnet_prefix(attr, ibp->rvp.gid_prefix);
}
}
/*
* hfi1_check_mcast- Check if the given lid is
* in the OPA multicast range.
*
* The LID might either reside in ah.dlid or might be
* in the GRH of the address handle as DGID if extended
* addresses are in use.
*/
static inline bool hfi1_check_mcast(u32 lid)
{
return ((lid >= opa_get_mcast_base(OPA_MCAST_NR)) &&
(lid != be32_to_cpu(OPA_LID_PERMISSIVE)));
}
#define opa_get_lid(lid, format) \
__opa_get_lid(lid, OPA_PORT_PACKET_FORMAT_##format)
/* Convert a lid to a specific lid space */
static inline u32 __opa_get_lid(u32 lid, u8 format)
{
bool is_mcast = hfi1_check_mcast(lid);
switch (format) {
case OPA_PORT_PACKET_FORMAT_8B:
case OPA_PORT_PACKET_FORMAT_10B:
if (is_mcast)
return (lid - opa_get_mcast_base(OPA_MCAST_NR) +
0xF0000);
return lid & 0xFFFFF;
case OPA_PORT_PACKET_FORMAT_16B:
if (is_mcast)
return (lid - opa_get_mcast_base(OPA_MCAST_NR) +
0xF00000);
return lid & 0xFFFFFF;
case OPA_PORT_PACKET_FORMAT_9B:
if (is_mcast)
return (lid -
opa_get_mcast_base(OPA_MCAST_NR) +
be16_to_cpu(IB_MULTICAST_LID_BASE));
else
return lid & 0xFFFF;
default:
return lid;
}
}
/* Return true if the given lid is the OPA 16B multicast range */
static inline bool hfi1_is_16B_mcast(u32 lid)
{
return ((lid >=
opa_get_lid(opa_get_mcast_base(OPA_MCAST_NR), 16B)) &&
(lid != opa_get_lid(be32_to_cpu(OPA_LID_PERMISSIVE), 16B)));
}
static inline void hfi1_make_opa_lid(struct rdma_ah_attr *attr)
{
const struct ib_global_route *grh = rdma_ah_read_grh(attr);
u32 dlid = rdma_ah_get_dlid(attr);
/* Modify ah_attr.dlid to be in the 32 bit LID space.
* This is how the address will be laid out:
* Assuming MCAST_NR to be 4,
* 32 bit permissive LID = 0xFFFFFFFF
* Multicast LID range = 0xFFFFFFFE to 0xF0000000
* Unicast LID range = 0xEFFFFFFF to 1
* Invalid LID = 0
*/
if (ib_is_opa_gid(&grh->dgid))
dlid = opa_get_lid_from_gid(&grh->dgid);
else if ((dlid >= be16_to_cpu(IB_MULTICAST_LID_BASE)) &&
(dlid != be16_to_cpu(IB_LID_PERMISSIVE)) &&
(dlid != be32_to_cpu(OPA_LID_PERMISSIVE)))
dlid = dlid - be16_to_cpu(IB_MULTICAST_LID_BASE) +
opa_get_mcast_base(OPA_MCAST_NR);
else if (dlid == be16_to_cpu(IB_LID_PERMISSIVE))
dlid = be32_to_cpu(OPA_LID_PERMISSIVE);
rdma_ah_set_dlid(attr, dlid);
}
static inline u8 hfi1_get_packet_type(u32 lid)
{
/* 9B if lid > 0xF0000000 */
if (lid >= opa_get_mcast_base(OPA_MCAST_NR))
return HFI1_PKT_TYPE_9B;
/* 16B if lid > 0xC000 */
if (lid >= opa_get_lid(opa_get_mcast_base(OPA_MCAST_NR), 9B))
return HFI1_PKT_TYPE_16B;
return HFI1_PKT_TYPE_9B;
}
static inline bool hfi1_get_hdr_type(u32 lid, struct rdma_ah_attr *attr)
{
/*
* If there was an incoming 16B packet with permissive
* LIDs, OPA GIDs would have been programmed when those
* packets were received. A 16B packet will have to
* be sent in response to that packet. Return a 16B
* header type if that's the case.
*/
if (rdma_ah_get_dlid(attr) == be32_to_cpu(OPA_LID_PERMISSIVE))
return (ib_is_opa_gid(&rdma_ah_read_grh(attr)->dgid)) ?
HFI1_PKT_TYPE_16B : HFI1_PKT_TYPE_9B;
/*
* Return a 16B header type if either the the destination
* or source lid is extended.
*/
if (hfi1_get_packet_type(rdma_ah_get_dlid(attr)) == HFI1_PKT_TYPE_16B)
return HFI1_PKT_TYPE_16B;
return hfi1_get_packet_type(lid);
}
static inline void hfi1_make_ext_grh(struct hfi1_packet *packet,
struct ib_grh *grh, u32 slid,
u32 dlid)
{
struct hfi1_ibport *ibp = &packet->rcd->ppd->ibport_data;
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
if (!ibp)
return;
grh->hop_limit = 1;
grh->sgid.global.subnet_prefix = ibp->rvp.gid_prefix;
if (slid == opa_get_lid(be32_to_cpu(OPA_LID_PERMISSIVE), 16B))
grh->sgid.global.interface_id =
OPA_MAKE_ID(be32_to_cpu(OPA_LID_PERMISSIVE));
else
grh->sgid.global.interface_id = OPA_MAKE_ID(slid);
/*
* Upper layers (like mad) may compare the dgid in the
* wc that is obtained here with the sgid_index in
* the wr. Since sgid_index in wr is always 0 for
* extended lids, set the dgid here to the default
* IB gid.
*/
grh->dgid.global.subnet_prefix = ibp->rvp.gid_prefix;
grh->dgid.global.interface_id =
cpu_to_be64(ppd->guids[HFI1_PORT_GUID_INDEX]);
}
static inline int hfi1_get_16b_padding(u32 hdr_size, u32 payload)
{
return -(hdr_size + payload + (SIZE_OF_CRC << 2) +
SIZE_OF_LT) & 0x7;
}
static inline void hfi1_make_ib_hdr(struct ib_header *hdr,
u16 lrh0, u16 len,
u16 dlid, u16 slid)
{
hdr->lrh[0] = cpu_to_be16(lrh0);
hdr->lrh[1] = cpu_to_be16(dlid);
hdr->lrh[2] = cpu_to_be16(len);
hdr->lrh[3] = cpu_to_be16(slid);
}
static inline void hfi1_make_16b_hdr(struct hfi1_16b_header *hdr,
u32 slid, u32 dlid,
u16 len, u16 pkey,
u8 becn, u8 fecn, u8 l4,
u8 sc)
{
u32 lrh0 = 0;
u32 lrh1 = 0x40000000;
u32 lrh2 = 0;
u32 lrh3 = 0;
lrh0 = (lrh0 & ~OPA_16B_BECN_MASK) | (becn << OPA_16B_BECN_SHIFT);
lrh0 = (lrh0 & ~OPA_16B_LEN_MASK) | (len << OPA_16B_LEN_SHIFT);
lrh0 = (lrh0 & ~OPA_16B_LID_MASK) | (slid & OPA_16B_LID_MASK);
lrh1 = (lrh1 & ~OPA_16B_FECN_MASK) | (fecn << OPA_16B_FECN_SHIFT);
lrh1 = (lrh1 & ~OPA_16B_SC_MASK) | (sc << OPA_16B_SC_SHIFT);
lrh1 = (lrh1 & ~OPA_16B_LID_MASK) | (dlid & OPA_16B_LID_MASK);
lrh2 = (lrh2 & ~OPA_16B_SLID_MASK) |
((slid >> OPA_16B_SLID_SHIFT) << OPA_16B_SLID_HIGH_SHIFT);
lrh2 = (lrh2 & ~OPA_16B_DLID_MASK) |
((dlid >> OPA_16B_DLID_SHIFT) << OPA_16B_DLID_HIGH_SHIFT);
lrh2 = (lrh2 & ~OPA_16B_PKEY_MASK) | (pkey << OPA_16B_PKEY_SHIFT);
lrh2 = (lrh2 & ~OPA_16B_L4_MASK) | l4;
hdr->lrh[0] = lrh0;
hdr->lrh[1] = lrh1;
hdr->lrh[2] = lrh2;
hdr->lrh[3] = lrh3;
}
#endif /* _HFI1_KERNEL_H */

View File

@ -67,6 +67,7 @@
#include "aspm.h"
#include "affinity.h"
#include "vnic.h"
#include "exp_rcv.h"
#undef pr_fmt
#define pr_fmt(fmt) DRIVER_NAME ": " fmt
@ -125,85 +126,198 @@ static struct idr hfi1_unit_table;
u32 hfi1_cpulist_count;
unsigned long *hfi1_cpulist;
/*
* Common code for creating the receive context array.
*/
int hfi1_create_ctxts(struct hfi1_devdata *dd)
static int hfi1_create_kctxt(struct hfi1_devdata *dd,
struct hfi1_pportdata *ppd)
{
unsigned i;
struct hfi1_ctxtdata *rcd;
int ret;
/* Control context has to be always 0 */
BUILD_BUG_ON(HFI1_CTRL_CTXT != 0);
dd->rcd = kzalloc_node(dd->num_rcv_contexts * sizeof(*dd->rcd),
GFP_KERNEL, dd->node);
if (!dd->rcd)
goto nomem;
/* create one or more kernel contexts */
for (i = 0; i < dd->first_dyn_alloc_ctxt; ++i) {
struct hfi1_pportdata *ppd;
struct hfi1_ctxtdata *rcd;
ppd = dd->pport + (i % dd->num_pports);
/* dd->rcd[i] gets assigned inside the callee */
rcd = hfi1_create_ctxtdata(ppd, i, dd->node);
if (!rcd) {
dd_dev_err(dd,
"Unable to allocate kernel receive context, failing\n");
goto nomem;
}
/*
* Set up the kernel context flags here and now because they
* use default values for all receive side memories. User
* contexts will be handled as they are created.
*/
rcd->flags = HFI1_CAP_KGET(MULTI_PKT_EGR) |
HFI1_CAP_KGET(NODROP_RHQ_FULL) |
HFI1_CAP_KGET(NODROP_EGR_FULL) |
HFI1_CAP_KGET(DMA_RTAIL);
/* Control context must use DMA_RTAIL */
if (rcd->ctxt == HFI1_CTRL_CTXT)
rcd->flags |= HFI1_CAP_DMA_RTAIL;
rcd->seq_cnt = 1;
rcd->sc = sc_alloc(dd, SC_ACK, rcd->rcvhdrqentsize, dd->node);
if (!rcd->sc) {
dd_dev_err(dd,
"Unable to allocate kernel send context, failing\n");
goto nomem;
}
hfi1_init_ctxt(rcd->sc);
ret = hfi1_create_ctxtdata(ppd, dd->node, &rcd);
if (ret < 0) {
dd_dev_err(dd, "Kernel receive context allocation failed\n");
return ret;
}
/*
* Initialize aspm, to be done after gen3 transition and setting up
* contexts and before enabling interrupts
* Set up the kernel context flags here and now because they use
* default values for all receive side memories. User contexts will
* be handled as they are created.
*/
aspm_init(dd);
rcd->flags = HFI1_CAP_KGET(MULTI_PKT_EGR) |
HFI1_CAP_KGET(NODROP_RHQ_FULL) |
HFI1_CAP_KGET(NODROP_EGR_FULL) |
HFI1_CAP_KGET(DMA_RTAIL);
/* Control context must use DMA_RTAIL */
if (rcd->ctxt == HFI1_CTRL_CTXT)
rcd->flags |= HFI1_CAP_DMA_RTAIL;
rcd->seq_cnt = 1;
rcd->sc = sc_alloc(dd, SC_ACK, rcd->rcvhdrqentsize, dd->node);
if (!rcd->sc) {
dd_dev_err(dd, "Kernel send context allocation failed\n");
return -ENOMEM;
}
hfi1_init_ctxt(rcd->sc);
return 0;
nomem:
ret = -ENOMEM;
}
if (dd->rcd) {
for (i = 0; i < dd->num_rcv_contexts; ++i)
hfi1_free_ctxtdata(dd, dd->rcd[i]);
/*
* Create the receive context array and one or more kernel contexts
*/
int hfi1_create_kctxts(struct hfi1_devdata *dd)
{
u16 i;
int ret;
dd->rcd = kzalloc_node(dd->num_rcv_contexts * sizeof(*dd->rcd),
GFP_KERNEL, dd->node);
if (!dd->rcd)
return -ENOMEM;
for (i = 0; i < dd->first_dyn_alloc_ctxt; ++i) {
ret = hfi1_create_kctxt(dd, dd->pport);
if (ret)
goto bail;
}
return 0;
bail:
for (i = 0; dd->rcd && i < dd->first_dyn_alloc_ctxt; ++i)
hfi1_free_ctxt(dd->rcd[i]);
/* All the contexts should be freed, free the array */
kfree(dd->rcd);
dd->rcd = NULL;
return ret;
}
/*
* Common code for user and kernel context setup.
* Helper routines for the receive context reference count (rcd and uctxt).
*/
struct hfi1_ctxtdata *hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, u32 ctxt,
int numa)
static void hfi1_rcd_init(struct hfi1_ctxtdata *rcd)
{
kref_init(&rcd->kref);
}
/**
* hfi1_rcd_free - When reference is zero clean up.
* @kref: pointer to an initialized rcd data structure
*
*/
static void hfi1_rcd_free(struct kref *kref)
{
unsigned long flags;
struct hfi1_ctxtdata *rcd =
container_of(kref, struct hfi1_ctxtdata, kref);
hfi1_free_ctxtdata(rcd->dd, rcd);
spin_lock_irqsave(&rcd->dd->uctxt_lock, flags);
rcd->dd->rcd[rcd->ctxt] = NULL;
spin_unlock_irqrestore(&rcd->dd->uctxt_lock, flags);
kfree(rcd);
}
/**
* hfi1_rcd_put - decrement reference for rcd
* @rcd: pointer to an initialized rcd data structure
*
* Use this to put a reference after the init.
*/
int hfi1_rcd_put(struct hfi1_ctxtdata *rcd)
{
if (rcd)
return kref_put(&rcd->kref, hfi1_rcd_free);
return 0;
}
/**
* hfi1_rcd_get - increment reference for rcd
* @rcd: pointer to an initialized rcd data structure
*
* Use this to get a reference after the init.
*/
void hfi1_rcd_get(struct hfi1_ctxtdata *rcd)
{
kref_get(&rcd->kref);
}
/**
* allocate_rcd_index - allocate an rcd index from the rcd array
* @dd: pointer to a valid devdata structure
* @rcd: rcd data structure to assign
* @index: pointer to index that is allocated
*
* Find an empty index in the rcd array, and assign the given rcd to it.
* If the array is full, we are EBUSY.
*
*/
static int allocate_rcd_index(struct hfi1_devdata *dd,
struct hfi1_ctxtdata *rcd, u16 *index)
{
unsigned long flags;
u16 ctxt;
spin_lock_irqsave(&dd->uctxt_lock, flags);
for (ctxt = 0; ctxt < dd->num_rcv_contexts; ctxt++)
if (!dd->rcd[ctxt])
break;
if (ctxt < dd->num_rcv_contexts) {
rcd->ctxt = ctxt;
dd->rcd[ctxt] = rcd;
hfi1_rcd_init(rcd);
}
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
if (ctxt >= dd->num_rcv_contexts)
return -EBUSY;
*index = ctxt;
return 0;
}
/**
* hfi1_rcd_get_by_index
* @dd: pointer to a valid devdata structure
* @ctxt: the index of an possilbe rcd
*
* We need to protect access to the rcd array. If access is needed to
* one or more index, get the protecting spinlock and then increment the
* kref.
*
* The caller is responsible for making the _put().
*
*/
struct hfi1_ctxtdata *hfi1_rcd_get_by_index(struct hfi1_devdata *dd, u16 ctxt)
{
unsigned long flags;
struct hfi1_ctxtdata *rcd = NULL;
spin_lock_irqsave(&dd->uctxt_lock, flags);
if (dd->rcd[ctxt]) {
rcd = dd->rcd[ctxt];
hfi1_rcd_get(rcd);
}
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
return rcd;
}
/*
* Common code for user and kernel context create and setup.
* NOTE: the initial kref is done here (hf1_rcd_init()).
*/
int hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, int numa,
struct hfi1_ctxtdata **context)
{
struct hfi1_devdata *dd = ppd->dd;
struct hfi1_ctxtdata *rcd;
@ -217,20 +331,30 @@ struct hfi1_ctxtdata *hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, u32 ctxt,
rcd = kzalloc_node(sizeof(*rcd), GFP_KERNEL, numa);
if (rcd) {
u32 rcvtids, max_entries;
u16 ctxt;
int ret;
hfi1_cdbg(PROC, "setting up context %u\n", ctxt);
ret = allocate_rcd_index(dd, rcd, &ctxt);
if (ret) {
*context = NULL;
kfree(rcd);
return ret;
}
INIT_LIST_HEAD(&rcd->qp_wait_list);
hfi1_exp_tid_group_init(&rcd->tid_group_list);
hfi1_exp_tid_group_init(&rcd->tid_used_list);
hfi1_exp_tid_group_init(&rcd->tid_full_list);
rcd->ppd = ppd;
rcd->dd = dd;
__set_bit(0, rcd->in_use_ctxts);
rcd->ctxt = ctxt;
dd->rcd[ctxt] = rcd;
rcd->numa_id = numa;
rcd->rcv_array_groups = dd->rcv_entries.ngroups;
mutex_init(&rcd->exp_lock);
hfi1_cdbg(PROC, "setting up context %u\n", rcd->ctxt);
/*
* Calculate the context's RcvArray entry starting point.
* We do this here because we have to take into account all
@ -328,14 +452,30 @@ struct hfi1_ctxtdata *hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, u32 ctxt,
if (!rcd->opstats)
goto bail;
}
*context = rcd;
return 0;
}
return rcd;
bail:
dd->rcd[ctxt] = NULL;
kfree(rcd->egrbufs.rcvtids);
kfree(rcd->egrbufs.buffers);
kfree(rcd);
return NULL;
*context = NULL;
hfi1_free_ctxt(rcd);
return -ENOMEM;
}
/**
* hfi1_free_ctxt
* @rcd: pointer to an initialized rcd data structure
*
* This wrapper is the free function that matches hfi1_create_ctxtdata().
* When a context is done being used (kernel or user), this function is called
* for the "final" put to match the kref init from hf1i_create_ctxtdata().
* Other users of the context do a get/put sequence to make sure that the
* structure isn't removed while in use.
*/
void hfi1_free_ctxt(struct hfi1_ctxtdata *rcd)
{
hfi1_rcd_put(rcd);
}
/*
@ -483,7 +623,6 @@ void hfi1_init_pportdata(struct pci_dev *pdev, struct hfi1_pportdata *ppd,
ppd->pkeys[default_pkey_idx] = DEFAULT_P_KEY;
ppd->part_enforce |= HFI1_PART_ENFORCE_IN;
ppd->part_enforce |= HFI1_PART_ENFORCE_OUT;
if (loopback) {
hfi1_early_err(&pdev->dev,
@ -559,16 +698,19 @@ static int loadtime_init(struct hfi1_devdata *dd)
static int init_after_reset(struct hfi1_devdata *dd)
{
int i;
struct hfi1_ctxtdata *rcd;
/*
* Ensure chip does no sends or receives, tail updates, or
* pioavail updates while we re-initialize. This is mostly
* for the driver data structures, not chip registers.
*/
for (i = 0; i < dd->num_rcv_contexts; i++)
for (i = 0; i < dd->num_rcv_contexts; i++) {
rcd = hfi1_rcd_get_by_index(dd, i);
hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_DIS |
HFI1_RCVCTRL_INTRAVAIL_DIS |
HFI1_RCVCTRL_TAILUPD_DIS, i);
HFI1_RCVCTRL_INTRAVAIL_DIS |
HFI1_RCVCTRL_TAILUPD_DIS, rcd);
hfi1_rcd_put(rcd);
}
pio_send_control(dd, PSC_GLOBAL_DISABLE);
for (i = 0; i < dd->num_send_contexts; i++)
sc_disable(dd->send_contexts[i].sc);
@ -578,8 +720,9 @@ static int init_after_reset(struct hfi1_devdata *dd)
static void enable_chip(struct hfi1_devdata *dd)
{
struct hfi1_ctxtdata *rcd;
u32 rcvmask;
u32 i;
u16 i;
/* enable PIO send */
pio_send_control(dd, PSC_GLOBAL_ENABLE);
@ -589,17 +732,21 @@ static void enable_chip(struct hfi1_devdata *dd)
* Other ctxts done as user opens and initializes them.
*/
for (i = 0; i < dd->first_dyn_alloc_ctxt; ++i) {
rcd = hfi1_rcd_get_by_index(dd, i);
if (!rcd)
continue;
rcvmask = HFI1_RCVCTRL_CTXT_ENB | HFI1_RCVCTRL_INTRAVAIL_ENB;
rcvmask |= HFI1_CAP_KGET_MASK(dd->rcd[i]->flags, DMA_RTAIL) ?
rcvmask |= HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL) ?
HFI1_RCVCTRL_TAILUPD_ENB : HFI1_RCVCTRL_TAILUPD_DIS;
if (!HFI1_CAP_KGET_MASK(dd->rcd[i]->flags, MULTI_PKT_EGR))
if (!HFI1_CAP_KGET_MASK(rcd->flags, MULTI_PKT_EGR))
rcvmask |= HFI1_RCVCTRL_ONE_PKT_EGR_ENB;
if (HFI1_CAP_KGET_MASK(dd->rcd[i]->flags, NODROP_RHQ_FULL))
if (HFI1_CAP_KGET_MASK(rcd->flags, NODROP_RHQ_FULL))
rcvmask |= HFI1_RCVCTRL_NO_RHQ_DROP_ENB;
if (HFI1_CAP_KGET_MASK(dd->rcd[i]->flags, NODROP_EGR_FULL))
if (HFI1_CAP_KGET_MASK(rcd->flags, NODROP_EGR_FULL))
rcvmask |= HFI1_RCVCTRL_NO_EGR_DROP_ENB;
hfi1_rcvctrl(dd, rcvmask, i);
sc_enable(dd->rcd[i]->sc);
hfi1_rcvctrl(dd, rcvmask, rcd);
sc_enable(rcd->sc);
hfi1_rcd_put(rcd);
}
}
@ -624,6 +771,20 @@ static int create_workqueues(struct hfi1_devdata *dd)
if (!ppd->hfi1_wq)
goto wq_error;
}
if (!ppd->link_wq) {
/*
* Make the link workqueue single-threaded to enforce
* serialization.
*/
ppd->link_wq =
alloc_workqueue(
"hfi_link_%d_%d",
WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND,
1, /* max_active */
dd->unit, pidx);
if (!ppd->link_wq)
goto wq_error;
}
}
return 0;
wq_error:
@ -634,6 +795,10 @@ wq_error:
destroy_workqueue(ppd->hfi1_wq);
ppd->hfi1_wq = NULL;
}
if (ppd->link_wq) {
destroy_workqueue(ppd->link_wq);
ppd->link_wq = NULL;
}
}
return -ENOMEM;
}
@ -656,7 +821,8 @@ wq_error:
int hfi1_init(struct hfi1_devdata *dd, int reinit)
{
int ret = 0, pidx, lastfail = 0;
unsigned i, len;
unsigned long len;
u16 i;
struct hfi1_ctxtdata *rcd;
struct hfi1_pportdata *ppd;
@ -725,7 +891,7 @@ int hfi1_init(struct hfi1_devdata *dd, int reinit)
* existing, and re-allocate.
* Need to re-create rest of ctxt 0 ctxtdata as well.
*/
rcd = dd->rcd[i];
rcd = hfi1_rcd_get_by_index(dd, i);
if (!rcd)
continue;
@ -739,6 +905,7 @@ int hfi1_init(struct hfi1_devdata *dd, int reinit)
"failed to allocate kernel ctxt's rcvhdrq and/or egr bufs\n");
ret = lastfail;
}
hfi1_rcd_put(rcd);
}
/* Allocate enough memory for user event notification. */
@ -858,6 +1025,7 @@ static void stop_timers(struct hfi1_devdata *dd)
static void shutdown_device(struct hfi1_devdata *dd)
{
struct hfi1_pportdata *ppd;
struct hfi1_ctxtdata *rcd;
unsigned pidx;
int i;
@ -876,12 +1044,15 @@ static void shutdown_device(struct hfi1_devdata *dd)
for (pidx = 0; pidx < dd->num_pports; ++pidx) {
ppd = dd->pport + pidx;
for (i = 0; i < dd->num_rcv_contexts; i++)
for (i = 0; i < dd->num_rcv_contexts; i++) {
rcd = hfi1_rcd_get_by_index(dd, i);
hfi1_rcvctrl(dd, HFI1_RCVCTRL_TAILUPD_DIS |
HFI1_RCVCTRL_CTXT_DIS |
HFI1_RCVCTRL_INTRAVAIL_DIS |
HFI1_RCVCTRL_PKEY_DIS |
HFI1_RCVCTRL_ONE_PKT_EGR_DIS, i);
HFI1_RCVCTRL_CTXT_DIS |
HFI1_RCVCTRL_INTRAVAIL_DIS |
HFI1_RCVCTRL_PKEY_DIS |
HFI1_RCVCTRL_ONE_PKT_EGR_DIS, rcd);
hfi1_rcd_put(rcd);
}
/*
* Gracefully stop all sends allowing any in progress to
* trickle out first.
@ -917,6 +1088,10 @@ static void shutdown_device(struct hfi1_devdata *dd)
destroy_workqueue(ppd->hfi1_wq);
ppd->hfi1_wq = NULL;
}
if (ppd->link_wq) {
destroy_workqueue(ppd->link_wq);
ppd->link_wq = NULL;
}
}
sdma_exit(dd);
}
@ -927,14 +1102,11 @@ static void shutdown_device(struct hfi1_devdata *dd)
* @rcd: the ctxtdata structure
*
* free up any allocated data for a context
* This should not touch anything that would affect a simultaneous
* re-allocation of context data, because it is called after hfi1_mutex
* is released (and can be called from reinit as well).
* It should never change any chip state, or global driver state.
*/
void hfi1_free_ctxtdata(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd)
{
unsigned e;
u32 e;
if (!rcd)
return;
@ -953,6 +1125,7 @@ void hfi1_free_ctxtdata(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd)
/* all the RcvArray entries should have been cleared by now */
kfree(rcd->egrbufs.rcvtids);
rcd->egrbufs.rcvtids = NULL;
for (e = 0; e < rcd->egrbufs.alloced; e++) {
if (rcd->egrbufs.buffers[e].dma)
@ -962,13 +1135,21 @@ void hfi1_free_ctxtdata(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd)
rcd->egrbufs.buffers[e].dma);
}
kfree(rcd->egrbufs.buffers);
rcd->egrbufs.alloced = 0;
rcd->egrbufs.buffers = NULL;
sc_free(rcd->sc);
rcd->sc = NULL;
vfree(rcd->subctxt_uregbase);
vfree(rcd->subctxt_rcvegrbuf);
vfree(rcd->subctxt_rcvhdr_base);
kfree(rcd->opstats);
kfree(rcd);
rcd->subctxt_uregbase = NULL;
rcd->subctxt_rcvegrbuf = NULL;
rcd->subctxt_rcvhdr_base = NULL;
rcd->opstats = NULL;
}
/*
@ -1311,8 +1492,6 @@ static void cleanup_device_data(struct hfi1_devdata *dd)
{
int ctxt;
int pidx;
struct hfi1_ctxtdata **tmp;
unsigned long flags;
/* users can't do anything more with chip */
for (pidx = 0; pidx < dd->num_pports; ++pidx) {
@ -1337,18 +1516,6 @@ static void cleanup_device_data(struct hfi1_devdata *dd)
free_credit_return(dd);
/*
* Free any resources still in use (usually just kernel contexts)
* at unload; we do for ctxtcnt, because that's what we allocate.
* We acquire lock to be really paranoid that rcd isn't being
* accessed from some interrupt-related code (that should not happen,
* but best to be sure).
*/
spin_lock_irqsave(&dd->uctxt_lock, flags);
tmp = dd->rcd;
dd->rcd = NULL;
spin_unlock_irqrestore(&dd->uctxt_lock, flags);
if (dd->rcvhdrtail_dummy_kvaddr) {
dma_free_coherent(&dd->pcidev->dev, sizeof(u64),
(void *)dd->rcvhdrtail_dummy_kvaddr,
@ -1356,16 +1523,22 @@ static void cleanup_device_data(struct hfi1_devdata *dd)
dd->rcvhdrtail_dummy_kvaddr = NULL;
}
for (ctxt = 0; tmp && ctxt < dd->num_rcv_contexts; ctxt++) {
struct hfi1_ctxtdata *rcd = tmp[ctxt];
/*
* Free any resources still in use (usually just kernel contexts)
* at unload; we do for ctxtcnt, because that's what we allocate.
*/
for (ctxt = 0; dd->rcd && ctxt < dd->num_rcv_contexts; ctxt++) {
struct hfi1_ctxtdata *rcd = dd->rcd[ctxt];
tmp[ctxt] = NULL; /* debugging paranoia */
if (rcd) {
hfi1_clear_tids(rcd);
hfi1_free_ctxtdata(dd, rcd);
hfi1_free_ctxt(rcd);
}
}
kfree(tmp);
kfree(dd->rcd);
dd->rcd = NULL;
free_pio_map(dd);
/* must follow rcv context free - need to remove rcv's hooks */
for (ctxt = 0; ctxt < dd->num_send_contexts; ctxt++)
@ -1532,6 +1705,10 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
destroy_workqueue(ppd->hfi1_wq);
ppd->hfi1_wq = NULL;
}
if (ppd->link_wq) {
destroy_workqueue(ppd->link_wq);
ppd->link_wq = NULL;
}
}
if (!j)
hfi1_device_remove(dd);

View File

@ -164,6 +164,7 @@ void handle_linkup_change(struct hfi1_devdata *dd, u32 linkup)
ppd->linkup = 0;
/* clear HW details of the previous connection */
ppd->actual_vls_operational = 0;
reset_link_credits(dd);
/* freeze after a link down to guarantee a clean egress */
@ -196,7 +197,7 @@ void handle_user_interrupt(struct hfi1_ctxtdata *rcd)
if (test_and_clear_bit(HFI1_CTXT_WAITING_RCV, &rcd->event_flags)) {
wake_up_interruptible(&rcd->wait);
hfi1_rcvctrl(dd, HFI1_RCVCTRL_INTRAVAIL_DIS, rcd->ctxt);
hfi1_rcvctrl(dd, HFI1_RCVCTRL_INTRAVAIL_DIS, rcd);
} else if (test_and_clear_bit(HFI1_CTXT_WAITING_URG,
&rcd->event_flags)) {
rcd->urgent++;

View File

@ -106,7 +106,9 @@ struct iowait {
struct sdma_engine *sde,
struct iowait *wait,
struct sdma_txreq *tx,
unsigned seq);
uint seq,
bool pkts_sent
);
void (*wakeup)(struct iowait *wait, int reason);
void (*sdma_drained)(struct iowait *wait);
seqlock_t *lock;
@ -118,6 +120,7 @@ struct iowait {
u32 count;
u32 tx_limit;
u32 tx_count;
u8 starved_cnt;
};
#define SDMA_AVAIL_REASON 0
@ -143,7 +146,8 @@ static inline void iowait_init(
struct sdma_engine *sde,
struct iowait *wait,
struct sdma_txreq *tx,
unsigned seq),
uint seq,
bool pkts_sent),
void (*wakeup)(struct iowait *wait, int reason),
void (*sdma_drained)(struct iowait *wait))
{
@ -305,4 +309,66 @@ static inline struct sdma_txreq *iowait_get_txhead(struct iowait *wait)
return tx;
}
/**
* iowait_queue - Put the iowait on a wait queue
* @pkts_sent: have some packets been sent before queuing?
* @w: the iowait struct
* @wait_head: the wait queue
*
* This function is called to insert an iowait struct into a
* wait queue after a resource (eg, sdma decriptor or pio
* buffer) is run out.
*/
static inline void iowait_queue(bool pkts_sent, struct iowait *w,
struct list_head *wait_head)
{
/*
* To play fair, insert the iowait at the tail of the wait queue if it
* has already sent some packets; Otherwise, put it at the head.
*/
if (pkts_sent) {
list_add_tail(&w->list, wait_head);
w->starved_cnt = 0;
} else {
list_add(&w->list, wait_head);
w->starved_cnt++;
}
}
/**
* iowait_starve_clear - clear the wait queue's starve count
* @pkts_sent: have some packets been sent?
* @w: the iowait struct
*
* This function is called to clear the starve count. If no
* packets have been sent, the starve count will not be cleared.
*/
static inline void iowait_starve_clear(bool pkts_sent, struct iowait *w)
{
if (pkts_sent)
w->starved_cnt = 0;
}
/**
* iowait_starve_find_max - Find the maximum of the starve count
* @w: the iowait struct
* @max: a variable containing the max starve count
* @idx: the index of the current iowait in an array
* @max_idx: a variable containing the array index for the
* iowait entry that has the max starve count
*
* This function is called to compare the starve count of a
* given iowait with the given max starve count. The max starve
* count and the index will be updated if the iowait's start
* count is larger.
*/
static inline void iowait_starve_find_max(struct iowait *w, u8 *max,
uint idx, uint *max_idx)
{
if (w->starved_cnt > *max) {
*max = w->starved_cnt;
*max_idx = idx;
}
}
#endif

File diff suppressed because it is too large Load Diff

View File

@ -1,5 +1,5 @@
/*
* Copyright(c) 2015, 2016 Intel Corporation.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -115,7 +115,7 @@ struct opa_mad_notice_attr {
__be32 lid; /* LID where change occurred */
__be32 new_cap_mask; /* new capability mask */
__be16 reserved2;
__be16 cap_mask;
__be16 cap_mask3;
__be16 change_flags; /* low 4 bits only */
} __packed ntc_144;
@ -428,5 +428,6 @@ struct sc2vlnt {
COUNTER_MASK(1, 4))
void hfi1_event_pkey_change(struct hfi1_devdata *dd, u8 port);
void hfi1_handle_trap_timer(unsigned long data);
#endif /* _HFI1_MAD_H */

View File

@ -1,5 +1,5 @@
/*
* Copyright(c) 2016 Intel Corporation.
* Copyright(c) 2016 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -169,9 +169,8 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
unsigned long flags;
int ret = 0;
trace_hfi1_mmu_rb_insert(mnode->addr, mnode->len);
spin_lock_irqsave(&handler->lock, flags);
hfi1_cdbg(MMU, "Inserting node addr 0x%llx, len %u", mnode->addr,
mnode->len);
node = __mmu_rb_search(handler, mnode->addr, mnode->len);
if (node) {
ret = -EINVAL;
@ -197,7 +196,7 @@ static struct mmu_rb_node *__mmu_rb_search(struct mmu_rb_handler *handler,
{
struct mmu_rb_node *node = NULL;
hfi1_cdbg(MMU, "Searching for addr 0x%llx, len %u", addr, len);
trace_hfi1_mmu_rb_search(addr, len);
if (!handler->ops->filter) {
node = __mmu_int_rb_iter_first(&handler->root, addr,
(addr + len) - 1);
@ -214,21 +213,27 @@ static struct mmu_rb_node *__mmu_rb_search(struct mmu_rb_handler *handler,
return node;
}
struct mmu_rb_node *hfi1_mmu_rb_extract(struct mmu_rb_handler *handler,
unsigned long addr, unsigned long len)
bool hfi1_mmu_rb_remove_unless_exact(struct mmu_rb_handler *handler,
unsigned long addr, unsigned long len,
struct mmu_rb_node **rb_node)
{
struct mmu_rb_node *node;
unsigned long flags;
bool ret = false;
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, addr, len);
if (node) {
if (node->addr == addr && node->len == len)
goto unlock;
__mmu_int_rb_remove(node, &handler->root);
list_del(&node->list); /* remove from LRU list */
ret = true;
}
unlock:
spin_unlock_irqrestore(&handler->lock, flags);
return node;
*rb_node = node;
return ret;
}
void hfi1_mmu_rb_evict(struct mmu_rb_handler *handler, void *evict_arg)
@ -272,8 +277,7 @@ void hfi1_mmu_rb_remove(struct mmu_rb_handler *handler,
unsigned long flags;
/* Validity of handler and node pointers has been checked by caller. */
hfi1_cdbg(MMU, "Removing node addr 0x%llx, len %u", node->addr,
node->len);
trace_hfi1_mmu_rb_remove(node->addr, node->len);
spin_lock_irqsave(&handler->lock, flags);
__mmu_int_rb_remove(node, &handler->root);
list_del(&node->list); /* remove from LRU list */
@ -306,8 +310,7 @@ static void mmu_notifier_mem_invalidate(struct mmu_notifier *mn,
node; node = ptr) {
/* Guard against node removal. */
ptr = __mmu_int_rb_iter_next(node, start, end - 1);
hfi1_cdbg(MMU, "Invalidating node addr 0x%llx, len %u",
node->addr, node->len);
trace_hfi1_mmu_mem_invalidate(node->addr, node->len);
if (handler->ops->invalidate(handler->ops_arg, node)) {
__mmu_int_rb_remove(node, root);
/* move from LRU list to delete list */

View File

@ -81,7 +81,8 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
void hfi1_mmu_rb_evict(struct mmu_rb_handler *handler, void *evict_arg);
void hfi1_mmu_rb_remove(struct mmu_rb_handler *handler,
struct mmu_rb_node *mnode);
struct mmu_rb_node *hfi1_mmu_rb_extract(struct mmu_rb_handler *handler,
unsigned long addr, unsigned long len);
bool hfi1_mmu_rb_remove_unless_exact(struct mmu_rb_handler *handler,
unsigned long addr, unsigned long len,
struct mmu_rb_node **rb_node);
#endif /* _HFI1_MMU_RB_H */

View File

@ -84,7 +84,8 @@ static inline u8 port_states_to_phys_state(struct opa_port_states *ps)
/*
* OPA port physical states
* IB Volume 1, Table 146 PortInfo/IB Volume 2 Section 5.4.2(1) PortPhysState
* values.
* values are the same in OmniPath Architecture. OPA leverages some of the same
* concepts as InfiniBand, but has a few other states as well.
*
* When writing, only values 0-3 are valid, other values are ignored.
* When reading, 0 is reserved.
@ -92,6 +93,8 @@ static inline u8 port_states_to_phys_state(struct opa_port_states *ps)
* Returned by the ibphys_portstate() routine.
*/
enum opa_port_phys_state {
/* Values 0-7 have the same meaning in OPA as in InfiniBand. */
IB_PORTPHYSSTATE_NOP = 0,
/* 1 is reserved */
IB_PORTPHYSSTATE_POLLING = 2,
@ -101,9 +104,23 @@ enum opa_port_phys_state {
IB_PORTPHYSSTATE_LINK_ERROR_RECOVERY = 6,
IB_PORTPHYSSTATE_PHY_TEST = 7,
/* 8 is reserved */
/*
* Offline: Port is quiet (transmitters disabled) due to lack of
* physical media, unsupported media, or transition between link up
* and next link up attempt
*/
OPA_PORTPHYSSTATE_OFFLINE = 9,
OPA_PORTPHYSSTATE_GANGED = 10,
/* 10 is reserved */
/*
* Phy_Test: Specific test patterns are transmitted, and receiver BER
* can be monitored. This facilitates signal integrity testing for the
* physical layer of the port.
*/
OPA_PORTPHYSSTATE_TEST = 11,
OPA_PORTPHYSSTATE_MAX = 11,
/* values 12-15 are reserved/ignored */
};

View File

@ -1,5 +1,5 @@
/*
* Copyright(c) 2015, 2016 Intel Corporation.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -68,7 +68,7 @@
/*
* Code to adjust PCIe capabilities.
*/
static void tune_pcie_caps(struct hfi1_devdata *);
static int tune_pcie_caps(struct hfi1_devdata *);
/*
* Do all the common PCIe setup and initialization.
@ -161,6 +161,7 @@ int hfi1_pcie_ddinit(struct hfi1_devdata *dd, struct pci_dev *pdev)
{
unsigned long len;
resource_size_t addr;
int ret = 0;
dd->pcidev = pdev;
pci_set_drvdata(pdev, dd);
@ -179,47 +180,54 @@ int hfi1_pcie_ddinit(struct hfi1_devdata *dd, struct pci_dev *pdev)
return -EINVAL;
}
dd->kregbase = ioremap_nocache(addr, TXE_PIO_SEND);
if (!dd->kregbase)
dd->kregbase1 = ioremap_nocache(addr, RCV_ARRAY);
if (!dd->kregbase1) {
dd_dev_err(dd, "UC mapping of kregbase1 failed\n");
return -ENOMEM;
}
dd_dev_info(dd, "UC base1: %p for %x\n", dd->kregbase1, RCV_ARRAY);
dd->chip_rcv_array_count = readq(dd->kregbase1 + RCV_ARRAY_CNT);
dd_dev_info(dd, "RcvArray count: %u\n", dd->chip_rcv_array_count);
dd->base2_start = RCV_ARRAY + dd->chip_rcv_array_count * 8;
dd->kregbase2 = ioremap_nocache(
addr + dd->base2_start,
TXE_PIO_SEND - dd->base2_start);
if (!dd->kregbase2) {
dd_dev_err(dd, "UC mapping of kregbase2 failed\n");
goto nomem;
}
dd_dev_info(dd, "UC base2: %p for %x\n", dd->kregbase2,
TXE_PIO_SEND - dd->base2_start);
dd->piobase = ioremap_wc(addr + TXE_PIO_SEND, TXE_PIO_SIZE);
if (!dd->piobase) {
iounmap(dd->kregbase);
return -ENOMEM;
dd_dev_err(dd, "WC mapping of send buffers failed\n");
goto nomem;
}
dd_dev_info(dd, "WC piobase: %p\n for %x", dd->piobase, TXE_PIO_SIZE);
dd->flags |= HFI1_PRESENT; /* now register routines work */
dd->kregend = dd->kregbase + TXE_PIO_SEND;
dd->physaddr = addr; /* used for io_remap, etc. */
/*
* Re-map the chip's RcvArray as write-combining to allow us
* Map the chip's RcvArray as write-combining to allow us
* to write an entire cacheline worth of entries in one shot.
* If this re-map fails, just continue - the RcvArray programming
* function will handle both cases.
*/
dd->chip_rcv_array_count = read_csr(dd, RCV_ARRAY_CNT);
dd->rcvarray_wc = ioremap_wc(addr + RCV_ARRAY,
dd->chip_rcv_array_count * 8);
dd_dev_info(dd, "WC Remapped RcvArray: %p\n", dd->rcvarray_wc);
/*
* Save BARs and command to rewrite after device reset.
*/
pci_read_config_dword(dd->pcidev, PCI_BASE_ADDRESS_0, &dd->pcibar0);
pci_read_config_dword(dd->pcidev, PCI_BASE_ADDRESS_1, &dd->pcibar1);
pci_read_config_dword(dd->pcidev, PCI_ROM_ADDRESS, &dd->pci_rom);
pci_read_config_word(dd->pcidev, PCI_COMMAND, &dd->pci_command);
pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVCTL, &dd->pcie_devctl);
pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL, &dd->pcie_lnkctl);
pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVCTL2,
&dd->pcie_devctl2);
pci_read_config_dword(dd->pcidev, PCI_CFG_MSIX0, &dd->pci_msix0);
pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE1, &dd->pci_lnkctl3);
pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2, &dd->pci_tph2);
if (!dd->rcvarray_wc) {
dd_dev_err(dd, "WC mapping of receive array failed\n");
goto nomem;
}
dd_dev_info(dd, "WC RcvArray: %p for %x\n",
dd->rcvarray_wc, dd->chip_rcv_array_count * 8);
dd->flags |= HFI1_PRESENT; /* chip.c CSR routines now work */
return 0;
nomem:
ret = -ENOMEM;
hfi1_pcie_ddcleanup(dd);
return ret;
}
/*
@ -229,59 +237,19 @@ int hfi1_pcie_ddinit(struct hfi1_devdata *dd, struct pci_dev *pdev)
*/
void hfi1_pcie_ddcleanup(struct hfi1_devdata *dd)
{
u64 __iomem *base = (void __iomem *)dd->kregbase;
dd->flags &= ~HFI1_PRESENT;
dd->kregbase = NULL;
iounmap(base);
if (dd->kregbase1)
iounmap(dd->kregbase1);
dd->kregbase1 = NULL;
if (dd->kregbase2)
iounmap(dd->kregbase2);
dd->kregbase2 = NULL;
if (dd->rcvarray_wc)
iounmap(dd->rcvarray_wc);
dd->rcvarray_wc = NULL;
if (dd->piobase)
iounmap(dd->piobase);
}
static void msix_setup(struct hfi1_devdata *dd, int pos, u32 *msixcnt,
struct hfi1_msix_entry *hfi1_msix_entry)
{
int ret;
int nvec = *msixcnt;
struct msix_entry *msix_entry;
int i;
/*
* We can't pass hfi1_msix_entry array to msix_setup
* so use a dummy msix_entry array and copy the allocated
* irq back to the hfi1_msix_entry array.
*/
msix_entry = kmalloc_array(nvec, sizeof(*msix_entry), GFP_KERNEL);
if (!msix_entry) {
ret = -ENOMEM;
goto do_intx;
}
for (i = 0; i < nvec; i++)
msix_entry[i] = hfi1_msix_entry[i].msix;
ret = pci_enable_msix_range(dd->pcidev, msix_entry, 1, nvec);
if (ret < 0)
goto free_msix_entry;
nvec = ret;
for (i = 0; i < nvec; i++)
hfi1_msix_entry[i].msix = msix_entry[i];
kfree(msix_entry);
*msixcnt = nvec;
return;
free_msix_entry:
kfree(msix_entry);
do_intx:
dd_dev_err(dd, "pci_enable_msix_range %d vectors failed: %d, falling back to INTx\n",
nvec, ret);
*msixcnt = 0;
hfi1_enable_intx(dd->pcidev);
dd->piobase = NULL;
}
/* return the PCIe link speed from the given link status */
@ -314,8 +282,14 @@ static u32 extract_width(u16 linkstat)
static void update_lbus_info(struct hfi1_devdata *dd)
{
u16 linkstat;
int ret;
ret = pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKSTA, &linkstat);
if (ret) {
dd_dev_err(dd, "Unable to read from PCI config\n");
return;
}
pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKSTA, &linkstat);
dd->lbus_width = extract_width(linkstat);
dd->lbus_speed = extract_speed(linkstat);
snprintf(dd->lbus_info, sizeof(dd->lbus_info),
@ -330,6 +304,7 @@ int pcie_speeds(struct hfi1_devdata *dd)
{
u32 linkcap;
struct pci_dev *parent = dd->pcidev->bus->self;
int ret;
if (!pci_is_pcie(dd->pcidev)) {
dd_dev_err(dd, "Can't find PCI Express capability!\n");
@ -339,7 +314,12 @@ int pcie_speeds(struct hfi1_devdata *dd)
/* find if our max speed is Gen3 and parent supports Gen3 speeds */
dd->link_gen3_capable = 1;
pcie_capability_read_dword(dd->pcidev, PCI_EXP_LNKCAP, &linkcap);
ret = pcie_capability_read_dword(dd->pcidev, PCI_EXP_LNKCAP, &linkcap);
if (ret) {
dd_dev_err(dd, "Unable to read from PCI config\n");
return ret;
}
if ((linkcap & PCI_EXP_LNKCAP_SLS) != GEN3_SPEED_VECTOR) {
dd_dev_info(dd,
"This HFI is not Gen3 capable, max speed 0x%x, need 0x3\n",
@ -364,49 +344,150 @@ int pcie_speeds(struct hfi1_devdata *dd)
}
/*
* Returns in *nent:
* - actual number of interrupts allocated
* Returns:
* - actual number of interrupts allocated or
* - 0 if fell back to INTx.
* - error
*/
void request_msix(struct hfi1_devdata *dd, u32 *nent,
struct hfi1_msix_entry *entry)
int request_msix(struct hfi1_devdata *dd, u32 msireq)
{
int pos;
int nvec, ret;
pos = dd->pcidev->msix_cap;
if (*nent && pos) {
msix_setup(dd, pos, nent, entry);
/* did it, either MSI-X or INTx */
} else {
*nent = 0;
hfi1_enable_intx(dd->pcidev);
nvec = pci_alloc_irq_vectors(dd->pcidev, 1, msireq,
PCI_IRQ_MSIX | PCI_IRQ_LEGACY);
if (nvec < 0) {
dd_dev_err(dd, "pci_alloc_irq_vectors() failed: %d\n", nvec);
return nvec;
}
tune_pcie_caps(dd);
}
ret = tune_pcie_caps(dd);
if (ret) {
dd_dev_err(dd, "tune_pcie_caps() failed: %d\n", ret);
pci_free_irq_vectors(dd->pcidev);
return ret;
}
void hfi1_enable_intx(struct pci_dev *pdev)
{
/* first, turn on INTx */
pci_intx(pdev, 1);
/* then turn off MSI-X */
pci_disable_msix(pdev);
/* check for legacy IRQ */
if (nvec == 1 && !dd->pcidev->msix_enabled)
return 0;
return nvec;
}
/* restore command and BARs after a reset has wiped them out */
void restore_pci_variables(struct hfi1_devdata *dd)
int restore_pci_variables(struct hfi1_devdata *dd)
{
pci_write_config_word(dd->pcidev, PCI_COMMAND, dd->pci_command);
pci_write_config_dword(dd->pcidev, PCI_BASE_ADDRESS_0, dd->pcibar0);
pci_write_config_dword(dd->pcidev, PCI_BASE_ADDRESS_1, dd->pcibar1);
pci_write_config_dword(dd->pcidev, PCI_ROM_ADDRESS, dd->pci_rom);
pcie_capability_write_word(dd->pcidev, PCI_EXP_DEVCTL, dd->pcie_devctl);
pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL, dd->pcie_lnkctl);
pcie_capability_write_word(dd->pcidev, PCI_EXP_DEVCTL2,
dd->pcie_devctl2);
pci_write_config_dword(dd->pcidev, PCI_CFG_MSIX0, dd->pci_msix0);
pci_write_config_dword(dd->pcidev, PCIE_CFG_SPCIE1, dd->pci_lnkctl3);
pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2, dd->pci_tph2);
int ret = 0;
ret = pci_write_config_word(dd->pcidev, PCI_COMMAND, dd->pci_command);
if (ret)
goto error;
ret = pci_write_config_dword(dd->pcidev, PCI_BASE_ADDRESS_0,
dd->pcibar0);
if (ret)
goto error;
ret = pci_write_config_dword(dd->pcidev, PCI_BASE_ADDRESS_1,
dd->pcibar1);
if (ret)
goto error;
ret = pci_write_config_dword(dd->pcidev, PCI_ROM_ADDRESS, dd->pci_rom);
if (ret)
goto error;
ret = pcie_capability_write_word(dd->pcidev, PCI_EXP_DEVCTL,
dd->pcie_devctl);
if (ret)
goto error;
ret = pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL,
dd->pcie_lnkctl);
if (ret)
goto error;
ret = pcie_capability_write_word(dd->pcidev, PCI_EXP_DEVCTL2,
dd->pcie_devctl2);
if (ret)
goto error;
ret = pci_write_config_dword(dd->pcidev, PCI_CFG_MSIX0, dd->pci_msix0);
if (ret)
goto error;
ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
dd->pci_lnkctl3);
if (ret)
goto error;
ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2, dd->pci_tph2);
if (ret)
goto error;
return 0;
error:
dd_dev_err(dd, "Unable to write to PCI config\n");
return ret;
}
/* Save BARs and command to rewrite after device reset */
int save_pci_variables(struct hfi1_devdata *dd)
{
int ret = 0;
ret = pci_read_config_dword(dd->pcidev, PCI_BASE_ADDRESS_0,
&dd->pcibar0);
if (ret)
goto error;
ret = pci_read_config_dword(dd->pcidev, PCI_BASE_ADDRESS_1,
&dd->pcibar1);
if (ret)
goto error;
ret = pci_read_config_dword(dd->pcidev, PCI_ROM_ADDRESS, &dd->pci_rom);
if (ret)
goto error;
ret = pci_read_config_word(dd->pcidev, PCI_COMMAND, &dd->pci_command);
if (ret)
goto error;
ret = pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVCTL,
&dd->pcie_devctl);
if (ret)
goto error;
ret = pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL,
&dd->pcie_lnkctl);
if (ret)
goto error;
ret = pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVCTL2,
&dd->pcie_devctl2);
if (ret)
goto error;
ret = pci_read_config_dword(dd->pcidev, PCI_CFG_MSIX0, &dd->pci_msix0);
if (ret)
goto error;
ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
&dd->pci_lnkctl3);
if (ret)
goto error;
ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2, &dd->pci_tph2);
if (ret)
goto error;
return 0;
error:
dd_dev_err(dd, "Unable to read from PCI config\n");
return ret;
}
/*
@ -421,21 +502,33 @@ uint aspm_mode = ASPM_MODE_DISABLED;
module_param_named(aspm, aspm_mode, uint, S_IRUGO);
MODULE_PARM_DESC(aspm, "PCIe ASPM: 0: disable, 1: enable, 2: dynamic");
static void tune_pcie_caps(struct hfi1_devdata *dd)
static int tune_pcie_caps(struct hfi1_devdata *dd)
{
struct pci_dev *parent;
u16 rc_mpss, rc_mps, ep_mpss, ep_mps;
u16 rc_mrrs, ep_mrrs, max_mrrs, ectl;
int ret;
/*
* Turn on extended tags in DevCtl in case the BIOS has turned it off
* to improve WFR SDMA bandwidth
*/
pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVCTL, &ectl);
ret = pcie_capability_read_word(dd->pcidev,
PCI_EXP_DEVCTL, &ectl);
if (ret) {
dd_dev_err(dd, "Unable to read from PCI config\n");
return ret;
}
if (!(ectl & PCI_EXP_DEVCTL_EXT_TAG)) {
dd_dev_info(dd, "Enabling PCIe extended tags\n");
ectl |= PCI_EXP_DEVCTL_EXT_TAG;
pcie_capability_write_word(dd->pcidev, PCI_EXP_DEVCTL, ectl);
ret = pcie_capability_write_word(dd->pcidev,
PCI_EXP_DEVCTL, ectl);
if (ret) {
dd_dev_err(dd, "Unable to write to PCI config\n");
return ret;
}
}
/* Find out supported and configured values for parent (root) */
parent = dd->pcidev->bus->self;
@ -444,14 +537,14 @@ static void tune_pcie_caps(struct hfi1_devdata *dd)
* access to the upstream component.
*/
if (!parent)
return;
return -EINVAL;
if (!pci_is_root_bus(parent->bus)) {
dd_dev_info(dd, "Parent not root\n");
return;
return -EINVAL;
}
if (!pci_is_pcie(parent) || !pci_is_pcie(dd->pcidev))
return;
return -EINVAL;
rc_mpss = parent->pcie_mpss;
rc_mps = ffs(pcie_get_mps(parent)) - 8;
/* Find out supported and configured values for endpoint (us) */
@ -497,6 +590,8 @@ static void tune_pcie_caps(struct hfi1_devdata *dd)
ep_mrrs = max_mrrs;
pcie_set_readrq(dd->pcidev, ep_mrrs);
}
return 0;
}
/* End of PCIe capability tuning */
@ -728,6 +823,7 @@ static int load_eq_table(struct hfi1_devdata *dd, const u8 eq[11][3], u8 fs,
u32 violation;
u32 i;
u8 c_minus1, c0, c_plus1;
int ret;
for (i = 0; i < 11; i++) {
/* set index */
@ -739,8 +835,14 @@ static int load_eq_table(struct hfi1_devdata *dd, const u8 eq[11][3], u8 fs,
pci_write_config_dword(pdev, PCIE_CFG_REG_PL102,
eq_value(c_minus1, c0, c_plus1));
/* check if these coefficients violate EQ rules */
pci_read_config_dword(dd->pcidev, PCIE_CFG_REG_PL105,
&violation);
ret = pci_read_config_dword(dd->pcidev,
PCIE_CFG_REG_PL105, &violation);
if (ret) {
dd_dev_err(dd, "Unable to read from PCI config\n");
hit_error = 1;
break;
}
if (violation
& PCIE_CFG_REG_PL105_GEN3_EQ_VIOLATE_COEF_RULES_SMASK){
if (hit_error == 0) {
@ -1194,7 +1296,13 @@ retry:
* that it is Gen3 capable earlier.
*/
dd_dev_info(dd, "%s: setting parent target link speed\n", __func__);
pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
ret = pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
if (ret) {
dd_dev_err(dd, "Unable to read from PCI config\n");
return_error = 1;
goto done;
}
dd_dev_info(dd, "%s: ..old link control2: 0x%x\n", __func__,
(u32)lnkctl2);
/* only write to parent if target is not as high as ours */
@ -1203,20 +1311,37 @@ retry:
lnkctl2 |= target_vector;
dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
(u32)lnkctl2);
pcie_capability_write_word(parent, PCI_EXP_LNKCTL2, lnkctl2);
ret = pcie_capability_write_word(parent,
PCI_EXP_LNKCTL2, lnkctl2);
if (ret) {
dd_dev_err(dd, "Unable to write to PCI config\n");
return_error = 1;
goto done;
}
} else {
dd_dev_info(dd, "%s: ..target speed is OK\n", __func__);
}
dd_dev_info(dd, "%s: setting target link speed\n", __func__);
pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL2, &lnkctl2);
ret = pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL2, &lnkctl2);
if (ret) {
dd_dev_err(dd, "Unable to read from PCI config\n");
return_error = 1;
goto done;
}
dd_dev_info(dd, "%s: ..old link control2: 0x%x\n", __func__,
(u32)lnkctl2);
lnkctl2 &= ~LNKCTL2_TARGET_LINK_SPEED_MASK;
lnkctl2 |= target_vector;
dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
(u32)lnkctl2);
pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL2, lnkctl2);
ret = pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL2, lnkctl2);
if (ret) {
dd_dev_err(dd, "Unable to write to PCI config\n");
return_error = 1;
goto done;
}
/* step 5h: arm gasket logic */
/* hold DC in reset across the SBR */
@ -1266,7 +1391,14 @@ retry:
/* restore PCI space registers we know were reset */
dd_dev_info(dd, "%s: calling restore_pci_variables\n", __func__);
restore_pci_variables(dd);
ret = restore_pci_variables(dd);
if (ret) {
dd_dev_err(dd, "%s: Could not restore PCI variables\n",
__func__);
return_error = 1;
goto done;
}
/* restore firmware control */
write_csr(dd, MISC_CFG_FW_CTRL, fw_ctrl);
@ -1296,7 +1428,13 @@ retry:
setextled(dd, 0);
/* check for any per-lane errors */
pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE2, &reg32);
ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE2, &reg32);
if (ret) {
dd_dev_err(dd, "Unable to read from PCI config\n");
return_error = 1;
goto done;
}
dd_dev_info(dd, "%s: per-lane errors: 0x%x\n", __func__, reg32);
/* extract status, look for our HFI */

View File

@ -1012,7 +1012,7 @@ static void sc_wait_for_packet_egress(struct send_context *sc, int pause)
"%s: context %u(%u) timeout waiting for packets to egress, remaining count %u, bouncing link\n",
__func__, sc->sw_index,
sc->hw_context, (u32)reg);
queue_work(dd->pport->hfi1_wq,
queue_work(dd->pport->link_wq,
&dd->pport->link_bounce_work);
break;
}
@ -1568,7 +1568,8 @@ static void sc_piobufavail(struct send_context *sc)
struct rvt_qp *qp;
struct hfi1_qp_priv *priv;
unsigned long flags;
unsigned i, n = 0;
uint i, n = 0, max_idx = 0;
u8 max_starved_cnt = 0;
if (dd->send_contexts[sc->sw_index].type != SC_KERNEL &&
dd->send_contexts[sc->sw_index].type != SC_VL15)
@ -1591,6 +1592,7 @@ static void sc_piobufavail(struct send_context *sc)
priv = qp->priv;
list_del_init(&priv->s_iowait.list);
priv->s_iowait.lock = NULL;
iowait_starve_find_max(wait, &max_starved_cnt, n, &max_idx);
/* refcount held until actual wake up */
qps[n++] = qp;
}
@ -1605,9 +1607,14 @@ static void sc_piobufavail(struct send_context *sc)
}
write_sequnlock_irqrestore(&dev->iowait_lock, flags);
for (i = 0; i < n; i++)
hfi1_qp_wakeup(qps[i],
/* Wake up the most starved one first */
if (n)
hfi1_qp_wakeup(qps[max_idx],
RVT_S_WAIT_PIO | RVT_S_WAIT_PIO_DRAIN);
for (i = 0; i < n; i++)
if (i != max_idx)
hfi1_qp_wakeup(qps[i],
RVT_S_WAIT_PIO | RVT_S_WAIT_PIO_DRAIN);
}
/* translate a send credit update to a bit code of reasons */

View File

@ -45,10 +45,14 @@
*
*/
#include <linux/firmware.h>
#include "hfi.h"
#include "efivar.h"
#include "eprom.h"
#define DEFAULT_PLATFORM_CONFIG_NAME "hfi1_platform.dat"
static int validate_scratch_checksum(struct hfi1_devdata *dd)
{
u64 checksum = 0, temp_scratch = 0;
@ -58,8 +62,13 @@ static int validate_scratch_checksum(struct hfi1_devdata *dd)
version = (temp_scratch & BITMAP_VERSION_SMASK) >> BITMAP_VERSION_SHIFT;
/* Prevent power on default of all zeroes from passing checksum */
if (!version)
if (!version) {
dd_dev_err(dd, "%s: Config bitmap uninitialized\n", __func__);
dd_dev_err(dd,
"%s: Please update your BIOS to support active channels\n",
__func__);
return 0;
}
/*
* ASIC scratch 0 only contains the checksum and bitmap version as
@ -84,6 +93,8 @@ static int validate_scratch_checksum(struct hfi1_devdata *dd)
if (checksum + temp_scratch == 0xFFFF)
return 1;
dd_dev_err(dd, "%s: Configuration bitmap corrupted\n", __func__);
return 0;
}
@ -131,25 +142,22 @@ static void save_platform_config_fields(struct hfi1_devdata *dd)
ppd->max_power_class = (temp_scratch & QSFP_MAX_POWER_SMASK) >>
QSFP_MAX_POWER_SHIFT;
ppd->config_from_scratch = true;
}
void get_platform_config(struct hfi1_devdata *dd)
{
int ret = 0;
unsigned long size = 0;
u8 *temp_platform_config = NULL;
u32 esize;
const struct firmware *platform_config_file = NULL;
if (is_integrated(dd)) {
if (validate_scratch_checksum(dd)) {
save_platform_config_fields(dd);
return;
}
dd_dev_err(dd, "%s: Config bitmap corrupted/uninitialized\n",
__func__);
dd_dev_err(dd,
"%s: Please update your BIOS to support active channels\n",
__func__);
} else {
ret = eprom_read_platform_config(dd,
(void **)&temp_platform_config,
@ -160,36 +168,37 @@ void get_platform_config(struct hfi1_devdata *dd)
dd->platform_config.size = esize;
return;
}
/* fail, try EFI variable */
ret = read_hfi1_efi_var(dd, "configuration", &size,
(void **)&temp_platform_config);
if (!ret) {
dd->platform_config.data = temp_platform_config;
dd->platform_config.size = size;
return;
}
}
dd_dev_err(dd,
"%s: Failed to get platform config, falling back to sub-optimal default file\n",
__func__);
/* fall back to request firmware */
platform_config_load = 1;
ret = request_firmware(&platform_config_file,
DEFAULT_PLATFORM_CONFIG_NAME,
&dd->pcidev->dev);
if (ret) {
dd_dev_err(dd,
"%s: No default platform config file found\n",
__func__);
return;
}
/*
* Allocate separate memory block to store data and free firmware
* structure. This allows free_platform_config to treat EPROM and
* fallback configs in the same manner.
*/
dd->platform_config.data = kmemdup(platform_config_file->data,
platform_config_file->size,
GFP_KERNEL);
dd->platform_config.size = platform_config_file->size;
release_firmware(platform_config_file);
}
void free_platform_config(struct hfi1_devdata *dd)
{
if (!platform_config_load) {
/*
* was loaded from EFI or the EPROM, release memory
* allocated by read_efi_var/eprom_read_platform_config
*/
kfree(dd->platform_config.data);
}
/*
* else do nothing, dispose_firmware will release
* struct firmware platform_config on driver exit
*/
/* Release memory allocated for eprom or fallback file read. */
kfree(dd->platform_config.data);
}
void get_port_type(struct hfi1_pportdata *ppd)
@ -242,7 +251,7 @@ static int qual_power(struct hfi1_pportdata *ppd)
if (ppd->offline_disabled_reason ==
HFI1_ODR_MASK(OPA_LINKDOWN_REASON_POWER_POLICY)) {
dd_dev_info(
dd_dev_err(
ppd->dd,
"%s: Port disabled due to system power restrictions\n",
__func__);
@ -268,7 +277,7 @@ static int qual_bitrate(struct hfi1_pportdata *ppd)
if (ppd->offline_disabled_reason ==
HFI1_ODR_MASK(OPA_LINKDOWN_REASON_LINKSPEED_POLICY)) {
dd_dev_info(
dd_dev_err(
ppd->dd,
"%s: Cable failed bitrate check, disabling port\n",
__func__);
@ -709,15 +718,15 @@ static void apply_tunings(
ret = load_8051_config(ppd->dd, DC_HOST_COMM_SETTINGS,
GENERAL_CONFIG, config_data);
if (ret != HCMD_SUCCESS)
dd_dev_info(ppd->dd,
"%s: Failed set ext device config params\n",
__func__);
dd_dev_err(ppd->dd,
"%s: Failed set ext device config params\n",
__func__);
}
if (tx_preset_index == OPA_INVALID_INDEX) {
if (ppd->port_type == PORT_TYPE_QSFP && limiting_active)
dd_dev_info(ppd->dd, "%s: Invalid Tx preset index\n",
__func__);
dd_dev_err(ppd->dd, "%s: Invalid Tx preset index\n",
__func__);
return;
}
@ -900,7 +909,7 @@ static int tune_qsfp(struct hfi1_pportdata *ppd,
case 0xD: /* fallthrough */
case 0xF:
default:
dd_dev_info(ppd->dd, "%s: Unknown/unsupported cable\n",
dd_dev_warn(ppd->dd, "%s: Unknown/unsupported cable\n",
__func__);
break;
}
@ -935,6 +944,21 @@ void tune_serdes(struct hfi1_pportdata *ppd)
if (loopback != LOOPBACK_NONE ||
ppd->dd->icode == ICODE_FUNCTIONAL_SIMULATOR) {
ppd->driver_link_ready = 1;
if (qsfp_mod_present(ppd)) {
ret = acquire_chip_resource(ppd->dd,
qsfp_resource(ppd->dd),
QSFP_WAIT);
if (ret) {
dd_dev_err(ppd->dd, "%s: hfi%d: cannot lock i2c chain\n",
__func__, (int)ppd->dd->hfi1_id);
goto bail;
}
refresh_qsfp_cache(ppd, &ppd->qsfp_info);
release_chip_resource(ppd->dd, qsfp_resource(ppd->dd));
}
return;
}
@ -942,7 +966,7 @@ void tune_serdes(struct hfi1_pportdata *ppd)
case PORT_TYPE_DISCONNECTED:
ppd->offline_disabled_reason =
HFI1_ODR_MASK(OPA_LINKDOWN_REASON_DISCONNECTED);
dd_dev_info(dd, "%s: Port disconnected, disabling port\n",
dd_dev_warn(dd, "%s: Port disconnected, disabling port\n",
__func__);
goto bail;
case PORT_TYPE_FIXED:
@ -1027,7 +1051,7 @@ void tune_serdes(struct hfi1_pportdata *ppd)
}
break;
default:
dd_dev_info(ppd->dd, "%s: Unknown port type\n", __func__);
dd_dev_warn(ppd->dd, "%s: Unknown port type\n", __func__);
ppd->port_type = PORT_TYPE_UNKNOWN;
tuning_method = OPA_UNKNOWN_TUNING;
total_atten = 0;

View File

@ -1,5 +1,5 @@
/*
* Copyright(c) 2015, 2016 Intel Corporation.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -68,17 +68,12 @@ static int iowait_sleep(
struct sdma_engine *sde,
struct iowait *wait,
struct sdma_txreq *stx,
unsigned seq);
unsigned int seq,
bool pkts_sent);
static void iowait_wakeup(struct iowait *wait, int reason);
static void iowait_sdma_drained(struct iowait *wait);
static void qp_pio_drain(struct rvt_qp *qp);
static inline unsigned mk_qpn(struct rvt_qpn_table *qpt,
struct rvt_qpn_map *map, unsigned off)
{
return (map - qpt->map) * RVT_BITS_PER_PAGE + off;
}
const struct rvt_operation_params hfi1_post_parms[RVT_OPERATION_MAX] = {
[IB_WR_RDMA_WRITE] = {
.length = sizeof(struct ib_rdma_wr),
@ -237,6 +232,31 @@ int hfi1_check_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr,
return 0;
}
/*
* qp_set_16b - Set the hdr_type based on whether the slid or the
* dlid in the connection is extended. Only applicable for RC and UC
* QPs. UD QPs determine this on the fly from the ah in the wqe
*/
static inline void qp_set_16b(struct rvt_qp *qp)
{
struct hfi1_pportdata *ppd;
struct hfi1_ibport *ibp;
struct hfi1_qp_priv *priv = qp->priv;
/* Update ah_attr to account for extended LIDs */
hfi1_update_ah_attr(qp->ibqp.device, &qp->remote_ah_attr);
/* Create 32 bit LIDs */
hfi1_make_opa_lid(&qp->remote_ah_attr);
if (!(rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH))
return;
ibp = to_iport(qp->ibqp.device, qp->port_num);
ppd = ppd_from_ibp(ibp);
priv->hdr_type = hfi1_get_hdr_type(ppd->lid, &qp->remote_ah_attr);
}
void hfi1_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr,
int attr_mask, struct ib_udata *udata)
{
@ -247,6 +267,7 @@ void hfi1_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr,
priv->s_sc = ah_to_sc(ibqp->device, &qp->remote_ah_attr);
priv->s_sde = qp_to_sdma_engine(qp, priv->s_sc);
priv->s_sendcontext = qp_to_send_context(qp, priv->s_sc);
qp_set_16b(qp);
}
if (attr_mask & IB_QP_PATH_MIG_STATE &&
@ -256,6 +277,7 @@ void hfi1_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr,
priv->s_sc = ah_to_sc(ibqp->device, &qp->remote_ah_attr);
priv->s_sde = qp_to_sdma_engine(qp, priv->s_sc);
priv->s_sendcontext = qp_to_send_context(qp, priv->s_sc);
qp_set_16b(qp);
}
}
@ -377,7 +399,8 @@ static int iowait_sleep(
struct sdma_engine *sde,
struct iowait *wait,
struct sdma_txreq *stx,
unsigned seq)
uint seq,
bool pkts_sent)
{
struct verbs_txreq *tx = container_of(stx, struct verbs_txreq, txreq);
struct rvt_qp *qp;
@ -408,7 +431,8 @@ static int iowait_sleep(
ibp->rvp.n_dmawait++;
qp->s_flags |= RVT_S_WAIT_DMA_DESC;
list_add_tail(&priv->s_iowait.list, &sde->dmawait);
iowait_queue(pkts_sent, &priv->s_iowait,
&sde->dmawait);
priv->s_iowait.lock = &dev->iowait_lock;
trace_hfi1_qpsleep(qp, RVT_S_WAIT_DMA_DESC);
rvt_get_qp(qp);
@ -506,82 +530,6 @@ struct send_context *qp_to_send_context(struct rvt_qp *qp, u8 sc5)
sc5);
}
struct qp_iter {
struct hfi1_ibdev *dev;
struct rvt_qp *qp;
int specials;
int n;
};
struct qp_iter *qp_iter_init(struct hfi1_ibdev *dev)
{
struct qp_iter *iter;
iter = kzalloc(sizeof(*iter), GFP_KERNEL);
if (!iter)
return NULL;
iter->dev = dev;
iter->specials = dev->rdi.ibdev.phys_port_cnt * 2;
return iter;
}
int qp_iter_next(struct qp_iter *iter)
{
struct hfi1_ibdev *dev = iter->dev;
int n = iter->n;
int ret = 1;
struct rvt_qp *pqp = iter->qp;
struct rvt_qp *qp;
/*
* The approach is to consider the special qps
* as an additional table entries before the
* real hash table. Since the qp code sets
* the qp->next hash link to NULL, this works just fine.
*
* iter->specials is 2 * # ports
*
* n = 0..iter->specials is the special qp indices
*
* n = iter->specials..dev->rdi.qp_dev->qp_table_size+iter->specials are
* the potential hash bucket entries
*
*/
for (; n < dev->rdi.qp_dev->qp_table_size + iter->specials; n++) {
if (pqp) {
qp = rcu_dereference(pqp->next);
} else {
if (n < iter->specials) {
struct hfi1_pportdata *ppd;
struct hfi1_ibport *ibp;
int pidx;
pidx = n % dev->rdi.ibdev.phys_port_cnt;
ppd = &dd_from_dev(dev)->pport[pidx];
ibp = &ppd->ibport_data;
if (!(n & 1))
qp = rcu_dereference(ibp->rvp.qp[0]);
else
qp = rcu_dereference(ibp->rvp.qp[1]);
} else {
qp = rcu_dereference(
dev->rdi.qp_dev->qp_table[
(n - iter->specials)]);
}
}
pqp = qp;
if (qp) {
iter->qp = qp;
iter->n = n;
return 0;
}
}
return ret;
}
static const char * const qp_type_str[] = {
"SMI", "GSI", "RC", "UC", "UD",
};
@ -595,19 +543,27 @@ static int qp_idle(struct rvt_qp *qp)
qp->s_tail == qp->s_head;
}
void qp_iter_print(struct seq_file *s, struct qp_iter *iter)
/**
* qp_iter_print - print the qp information to seq_file
* @s: the seq_file to emit the qp information on
* @iter: the iterator for the qp hash list
*/
void qp_iter_print(struct seq_file *s, struct rvt_qp_iter *iter)
{
struct rvt_swqe *wqe;
struct rvt_qp *qp = iter->qp;
struct hfi1_qp_priv *priv = qp->priv;
struct sdma_engine *sde;
struct send_context *send_context;
struct rvt_ack_entry *e = NULL;
sde = qp_to_sdma_engine(qp, priv->s_sc);
wqe = rvt_get_swqe_ptr(qp, qp->s_last);
send_context = qp_to_send_context(qp, priv->s_sc);
if (qp->s_ack_queue)
e = &qp->s_ack_queue[qp->s_tail_ack_queue];
seq_printf(s,
"N %d %s QP %x R %u %s %u %u %u f=%x %u %u %u %u %u %u SPSN %x %x %x %x %x RPSN %x (%u %u %u %u %u %u %u) RQP %x LID %x SL %u MTU %u %u %u %u %u SDE %p,%u SC %p,%u SCQ %u %u PID %d\n",
"N %d %s QP %x R %u %s %u %u %u f=%x %u %u %u %u %u %u SPSN %x %x %x %x %x RPSN %x S(%u %u %u %u %u %u %u) R(%u %u %u) RQP %x LID %x SL %u MTU %u %u %u %u %u SDE %p,%u SC %p,%u SCQ %u %u PID %d OS %x %x E %x %x %x\n",
iter->n,
qp_idle(qp) ? "I" : "B",
qp->ibqp.qp_num,
@ -630,6 +586,10 @@ void qp_iter_print(struct seq_file *s, struct qp_iter *iter)
qp->s_last, qp->s_acked, qp->s_cur,
qp->s_tail, qp->s_head, qp->s_size,
qp->s_avail,
/* ack_queue ring pointers, size */
qp->s_tail_ack_queue, qp->r_head_ack_queue,
rvt_max_atomic(&to_idev(qp->ibqp.device)->rdi),
/* remote QP info */
qp->remote_qpn,
rdma_ah_get_dlid(&qp->remote_ah_attr),
rdma_ah_get_sl(&qp->remote_ah_attr),
@ -644,7 +604,13 @@ void qp_iter_print(struct seq_file *s, struct qp_iter *iter)
send_context ? send_context->sw_index : 0,
ibcq_to_rvtcq(qp->ibqp.send_cq)->queue->head,
ibcq_to_rvtcq(qp->ibqp.send_cq)->queue->tail,
qp->pid);
qp->pid,
qp->s_state,
qp->s_ack_state,
/* ack queue information */
e ? e->opcode : 0,
e ? e->psn : 0,
e ? e->lpsn : 0);
}
void *qp_priv_alloc(struct rvt_dev_info *rdi, struct rvt_qp *qp)
@ -750,6 +716,7 @@ void hfi1_migrate_qp(struct rvt_qp *qp)
qp->s_flags |= RVT_S_AHG_CLEAR;
priv->s_sc = ah_to_sc(qp->ibqp.device, &qp->remote_ah_attr);
priv->s_sde = qp_to_sdma_engine(qp, priv->s_sc);
qp_set_16b(qp);
ev.device = qp->ibqp.device;
ev.element.qp = &qp->ibqp;
@ -831,6 +798,45 @@ void notify_error_qp(struct rvt_qp *qp)
}
}
/**
* hfi1_qp_iter_cb - callback for iterator
* @qp - the qp
* @v - the sl in low bits of v
*
* This is called from the iterator callback to work
* on an individual qp.
*/
static void hfi1_qp_iter_cb(struct rvt_qp *qp, u64 v)
{
int lastwqe;
struct ib_event ev;
struct hfi1_ibport *ibp =
to_iport(qp->ibqp.device, qp->port_num);
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
u8 sl = (u8)v;
if (qp->port_num != ppd->port ||
(qp->ibqp.qp_type != IB_QPT_UC &&
qp->ibqp.qp_type != IB_QPT_RC) ||
rdma_ah_get_sl(&qp->remote_ah_attr) != sl ||
!(ib_rvt_state_ops[qp->state] & RVT_POST_SEND_OK))
return;
spin_lock_irq(&qp->r_lock);
spin_lock(&qp->s_hlock);
spin_lock(&qp->s_lock);
lastwqe = rvt_error_qp(qp, IB_WC_WR_FLUSH_ERR);
spin_unlock(&qp->s_lock);
spin_unlock(&qp->s_hlock);
spin_unlock_irq(&qp->r_lock);
if (lastwqe) {
ev.device = qp->ibqp.device;
ev.element.qp = &qp->ibqp;
ev.event = IB_EVENT_QP_LAST_WQE_REACHED;
qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
}
}
/**
* hfi1_error_port_qps - put a port's RC/UC qps into error state
* @ibp: the ibport.
@ -842,44 +848,8 @@ void notify_error_qp(struct rvt_qp *qp)
*/
void hfi1_error_port_qps(struct hfi1_ibport *ibp, u8 sl)
{
struct rvt_qp *qp = NULL;
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
struct hfi1_ibdev *dev = &ppd->dd->verbs_dev;
int n;
int lastwqe;
struct ib_event ev;
rcu_read_lock();
/* Deal only with RC/UC qps that use the given SL. */
for (n = 0; n < dev->rdi.qp_dev->qp_table_size; n++) {
for (qp = rcu_dereference(dev->rdi.qp_dev->qp_table[n]); qp;
qp = rcu_dereference(qp->next)) {
if (qp->port_num == ppd->port &&
(qp->ibqp.qp_type == IB_QPT_UC ||
qp->ibqp.qp_type == IB_QPT_RC) &&
rdma_ah_get_sl(&qp->remote_ah_attr) == sl &&
(ib_rvt_state_ops[qp->state] &
RVT_POST_SEND_OK)) {
spin_lock_irq(&qp->r_lock);
spin_lock(&qp->s_hlock);
spin_lock(&qp->s_lock);
lastwqe = rvt_error_qp(qp,
IB_WC_WR_FLUSH_ERR);
spin_unlock(&qp->s_lock);
spin_unlock(&qp->s_hlock);
spin_unlock_irq(&qp->r_lock);
if (lastwqe) {
ev.device = qp->ibqp.device;
ev.element.qp = &qp->ibqp;
ev.event =
IB_EVENT_QP_LAST_WQE_REACHED;
qp->ibqp.event_handler(&ev,
qp->ibqp.qp_context);
}
}
}
}
rcu_read_unlock();
rvt_qp_iter(&dev->rdi, sl, hfi1_qp_iter_cb);
}

View File

@ -1,7 +1,7 @@
#ifndef _QP_H
#define _QP_H
/*
* Copyright(c) 2015, 2016 Intel Corporation.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -94,26 +94,7 @@ void hfi1_qp_wakeup(struct rvt_qp *qp, u32 flag);
struct sdma_engine *qp_to_sdma_engine(struct rvt_qp *qp, u8 sc5);
struct send_context *qp_to_send_context(struct rvt_qp *qp, u8 sc5);
struct qp_iter;
/**
* qp_iter_init - initialize the iterator for the qp hash list
* @dev: the hfi1_ibdev
*/
struct qp_iter *qp_iter_init(struct hfi1_ibdev *dev);
/**
* qp_iter_next - Find the next qp in the hash list
* @iter: the iterator for the qp hash list
*/
int qp_iter_next(struct qp_iter *iter);
/**
* qp_iter_print - print the qp information to seq_file
* @s: the seq_file to emit the qp information on
* @iter: the iterator for the qp hash list
*/
void qp_iter_print(struct seq_file *s, struct qp_iter *iter);
void qp_iter_print(struct seq_file *s, struct rvt_qp_iter *iter);
void _hfi1_schedule_send(struct rvt_qp *qp);
void hfi1_schedule_send(struct rvt_qp *qp);

View File

@ -100,8 +100,12 @@ static int make_rc_ack(struct hfi1_ibdev *dev, struct rvt_qp *qp,
if (!(ib_rvt_state_ops[qp->state] & RVT_PROCESS_RECV_OK))
goto bail;
/* header size in 32-bit words LRH+BTH = (8+12)/4. */
hwords = 5;
if (priv->hdr_type == HFI1_PKT_TYPE_9B)
/* header size in 32-bit words LRH+BTH = (8+12)/4. */
hwords = 5;
else
/* header size in 32-bit words 16B LRH+BTH = (16+12)/4. */
hwords = 7;
switch (qp->s_ack_state) {
case OP(RDMA_READ_RESPONSE_LAST):
@ -258,8 +262,7 @@ int hfi1_make_rc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
struct ib_other_headers *ohdr;
struct rvt_sge_state *ss;
struct rvt_swqe *wqe;
/* header size in 32-bit words LRH+BTH = (8+12)/4. */
u32 hwords = 5;
u32 hwords;
u32 len;
u32 bth0 = 0;
u32 bth2;
@ -273,9 +276,23 @@ int hfi1_make_rc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
if (IS_ERR(ps->s_txreq))
goto bail_no_tx;
ohdr = &ps->s_txreq->phdr.hdr.u.oth;
if (rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH)
ohdr = &ps->s_txreq->phdr.hdr.u.l.oth;
ps->s_txreq->phdr.hdr.hdr_type = priv->hdr_type;
if (priv->hdr_type == HFI1_PKT_TYPE_9B) {
/* header size in 32-bit words LRH+BTH = (8+12)/4. */
hwords = 5;
if (rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH)
ohdr = &ps->s_txreq->phdr.hdr.ibh.u.l.oth;
else
ohdr = &ps->s_txreq->phdr.hdr.ibh.u.oth;
} else {
/* header size in 32-bit words 16B LRH+BTH = (16+12)/4. */
hwords = 7;
if ((rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH) &&
(hfi1_check_mcast(rdma_ah_get_dlid(&qp->remote_ah_attr))))
ohdr = &ps->s_txreq->phdr.hdr.opah.u.l.oth;
else
ohdr = &ps->s_txreq->phdr.hdr.opah.u.oth;
}
/* Sending responses has higher priority over sending requests. */
if ((qp->s_flags & RVT_S_RESP_PENDING) &&
@ -425,7 +442,7 @@ int hfi1_make_rc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
case IB_WR_RDMA_WRITE:
if (newreq && !(qp->s_flags & RVT_S_UNLIMITED_CREDIT))
qp->s_lsn++;
/* FALLTHROUGH */
goto no_flow_control;
case IB_WR_RDMA_WRITE_WITH_IMM:
/* If no credit, return. */
if (!(qp->s_flags & RVT_S_UNLIMITED_CREDIT) &&
@ -433,6 +450,7 @@ int hfi1_make_rc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
qp->s_flags |= RVT_S_WAIT_SSN_CREDIT;
goto bail;
}
no_flow_control:
put_ib_reth_vaddr(
wqe->rdma_wr.remote_addr,
&ohdr->u.rc.reth);
@ -703,109 +721,27 @@ bail_no_tx:
return 0;
}
/**
* hfi1_send_rc_ack - Construct an ACK packet and send it
* @qp: a pointer to the QP
*
* This is called from hfi1_rc_rcv() and handle_receive_interrupt().
* Note that RDMA reads and atomics are handled in the
* send side QP state and send engine.
*/
void hfi1_send_rc_ack(struct hfi1_ctxtdata *rcd, struct rvt_qp *qp,
int is_fecn)
static inline void hfi1_make_bth_aeth(struct rvt_qp *qp,
struct ib_other_headers *ohdr,
u32 bth0, u32 bth1)
{
struct hfi1_ibport *ibp = rcd_to_iport(rcd);
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
u64 pbc, pbc_flags = 0;
u16 lrh0;
u16 sc5;
u32 bth0;
u32 hwords;
u32 vl, plen;
struct send_context *sc;
struct pio_buf *pbuf;
struct ib_header hdr;
struct ib_other_headers *ohdr;
unsigned long flags;
/* clear the defer count */
qp->r_adefered = 0;
/* Don't send ACK or NAK if a RDMA read or atomic is pending. */
if (qp->s_flags & RVT_S_RESP_PENDING)
goto queue_ack;
/* Ensure s_rdma_ack_cnt changes are committed */
smp_read_barrier_depends();
if (qp->s_rdma_ack_cnt)
goto queue_ack;
/* Construct the header */
/* header size in 32-bit words LRH+BTH+AETH = (8+12+4)/4 */
hwords = 6;
if (unlikely(rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH)) {
hwords += hfi1_make_grh(ibp, &hdr.u.l.grh,
rdma_ah_read_grh(&qp->remote_ah_attr),
hwords, 0);
ohdr = &hdr.u.l.oth;
lrh0 = HFI1_LRH_GRH;
} else {
ohdr = &hdr.u.oth;
lrh0 = HFI1_LRH_BTH;
}
/* read pkey_index w/o lock (its atomic) */
bth0 = hfi1_get_pkey(ibp, qp->s_pkey_index) | (OP(ACKNOWLEDGE) << 24);
if (qp->s_mig_state == IB_MIG_MIGRATED)
bth0 |= IB_BTH_MIG_REQ;
if (qp->r_nak_state)
ohdr->u.aeth = cpu_to_be32((qp->r_msn & IB_MSN_MASK) |
(qp->r_nak_state <<
IB_AETH_CREDIT_SHIFT));
else
ohdr->u.aeth = rvt_compute_aeth(qp);
sc5 = ibp->sl_to_sc[rdma_ah_get_sl(&qp->remote_ah_attr)];
/* set PBC_DC_INFO bit (aka SC[4]) in pbc_flags */
pbc_flags |= ((!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT);
lrh0 |= (sc5 & 0xf) << 12 | (rdma_ah_get_sl(&qp->remote_ah_attr)
& 0xf) << 4;
hdr.lrh[0] = cpu_to_be16(lrh0);
hdr.lrh[1] = cpu_to_be16(rdma_ah_get_dlid(&qp->remote_ah_attr));
hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC);
hdr.lrh[3] = cpu_to_be16(ppd->lid |
rdma_ah_get_path_bits(&qp->remote_ah_attr));
ohdr->bth[0] = cpu_to_be32(bth0);
ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);
ohdr->bth[1] |= cpu_to_be32((!!is_fecn) << IB_BECN_SHIFT);
ohdr->bth[1] = cpu_to_be32(bth1 | qp->remote_qpn);
ohdr->bth[2] = cpu_to_be32(mask_psn(qp->r_ack_psn));
}
/* Don't try to send ACKs if the link isn't ACTIVE */
if (driver_lstate(ppd) != IB_PORT_ACTIVE)
return;
static inline void hfi1_queue_rc_ack(struct rvt_qp *qp, bool is_fecn)
{
struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
unsigned long flags;
sc = rcd->sc;
plen = 2 /* PBC */ + hwords;
vl = sc_to_vlt(ppd->dd, sc5);
pbc = create_pbc(ppd, pbc_flags, qp->srate_mbps, vl, plen);
pbuf = sc_buffer_alloc(sc, plen, NULL, NULL);
if (!pbuf) {
/*
* We have no room to send at the moment. Pass
* responsibility for sending the ACK to the send engine
* so that when enough buffer space becomes available,
* the ACK is sent ahead of other outgoing packets.
*/
goto queue_ack;
}
trace_ack_output_ibhdr(dd_from_ibdev(qp->ibqp.device), &hdr);
/* write the pbc and data */
ppd->dd->pio_inline_send(ppd->dd, pbuf, pbc, &hdr, hwords);
return;
queue_ack:
spin_lock_irqsave(&qp->s_lock, flags);
if (!(ib_rvt_state_ops[qp->state] & RVT_PROCESS_RECV_OK))
goto unlock;
@ -816,12 +752,194 @@ queue_ack:
if (is_fecn)
qp->s_flags |= RVT_S_ECN;
/* Schedule the send engine. */
/* Schedule the send tasklet. */
hfi1_schedule_send(qp);
unlock:
spin_unlock_irqrestore(&qp->s_lock, flags);
}
static inline void hfi1_make_rc_ack_9B(struct rvt_qp *qp,
struct hfi1_opa_header *opa_hdr,
u8 sc5, bool is_fecn,
u64 *pbc_flags, u32 *hwords,
u32 *nwords)
{
struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
struct ib_header *hdr = &opa_hdr->ibh;
struct ib_other_headers *ohdr;
u16 lrh0 = HFI1_LRH_BTH;
u16 pkey;
u32 bth0, bth1;
opa_hdr->hdr_type = HFI1_PKT_TYPE_9B;
ohdr = &hdr->u.oth;
/* header size in 32-bit words LRH+BTH+AETH = (8+12+4)/4 */
*hwords = 6;
if (unlikely(rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH)) {
*hwords += hfi1_make_grh(ibp, &hdr->u.l.grh,
rdma_ah_read_grh(&qp->remote_ah_attr),
*hwords - 2, SIZE_OF_CRC);
ohdr = &hdr->u.l.oth;
lrh0 = HFI1_LRH_GRH;
}
/* set PBC_DC_INFO bit (aka SC[4]) in pbc_flags */
*pbc_flags |= ((!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT);
/* read pkey_index w/o lock (its atomic) */
pkey = hfi1_get_pkey(ibp, qp->s_pkey_index);
lrh0 |= (sc5 & IB_SC_MASK) << IB_SC_SHIFT |
(rdma_ah_get_sl(&qp->remote_ah_attr) & IB_SL_MASK) <<
IB_SL_SHIFT;
hfi1_make_ib_hdr(hdr, lrh0, *hwords + SIZE_OF_CRC,
opa_get_lid(rdma_ah_get_dlid(&qp->remote_ah_attr), 9B),
ppd->lid | rdma_ah_get_path_bits(&qp->remote_ah_attr));
bth0 = pkey | (OP(ACKNOWLEDGE) << 24);
if (qp->s_mig_state == IB_MIG_MIGRATED)
bth0 |= IB_BTH_MIG_REQ;
bth1 = (!!is_fecn) << IB_BECN_SHIFT;
hfi1_make_bth_aeth(qp, ohdr, bth0, bth1);
}
static inline void hfi1_make_rc_ack_16B(struct rvt_qp *qp,
struct hfi1_opa_header *opa_hdr,
u8 sc5, bool is_fecn,
u64 *pbc_flags, u32 *hwords,
u32 *nwords)
{
struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
struct hfi1_16b_header *hdr = &opa_hdr->opah;
struct ib_other_headers *ohdr;
u32 bth0, bth1;
u16 len, pkey;
u8 becn = !!is_fecn;
u8 l4 = OPA_16B_L4_IB_LOCAL;
u8 extra_bytes;
opa_hdr->hdr_type = HFI1_PKT_TYPE_16B;
ohdr = &hdr->u.oth;
/* header size in 32-bit words 16B LRH+BTH+AETH = (16+12+4)/4 */
*hwords = 8;
extra_bytes = hfi1_get_16b_padding(*hwords << 2, 0);
*nwords = SIZE_OF_CRC + ((extra_bytes + SIZE_OF_LT) >> 2);
if (unlikely(rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH) &&
hfi1_check_mcast(rdma_ah_get_dlid(&qp->remote_ah_attr))) {
*hwords += hfi1_make_grh(ibp, &hdr->u.l.grh,
rdma_ah_read_grh(&qp->remote_ah_attr),
*hwords - 4, *nwords);
ohdr = &hdr->u.l.oth;
l4 = OPA_16B_L4_IB_GLOBAL;
}
*pbc_flags |= PBC_PACKET_BYPASS | PBC_INSERT_BYPASS_ICRC;
/* read pkey_index w/o lock (its atomic) */
pkey = hfi1_get_pkey(ibp, qp->s_pkey_index);
/* Convert dwords to flits */
len = (*hwords + *nwords) >> 1;
hfi1_make_16b_hdr(hdr,
ppd->lid | rdma_ah_get_path_bits(&qp->remote_ah_attr),
opa_get_lid(rdma_ah_get_dlid(&qp->remote_ah_attr),
16B),
len, pkey, becn, 0, l4, sc5);
bth0 = pkey | (OP(ACKNOWLEDGE) << 24);
bth0 |= extra_bytes << 20;
if (qp->s_mig_state == IB_MIG_MIGRATED)
bth1 = OPA_BTH_MIG_REQ;
hfi1_make_bth_aeth(qp, ohdr, bth0, bth1);
}
typedef void (*hfi1_make_rc_ack)(struct rvt_qp *qp,
struct hfi1_opa_header *opa_hdr,
u8 sc5, bool is_fecn,
u64 *pbc_flags, u32 *hwords,
u32 *nwords);
/* We support only two types - 9B and 16B for now */
static const hfi1_make_rc_ack hfi1_make_rc_ack_tbl[2] = {
[HFI1_PKT_TYPE_9B] = &hfi1_make_rc_ack_9B,
[HFI1_PKT_TYPE_16B] = &hfi1_make_rc_ack_16B
};
/**
* hfi1_send_rc_ack - Construct an ACK packet and send it
* @qp: a pointer to the QP
*
* This is called from hfi1_rc_rcv() and handle_receive_interrupt().
* Note that RDMA reads and atomics are handled in the
* send side QP state and send engine.
*/
void hfi1_send_rc_ack(struct hfi1_ctxtdata *rcd,
struct rvt_qp *qp, bool is_fecn)
{
struct hfi1_ibport *ibp = rcd_to_iport(rcd);
struct hfi1_qp_priv *priv = qp->priv;
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
u8 sc5 = ibp->sl_to_sc[rdma_ah_get_sl(&qp->remote_ah_attr)];
u64 pbc, pbc_flags = 0;
u32 hwords = 0;
u32 nwords = 0;
u32 plen;
struct pio_buf *pbuf;
struct hfi1_opa_header opa_hdr;
/* clear the defer count */
qp->r_adefered = 0;
/* Don't send ACK or NAK if a RDMA read or atomic is pending. */
if (qp->s_flags & RVT_S_RESP_PENDING) {
hfi1_queue_rc_ack(qp, is_fecn);
return;
}
/* Ensure s_rdma_ack_cnt changes are committed */
smp_read_barrier_depends();
if (qp->s_rdma_ack_cnt) {
hfi1_queue_rc_ack(qp, is_fecn);
return;
}
/* Don't try to send ACKs if the link isn't ACTIVE */
if (driver_lstate(ppd) != IB_PORT_ACTIVE)
return;
/* Make the appropriate header */
hfi1_make_rc_ack_tbl[priv->hdr_type](qp, &opa_hdr, sc5, is_fecn,
&pbc_flags, &hwords, &nwords);
plen = 2 /* PBC */ + hwords + nwords;
pbc = create_pbc(ppd, pbc_flags, qp->srate_mbps,
sc_to_vlt(ppd->dd, sc5), plen);
pbuf = sc_buffer_alloc(rcd->sc, plen, NULL, NULL);
if (!pbuf) {
/*
* We have no room to send at the moment. Pass
* responsibility for sending the ACK to the send engine
* so that when enough buffer space becomes available,
* the ACK is sent ahead of other outgoing packets.
*/
hfi1_queue_rc_ack(qp, is_fecn);
return;
}
trace_ack_output_ibhdr(dd_from_ibdev(qp->ibqp.device),
&opa_hdr, ib_is_sc5(sc5));
/* write the pbc and data */
ppd->dd->pio_inline_send(ppd->dd, pbuf, pbc,
(priv->hdr_type == HFI1_PKT_TYPE_9B ?
(void *)&opa_hdr.ibh :
(void *)&opa_hdr.opah), hwords);
return;
}
/**
* reset_psn - reset the QP state to send starting from PSN
* @qp: the QP
@ -984,10 +1102,13 @@ static void reset_sending_psn(struct rvt_qp *qp, u32 psn)
/*
* This should be called with the QP s_lock held and interrupts disabled.
*/
void hfi1_rc_send_complete(struct rvt_qp *qp, struct ib_header *hdr)
void hfi1_rc_send_complete(struct rvt_qp *qp, struct hfi1_opa_header *opah)
{
struct ib_other_headers *ohdr;
struct hfi1_qp_priv *priv = qp->priv;
struct rvt_swqe *wqe;
struct ib_header *hdr = NULL;
struct hfi1_16b_header *hdr_16b = NULL;
u32 opcode;
u32 psn;
@ -996,10 +1117,22 @@ void hfi1_rc_send_complete(struct rvt_qp *qp, struct ib_header *hdr)
return;
/* Find out where the BTH is */
if (ib_get_lnh(hdr) == HFI1_LRH_BTH)
ohdr = &hdr->u.oth;
else
ohdr = &hdr->u.l.oth;
if (priv->hdr_type == HFI1_PKT_TYPE_9B) {
hdr = &opah->ibh;
if (ib_get_lnh(hdr) == HFI1_LRH_BTH)
ohdr = &hdr->u.oth;
else
ohdr = &hdr->u.l.oth;
} else {
u8 l4;
hdr_16b = &opah->opah;
l4 = hfi1_16B_get_l4(hdr_16b);
if (l4 == OPA_16B_L4_IB_LOCAL)
ohdr = &hdr_16b->u.oth;
else
ohdr = &hdr_16b->u.l.oth;
}
opcode = ib_bth_get_opcode(ohdr);
if (opcode >= OP(RDMA_READ_RESPONSE_FIRST) &&
@ -1009,7 +1142,7 @@ void hfi1_rc_send_complete(struct rvt_qp *qp, struct ib_header *hdr)
return;
}
psn = be32_to_cpu(ohdr->bth[2]);
psn = ib_bth_get_psn(ohdr);
reset_sending_psn(qp, psn);
/*
@ -1399,36 +1532,34 @@ static void rdma_seq_err(struct rvt_qp *qp, struct hfi1_ibport *ibp, u32 psn,
/**
* rc_rcv_resp - process an incoming RC response packet
* @ibp: the port this packet came in on
* @ohdr: the other headers for this packet
* @data: the packet data
* @tlen: the packet length
* @qp: the QP for this packet
* @opcode: the opcode for this packet
* @psn: the packet sequence number for this packet
* @hdrsize: the header length
* @pmtu: the path MTU
* @packet: data packet information
*
* This is called from hfi1_rc_rcv() to process an incoming RC response
* packet for the given QP.
* Called at interrupt level.
*/
static void rc_rcv_resp(struct hfi1_ibport *ibp,
struct ib_other_headers *ohdr,
void *data, u32 tlen, struct rvt_qp *qp,
u32 opcode, u32 psn, u32 hdrsize, u32 pmtu,
struct hfi1_ctxtdata *rcd)
static void rc_rcv_resp(struct hfi1_packet *packet)
{
struct hfi1_ctxtdata *rcd = packet->rcd;
void *data = packet->payload;
u32 tlen = packet->tlen;
struct rvt_qp *qp = packet->qp;
struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
struct ib_other_headers *ohdr = packet->ohdr;
struct rvt_swqe *wqe;
enum ib_wc_status status;
unsigned long flags;
int diff;
u32 pad;
u32 aeth;
u64 val;
u32 aeth;
u32 psn = ib_bth_get_psn(packet->ohdr);
u32 pmtu = qp->pmtu;
u16 hdrsize = packet->hlen;
u8 opcode = packet->opcode;
u8 pad = packet->pad;
u8 extra_bytes = pad + packet->extra_byte + (SIZE_OF_CRC << 2);
spin_lock_irqsave(&qp->s_lock, flags);
trace_hfi1_ack(qp, psn);
/* Ignore invalid responses. */
@ -1494,7 +1625,7 @@ static void rc_rcv_resp(struct hfi1_ibport *ibp,
if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
goto ack_op_err;
read_middle:
if (unlikely(tlen != (hdrsize + pmtu + 4)))
if (unlikely(tlen != (hdrsize + pmtu + extra_bytes)))
goto ack_len_err;
if (unlikely(pmtu >= qp->s_rdma_read_len))
goto ack_len_err;
@ -1526,13 +1657,11 @@ read_middle:
aeth = be32_to_cpu(ohdr->u.aeth);
if (!do_rc_ack(qp, aeth, psn, opcode, 0, rcd))
goto ack_done;
/* Get the number of bytes the message was padded by. */
pad = ib_bth_get_pad(ohdr);
/*
* Check that the data size is >= 0 && <= pmtu.
* Remember to account for ICRC (4).
*/
if (unlikely(tlen < (hdrsize + pad + 4)))
if (unlikely(tlen < (hdrsize + extra_bytes)))
goto ack_len_err;
/*
* If this is a response to a resent RDMA read, we
@ -1550,16 +1679,14 @@ read_middle:
goto ack_seq_err;
if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
goto ack_op_err;
/* Get the number of bytes the message was padded by. */
pad = ib_bth_get_pad(ohdr);
/*
* Check that the data size is >= 1 && <= pmtu.
* Remember to account for ICRC (4).
*/
if (unlikely(tlen <= (hdrsize + pad + 4)))
if (unlikely(tlen <= (hdrsize + extra_bytes)))
goto ack_len_err;
read_last:
tlen -= hdrsize + pad + 4;
tlen -= hdrsize + extra_bytes;
if (unlikely(tlen != qp->s_rdma_read_len))
goto ack_len_err;
aeth = be32_to_cpu(ohdr->u.aeth);
@ -1844,7 +1971,7 @@ static void log_cca_event(struct hfi1_pportdata *ppd, u8 sl, u32 rlid,
spin_unlock_irqrestore(&ppd->cc_log_lock, flags);
}
void process_becn(struct hfi1_pportdata *ppd, u8 sl, u16 rlid, u32 lqpn,
void process_becn(struct hfi1_pportdata *ppd, u8 sl, u32 rlid, u32 lqpn,
u32 rqpn, u8 svc_type)
{
struct cca_timer *cca_timer;
@ -1901,12 +2028,7 @@ void process_becn(struct hfi1_pportdata *ppd, u8 sl, u16 rlid, u32 lqpn,
/**
* hfi1_rc_rcv - process an incoming RC packet
* @rcd: the context pointer
* @hdr: the header of this packet
* @rcv_flags: flags relevant to rcv processing
* @data: the packet data
* @tlen: the packet length
* @qp: the QP for this packet
* @packet: data packet information
*
* This is called from qp_rcv() to process an incoming RC packet
* for the given QP.
@ -1915,17 +2037,16 @@ void process_becn(struct hfi1_pportdata *ppd, u8 sl, u16 rlid, u32 lqpn,
void hfi1_rc_rcv(struct hfi1_packet *packet)
{
struct hfi1_ctxtdata *rcd = packet->rcd;
struct ib_header *hdr = packet->hdr;
u32 rcv_flags = packet->rcv_flags;
void *data = packet->ebuf;
void *data = packet->payload;
u32 tlen = packet->tlen;
struct rvt_qp *qp = packet->qp;
struct hfi1_ibport *ibp = rcd_to_iport(rcd);
struct ib_other_headers *ohdr = packet->ohdr;
u32 bth0, opcode;
u32 bth0 = be32_to_cpu(ohdr->bth[0]);
u32 opcode = packet->opcode;
u32 hdrsize = packet->hlen;
u32 psn;
u32 pad;
u32 psn = ib_bth_get_psn(packet->ohdr);
u32 pad = packet->pad;
struct ib_wc wc;
u32 pmtu = qp->pmtu;
int diff;
@ -1935,17 +2056,15 @@ void hfi1_rc_rcv(struct hfi1_packet *packet)
bool is_fecn = false;
bool copy_last = false;
u32 rkey;
u8 extra_bytes = pad + packet->extra_byte + (SIZE_OF_CRC << 2);
lockdep_assert_held(&qp->r_lock);
bth0 = be32_to_cpu(ohdr->bth[0]);
if (hfi1_ruc_check_hdr(ibp, hdr, rcv_flags & HFI1_HAS_GRH, qp, bth0))
if (hfi1_ruc_check_hdr(ibp, packet))
return;
is_fecn = process_ecn(qp, packet, false);
psn = be32_to_cpu(ohdr->bth[2]);
opcode = ib_bth_get_opcode(ohdr);
/*
* Process responses (ACKs) before anything else. Note that the
* packet sequence number will be for something in the send work
@ -1954,8 +2073,7 @@ void hfi1_rc_rcv(struct hfi1_packet *packet)
*/
if (opcode >= OP(RDMA_READ_RESPONSE_FIRST) &&
opcode <= OP(ATOMIC_ACKNOWLEDGE)) {
rc_rcv_resp(ibp, ohdr, data, tlen, qp, opcode, psn,
hdrsize, pmtu, rcd);
rc_rcv_resp(packet);
if (is_fecn)
goto send_ack;
return;
@ -2022,7 +2140,12 @@ void hfi1_rc_rcv(struct hfi1_packet *packet)
case OP(RDMA_WRITE_MIDDLE):
send_middle:
/* Check for invalid length PMTU or posted rwqe len. */
if (unlikely(tlen != (hdrsize + pmtu + 4)))
/*
* There will be no padding for 9B packet but 16B packets
* will come in with some padding since we always add
* CRC and LT bytes which will need to be flit aligned
*/
if (unlikely(tlen != (hdrsize + pmtu + extra_bytes)))
goto nack_inv;
qp->r_rcv_len += pmtu;
if (unlikely(qp->r_rcv_len > qp->r_len))
@ -2074,14 +2197,12 @@ no_immediate_data:
wc.wc_flags = 0;
wc.ex.imm_data = 0;
send_last:
/* Get the number of bytes the message was padded by. */
pad = ib_bth_get_pad(ohdr);
/* Check for invalid length. */
/* LAST len should be >= 1 */
if (unlikely(tlen < (hdrsize + pad + 4)))
if (unlikely(tlen < (hdrsize + extra_bytes)))
goto nack_inv;
/* Don't count the CRC. */
tlen -= (hdrsize + pad + 4);
/* Don't count the CRC(and padding and LT byte for 16B). */
tlen -= (hdrsize + extra_bytes);
wc.byte_len = tlen + qp->r_rcv_len;
if (unlikely(wc.byte_len > qp->r_len))
goto nack_inv;
@ -2368,28 +2489,19 @@ send_ack:
void hfi1_rc_hdrerr(
struct hfi1_ctxtdata *rcd,
struct ib_header *hdr,
u32 rcv_flags,
struct hfi1_packet *packet,
struct rvt_qp *qp)
{
int has_grh = rcv_flags & HFI1_HAS_GRH;
struct ib_other_headers *ohdr;
struct hfi1_ibport *ibp = rcd_to_iport(rcd);
int diff;
u32 opcode;
u32 psn, bth0;
u32 psn;
/* Check for GRH */
ohdr = &hdr->u.oth;
if (has_grh)
ohdr = &hdr->u.l.oth;
bth0 = be32_to_cpu(ohdr->bth[0]);
if (hfi1_ruc_check_hdr(ibp, hdr, has_grh, qp, bth0))
if (hfi1_ruc_check_hdr(ibp, packet))
return;
psn = be32_to_cpu(ohdr->bth[2]);
opcode = ib_bth_get_opcode(ohdr);
psn = ib_bth_get_psn(packet->ohdr);
opcode = ib_bth_get_opcode(packet->ohdr);
/* Only deal with RDMA Writes for now */
if (opcode < IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST) {

View File

@ -74,8 +74,10 @@ static int init_sge(struct rvt_qp *qp, struct rvt_rwqe *wqe)
if (wqe->sg_list[i].length == 0)
continue;
/* Check LKEY */
if (!rvt_lkey_ok(rkt, pd, j ? &ss->sg_list[j - 1] : &ss->sge,
&wqe->sg_list[i], IB_ACCESS_LOCAL_WRITE))
ret = rvt_lkey_ok(rkt, pd, j ? &ss->sg_list[j - 1] : &ss->sge,
NULL, &wqe->sg_list[i],
IB_ACCESS_LOCAL_WRITE);
if (unlikely(ret <= 0))
goto bad_lkey;
qp->r_len += wqe->sg_list[i].length;
j++;
@ -214,100 +216,104 @@ static int gid_ok(union ib_gid *gid, __be64 gid_prefix, __be64 id)
*
* The s_lock will be acquired around the hfi1_migrate_qp() call.
*/
int hfi1_ruc_check_hdr(struct hfi1_ibport *ibp, struct ib_header *hdr,
int has_grh, struct rvt_qp *qp, u32 bth0)
int hfi1_ruc_check_hdr(struct hfi1_ibport *ibp, struct hfi1_packet *packet)
{
__be64 guid;
unsigned long flags;
struct rvt_qp *qp = packet->qp;
u8 sc5 = ibp->sl_to_sc[rdma_ah_get_sl(&qp->remote_ah_attr)];
u32 dlid = packet->dlid;
u32 slid = packet->slid;
u32 sl = packet->sl;
int migrated;
u32 bth0, bth1;
u16 pkey;
if (qp->s_mig_state == IB_MIG_ARMED && (bth0 & IB_BTH_MIG_REQ)) {
if (!has_grh) {
if (rdma_ah_get_ah_flags(&qp->alt_ah_attr) &
IB_AH_GRH)
goto err;
bth0 = be32_to_cpu(packet->ohdr->bth[0]);
bth1 = be32_to_cpu(packet->ohdr->bth[1]);
if (packet->etype == RHF_RCV_TYPE_BYPASS) {
pkey = hfi1_16B_get_pkey(packet->hdr);
migrated = bth1 & OPA_BTH_MIG_REQ;
} else {
pkey = ib_bth_get_pkey(packet->ohdr);
migrated = bth0 & IB_BTH_MIG_REQ;
}
if (qp->s_mig_state == IB_MIG_ARMED && migrated) {
if (!packet->grh) {
if ((rdma_ah_get_ah_flags(&qp->alt_ah_attr) &
IB_AH_GRH) &&
(packet->etype != RHF_RCV_TYPE_BYPASS))
return 1;
} else {
const struct ib_global_route *grh;
if (!(rdma_ah_get_ah_flags(&qp->alt_ah_attr) &
IB_AH_GRH))
goto err;
return 1;
grh = rdma_ah_read_grh(&qp->alt_ah_attr);
guid = get_sguid(ibp, grh->sgid_index);
if (!gid_ok(&hdr->u.l.grh.dgid, ibp->rvp.gid_prefix,
if (!gid_ok(&packet->grh->dgid, ibp->rvp.gid_prefix,
guid))
goto err;
return 1;
if (!gid_ok(
&hdr->u.l.grh.sgid,
&packet->grh->sgid,
grh->dgid.global.subnet_prefix,
grh->dgid.global.interface_id))
goto err;
return 1;
}
if (unlikely(rcv_pkey_check(ppd_from_ibp(ibp), (u16)bth0, sc5,
ib_get_slid(hdr)))) {
hfi1_bad_pqkey(ibp, OPA_TRAP_BAD_P_KEY,
(u16)bth0,
ib_get_sl(hdr),
0, qp->ibqp.qp_num,
ib_get_slid(hdr),
ib_get_dlid(hdr));
goto err;
if (unlikely(rcv_pkey_check(ppd_from_ibp(ibp), pkey,
sc5, slid))) {
hfi1_bad_pkey(ibp, pkey, sl, 0, qp->ibqp.qp_num,
slid, dlid);
return 1;
}
/* Validate the SLID. See Ch. 9.6.1.5 and 17.2.8 */
if (ib_get_slid(hdr) !=
rdma_ah_get_dlid(&qp->alt_ah_attr) ||
if (slid != rdma_ah_get_dlid(&qp->alt_ah_attr) ||
ppd_from_ibp(ibp)->port !=
rdma_ah_get_port_num(&qp->alt_ah_attr))
goto err;
return 1;
spin_lock_irqsave(&qp->s_lock, flags);
hfi1_migrate_qp(qp);
spin_unlock_irqrestore(&qp->s_lock, flags);
} else {
if (!has_grh) {
if (rdma_ah_get_ah_flags(&qp->remote_ah_attr) &
IB_AH_GRH)
goto err;
if (!packet->grh) {
if ((rdma_ah_get_ah_flags(&qp->remote_ah_attr) &
IB_AH_GRH) &&
(packet->etype != RHF_RCV_TYPE_BYPASS))
return 1;
} else {
const struct ib_global_route *grh;
if (!(rdma_ah_get_ah_flags(&qp->remote_ah_attr) &
IB_AH_GRH))
goto err;
return 1;
grh = rdma_ah_read_grh(&qp->remote_ah_attr);
guid = get_sguid(ibp, grh->sgid_index);
if (!gid_ok(&hdr->u.l.grh.dgid, ibp->rvp.gid_prefix,
if (!gid_ok(&packet->grh->dgid, ibp->rvp.gid_prefix,
guid))
goto err;
return 1;
if (!gid_ok(
&hdr->u.l.grh.sgid,
&packet->grh->sgid,
grh->dgid.global.subnet_prefix,
grh->dgid.global.interface_id))
goto err;
return 1;
}
if (unlikely(rcv_pkey_check(ppd_from_ibp(ibp), (u16)bth0, sc5,
ib_get_slid(hdr)))) {
hfi1_bad_pqkey(ibp, OPA_TRAP_BAD_P_KEY,
(u16)bth0,
ib_get_sl(hdr),
0, qp->ibqp.qp_num,
ib_get_slid(hdr),
ib_get_dlid(hdr));
goto err;
if (unlikely(rcv_pkey_check(ppd_from_ibp(ibp), pkey,
sc5, slid))) {
hfi1_bad_pkey(ibp, pkey, sl, 0, qp->ibqp.qp_num,
slid, dlid);
return 1;
}
/* Validate the SLID. See Ch. 9.6.1.5 */
if (ib_get_slid(hdr) !=
rdma_ah_get_dlid(&qp->remote_ah_attr) ||
if ((slid != rdma_ah_get_dlid(&qp->remote_ah_attr)) ||
ppd_from_ibp(ibp)->port != qp->port_num)
goto err;
if (qp->s_mig_state == IB_MIG_REARM &&
!(bth0 & IB_BTH_MIG_REQ))
return 1;
if (qp->s_mig_state == IB_MIG_REARM && !migrated)
qp->s_mig_state = IB_MIG_ARMED;
}
return 0;
err:
return 1;
}
/**
@ -643,7 +649,7 @@ done:
* @ibp: a pointer to the IB port
* @hdr: a pointer to the GRH header being constructed
* @grh: the global route address to send to
* @hwords: the number of 32 bit words of header being sent
* @hwords: size of header after grh being sent in dwords
* @nwords: the number of 32 bit words of data being sent
*
* Return the size of the header in 32 bit words.
@ -655,7 +661,7 @@ u32 hfi1_make_grh(struct hfi1_ibport *ibp, struct ib_grh *hdr,
cpu_to_be32((IB_GRH_VERSION << IB_GRH_VERSION_SHIFT) |
(grh->traffic_class << IB_GRH_TCLASS_SHIFT) |
(grh->flow_label << IB_GRH_FLOW_SHIFT));
hdr->paylen = cpu_to_be16((hwords - 2 + nwords + SIZE_OF_CRC) << 2);
hdr->paylen = cpu_to_be16((hwords + nwords) << 2);
/* next_hdr is defined by C8-7 in ch. 8.4.1 */
hdr->next_hdr = IB_GRH_NEXT_HDR;
hdr->hop_limit = grh->hop_limit;
@ -671,7 +677,8 @@ u32 hfi1_make_grh(struct hfi1_ibport *ibp, struct ib_grh *hdr,
return sizeof(struct ib_grh) / sizeof(u32);
}
#define BTH2_OFFSET (offsetof(struct hfi1_sdma_header, hdr.u.oth.bth[2]) / 4)
#define BTH2_OFFSET (offsetof(struct hfi1_sdma_header, \
hdr.ibh.u.oth.bth[2]) / 4)
/**
* build_ahg - create ahg in s_ahg
@ -728,32 +735,169 @@ static inline void build_ahg(struct rvt_qp *qp, u32 npsn)
}
}
static inline void hfi1_make_ruc_bth(struct rvt_qp *qp,
struct ib_other_headers *ohdr,
u32 bth0, u32 bth1, u32 bth2)
{
bth1 |= qp->remote_qpn;
ohdr->bth[0] = cpu_to_be32(bth0);
ohdr->bth[1] = cpu_to_be32(bth1);
ohdr->bth[2] = cpu_to_be32(bth2);
}
static inline void hfi1_make_ruc_header_16B(struct rvt_qp *qp,
struct ib_other_headers *ohdr,
u32 bth0, u32 bth2, int middle,
struct hfi1_pkt_state *ps)
{
struct hfi1_qp_priv *priv = qp->priv;
struct hfi1_ibport *ibp = ps->ibp;
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
u32 bth1 = 0;
u32 slid;
u16 pkey = hfi1_get_pkey(ibp, qp->s_pkey_index);
u8 l4 = OPA_16B_L4_IB_LOCAL;
u8 extra_bytes = hfi1_get_16b_padding((qp->s_hdrwords << 2),
ps->s_txreq->s_cur_size);
u32 nwords = SIZE_OF_CRC + ((ps->s_txreq->s_cur_size +
extra_bytes + SIZE_OF_LT) >> 2);
u8 becn = 0;
if (unlikely(rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH) &&
hfi1_check_mcast(rdma_ah_get_dlid(&qp->remote_ah_attr))) {
struct ib_grh *grh;
struct ib_global_route *grd =
rdma_ah_retrieve_grh(&qp->remote_ah_attr);
int hdrwords;
/*
* Ensure OPA GIDs are transformed to IB gids
* before creating the GRH.
*/
if (grd->sgid_index == OPA_GID_INDEX)
grd->sgid_index = 0;
grh = &ps->s_txreq->phdr.hdr.opah.u.l.grh;
l4 = OPA_16B_L4_IB_GLOBAL;
hdrwords = qp->s_hdrwords - 4;
qp->s_hdrwords += hfi1_make_grh(ibp, grh, grd,
hdrwords, nwords);
middle = 0;
}
if (qp->s_mig_state == IB_MIG_MIGRATED)
bth1 |= OPA_BTH_MIG_REQ;
else
middle = 0;
if (middle)
build_ahg(qp, bth2);
else
qp->s_flags &= ~RVT_S_AHG_VALID;
bth0 |= pkey;
bth0 |= extra_bytes << 20;
if (qp->s_flags & RVT_S_ECN) {
qp->s_flags &= ~RVT_S_ECN;
/* we recently received a FECN, so return a BECN */
becn = 1;
}
hfi1_make_ruc_bth(qp, ohdr, bth0, bth1, bth2);
if (!ppd->lid)
slid = be32_to_cpu(OPA_LID_PERMISSIVE);
else
slid = ppd->lid |
(rdma_ah_get_path_bits(&qp->remote_ah_attr) &
((1 << ppd->lmc) - 1));
hfi1_make_16b_hdr(&ps->s_txreq->phdr.hdr.opah,
slid,
opa_get_lid(rdma_ah_get_dlid(&qp->remote_ah_attr),
16B),
(qp->s_hdrwords + nwords) >> 1,
pkey, becn, 0, l4, priv->s_sc);
}
static inline void hfi1_make_ruc_header_9B(struct rvt_qp *qp,
struct ib_other_headers *ohdr,
u32 bth0, u32 bth2, int middle,
struct hfi1_pkt_state *ps)
{
struct hfi1_qp_priv *priv = qp->priv;
struct hfi1_ibport *ibp = ps->ibp;
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
u32 bth1 = 0;
u16 pkey = hfi1_get_pkey(ibp, qp->s_pkey_index);
u16 lrh0 = HFI1_LRH_BTH;
u16 slid;
u8 extra_bytes = -ps->s_txreq->s_cur_size & 3;
u32 nwords = SIZE_OF_CRC + ((ps->s_txreq->s_cur_size +
extra_bytes) >> 2);
if (unlikely(rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH)) {
struct ib_grh *grh = &ps->s_txreq->phdr.hdr.ibh.u.l.grh;
int hdrwords = qp->s_hdrwords - 2;
lrh0 = HFI1_LRH_GRH;
qp->s_hdrwords +=
hfi1_make_grh(ibp, grh,
rdma_ah_read_grh(&qp->remote_ah_attr),
hdrwords, nwords);
middle = 0;
}
lrh0 |= (priv->s_sc & 0xf) << 12 |
(rdma_ah_get_sl(&qp->remote_ah_attr) & 0xf) << 4;
if (qp->s_mig_state == IB_MIG_MIGRATED)
bth0 |= IB_BTH_MIG_REQ;
else
middle = 0;
if (middle)
build_ahg(qp, bth2);
else
qp->s_flags &= ~RVT_S_AHG_VALID;
bth0 |= pkey;
bth0 |= extra_bytes << 20;
if (qp->s_flags & RVT_S_ECN) {
qp->s_flags &= ~RVT_S_ECN;
/* we recently received a FECN, so return a BECN */
bth1 |= (IB_BECN_MASK << IB_BECN_SHIFT);
}
hfi1_make_ruc_bth(qp, ohdr, bth0, bth1, bth2);
if (!ppd->lid)
slid = be16_to_cpu(IB_LID_PERMISSIVE);
else
slid = ppd->lid |
(rdma_ah_get_path_bits(&qp->remote_ah_attr) &
((1 << ppd->lmc) - 1));
hfi1_make_ib_hdr(&ps->s_txreq->phdr.hdr.ibh,
lrh0,
qp->s_hdrwords + nwords,
opa_get_lid(rdma_ah_get_dlid(&qp->remote_ah_attr), 9B),
ppd_from_ibp(ibp)->lid |
rdma_ah_get_path_bits(&qp->remote_ah_attr));
}
typedef void (*hfi1_make_ruc_hdr)(struct rvt_qp *qp,
struct ib_other_headers *ohdr,
u32 bth0, u32 bth2, int middle,
struct hfi1_pkt_state *ps);
/* We support only two types - 9B and 16B for now */
static const hfi1_make_ruc_hdr hfi1_ruc_header_tbl[2] = {
[HFI1_PKT_TYPE_9B] = &hfi1_make_ruc_header_9B,
[HFI1_PKT_TYPE_16B] = &hfi1_make_ruc_header_16B
};
void hfi1_make_ruc_header(struct rvt_qp *qp, struct ib_other_headers *ohdr,
u32 bth0, u32 bth2, int middle,
struct hfi1_pkt_state *ps)
{
struct hfi1_qp_priv *priv = qp->priv;
struct hfi1_ibport *ibp = ps->ibp;
u16 lrh0;
u32 nwords;
u32 extra_bytes;
u32 bth1;
/* Construct the header. */
extra_bytes = -ps->s_txreq->s_cur_size & 3;
nwords = (ps->s_txreq->s_cur_size + extra_bytes) >> 2;
lrh0 = HFI1_LRH_BTH;
if (unlikely(rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH)) {
qp->s_hdrwords +=
hfi1_make_grh(ibp,
&ps->s_txreq->phdr.hdr.u.l.grh,
rdma_ah_read_grh(&qp->remote_ah_attr),
qp->s_hdrwords, nwords);
lrh0 = HFI1_LRH_GRH;
middle = 0;
}
lrh0 |= (priv->s_sc & 0xf) << 12 |
(rdma_ah_get_sl(&qp->remote_ah_attr) & 0xf) << 4;
/*
* reset s_ahg/AHG fields
*
@ -768,33 +912,9 @@ void hfi1_make_ruc_header(struct rvt_qp *qp, struct ib_other_headers *ohdr,
priv->s_ahg->tx_flags = 0;
priv->s_ahg->ahgcount = 0;
priv->s_ahg->ahgidx = 0;
if (qp->s_mig_state == IB_MIG_MIGRATED)
bth0 |= IB_BTH_MIG_REQ;
else
middle = 0;
if (middle)
build_ahg(qp, bth2);
else
qp->s_flags &= ~RVT_S_AHG_VALID;
ps->s_txreq->phdr.hdr.lrh[0] = cpu_to_be16(lrh0);
ps->s_txreq->phdr.hdr.lrh[1] =
cpu_to_be16(rdma_ah_get_dlid(&qp->remote_ah_attr));
ps->s_txreq->phdr.hdr.lrh[2] =
cpu_to_be16(qp->s_hdrwords + nwords + SIZE_OF_CRC);
ps->s_txreq->phdr.hdr.lrh[3] =
cpu_to_be16(ppd_from_ibp(ibp)->lid |
rdma_ah_get_path_bits(&qp->remote_ah_attr));
bth0 |= hfi1_get_pkey(ibp, qp->s_pkey_index);
bth0 |= extra_bytes << 20;
ohdr->bth[0] = cpu_to_be32(bth0);
bth1 = qp->remote_qpn;
if (qp->s_flags & RVT_S_ECN) {
qp->s_flags &= ~RVT_S_ECN;
/* we recently received a FECN, so return a BECN */
bth1 |= (IB_BECN_MASK << IB_BECN_SHIFT);
}
ohdr->bth[1] = cpu_to_be32(bth1);
ohdr->bth[2] = cpu_to_be32(bth2);
/* Make the appropriate header */
hfi1_ruc_header_tbl[priv->hdr_type](qp, ohdr, bth0, bth2, middle, ps);
}
/* when sending, force a reschedule every one of these periods */
@ -816,6 +936,8 @@ void hfi1_make_ruc_header(struct rvt_qp *qp, struct ib_other_headers *ohdr,
static bool schedule_send_yield(struct rvt_qp *qp,
struct hfi1_pkt_state *ps)
{
ps->pkts_sent = true;
if (unlikely(time_after(jiffies, ps->timeout))) {
if (!ps->in_thread ||
workqueue_congested(ps->cpu, ps->ppd->hfi1_wq)) {
@ -912,6 +1034,7 @@ void hfi1_do_send(struct rvt_qp *qp, bool in_thread)
ps.timeout = jiffies + ps.timeout_int;
ps.cpu = priv->s_sde ? priv->s_sde->cpu :
cpumask_first(cpumask_of_node(ps.ppd->dd->node));
ps.pkts_sent = false;
/* insure a pre-built packet is handled */
ps.s_txreq = get_waiting_verbs_txreq(qp);
@ -934,7 +1057,7 @@ void hfi1_do_send(struct rvt_qp *qp, bool in_thread)
spin_lock_irqsave(&qp->s_lock, ps.flags);
}
} while (make_req(qp, &ps));
iowait_starve_clear(ps.pkts_sent, &priv->s_iowait);
spin_unlock_irqrestore(&qp->s_lock, ps.flags);
}

View File

@ -246,7 +246,7 @@ static void __sdma_process_event(
enum sdma_events event);
static void dump_sdma_state(struct sdma_engine *sde);
static void sdma_make_progress(struct sdma_engine *sde, u64 status);
static void sdma_desc_avail(struct sdma_engine *sde, unsigned avail);
static void sdma_desc_avail(struct sdma_engine *sde, uint avail);
static void sdma_flush_descq(struct sdma_engine *sde);
/**
@ -325,7 +325,7 @@ static void sdma_wait_for_packet_egress(struct sdma_engine *sde,
/* timed out - bounce the link */
dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u, bouncing link\n",
__func__, sde->this_idx, (u32)reg);
queue_work(dd->pport->hfi1_wq,
queue_work(dd->pport->link_wq,
&dd->pport->link_bounce_work);
break;
}
@ -1340,10 +1340,8 @@ static void sdma_clean(struct hfi1_devdata *dd, size_t num_engines)
* @dd: hfi1_devdata
* @port: port number (currently only zero)
*
* sdma_init initializes the specified number of engines.
*
* The code initializes each sde, its csrs. Interrupts
* are not required to be enabled.
* Initializes each sde and its csrs.
* Interrupts are not required to be enabled.
*
* Returns:
* 0 - success, -errno on failure
@ -1764,13 +1762,14 @@ retry:
*
* This is called with head_lock held.
*/
static void sdma_desc_avail(struct sdma_engine *sde, unsigned avail)
static void sdma_desc_avail(struct sdma_engine *sde, uint avail)
{
struct iowait *wait, *nw;
struct iowait *waits[SDMA_WAIT_BATCH_SIZE];
unsigned i, n = 0, seq;
uint i, n = 0, seq, max_idx = 0;
struct sdma_txreq *stx;
struct hfi1_ibdev *dev = &sde->dd->verbs_dev;
u8 max_starved_cnt = 0;
#ifdef CONFIG_SDMA_VERBOSITY
dd_dev_err(sde->dd, "CONFIG SDMA(%u) %s:%d %s()\n", sde->this_idx,
@ -1805,6 +1804,9 @@ static void sdma_desc_avail(struct sdma_engine *sde, unsigned avail)
if (num_desc > avail)
break;
avail -= num_desc;
/* Find the most starved wait memeber */
iowait_starve_find_max(wait, &max_starved_cnt,
n, &max_idx);
list_del_init(&wait->list);
waits[n++] = wait;
}
@ -1813,8 +1815,13 @@ static void sdma_desc_avail(struct sdma_engine *sde, unsigned avail)
}
} while (read_seqretry(&dev->iowait_lock, seq));
/* Schedule the most starved one first */
if (n)
waits[max_idx]->wakeup(waits[max_idx], SDMA_AVAIL_REASON);
for (i = 0; i < n; i++)
waits[i]->wakeup(waits[i], SDMA_AVAIL_REASON);
if (i != max_idx)
waits[i]->wakeup(waits[i], SDMA_AVAIL_REASON);
}
/* head_lock must be held */
@ -2351,7 +2358,8 @@ static inline u16 submit_tx(struct sdma_engine *sde, struct sdma_txreq *tx)
static int sdma_check_progress(
struct sdma_engine *sde,
struct iowait *wait,
struct sdma_txreq *tx)
struct sdma_txreq *tx,
bool pkts_sent)
{
int ret;
@ -2364,7 +2372,7 @@ static int sdma_check_progress(
seq = raw_seqcount_begin(
(const seqcount_t *)&sde->head_lock.seqcount);
ret = wait->sleep(sde, wait, tx, seq);
ret = wait->sleep(sde, wait, tx, seq, pkts_sent);
if (ret == -EAGAIN)
sde->desc_avail = sdma_descq_freecnt(sde);
} else {
@ -2378,6 +2386,7 @@ static int sdma_check_progress(
* @sde: sdma engine to use
* @wait: wait structure to use when full (may be NULL)
* @tx: sdma_txreq to submit
* @pkts_sent: has any packet been sent yet?
*
* The call submits the tx into the ring. If a iowait structure is non-NULL
* the packet will be queued to the list in wait.
@ -2389,7 +2398,8 @@ static int sdma_check_progress(
*/
int sdma_send_txreq(struct sdma_engine *sde,
struct iowait *wait,
struct sdma_txreq *tx)
struct sdma_txreq *tx,
bool pkts_sent)
{
int ret = 0;
u16 tail;
@ -2431,7 +2441,7 @@ unlock_noconn:
ret = -ECOMM;
goto unlock;
nodesc:
ret = sdma_check_progress(sde, wait, tx);
ret = sdma_check_progress(sde, wait, tx, pkts_sent);
if (ret == -EAGAIN) {
ret = 0;
goto retry;
@ -2500,8 +2510,10 @@ retry:
}
update_tail:
total_count = submit_count + flush_count;
if (wait)
if (wait) {
iowait_sdma_add(wait, total_count);
iowait_starve_clear(submit_count > 0, wait);
}
if (tail != INVALID_TAIL)
sdma_update_tail(sde, tail);
spin_unlock_irqrestore(&sde->tail_lock, flags);
@ -2529,7 +2541,7 @@ unlock_noconn:
ret = -ECOMM;
goto update_tail;
nodesc:
ret = sdma_check_progress(sde, wait, tx);
ret = sdma_check_progress(sde, wait, tx, submit_count > 0);
if (ret == -EAGAIN) {
ret = 0;
goto retry;

View File

@ -852,7 +852,8 @@ struct iowait;
int sdma_send_txreq(struct sdma_engine *sde,
struct iowait *wait,
struct sdma_txreq *tx);
struct sdma_txreq *tx,
bool pkts_sent);
int sdma_send_txlist(struct sdma_engine *sde,
struct iowait *wait,
struct list_head *tx_list,

View File

@ -95,7 +95,7 @@ static void port_release(struct kobject *kobj)
/* nothing to do since memory is freed by hfi1_free_devdata() */
}
static struct bin_attribute cc_table_bin_attr = {
static const struct bin_attribute cc_table_bin_attr = {
.attr = {.name = "cc_table_bin", .mode = 0444},
.read = read_cc_table_bin,
.size = PAGE_SIZE,
@ -137,7 +137,7 @@ static ssize_t read_cc_setting_bin(struct file *filp, struct kobject *kobj,
return count;
}
static struct bin_attribute cc_setting_bin_attr = {
static const struct bin_attribute cc_setting_bin_attr = {
.attr = {.name = "cc_settings_bin", .mode = 0444},
.read = read_cc_setting_bin,
.size = PAGE_SIZE,

View File

@ -47,7 +47,7 @@
#define CREATE_TRACE_POINTS
#include "trace.h"
u8 ibhdr_exhdr_len(struct ib_header *hdr)
static u8 __get_ib_hdr_len(struct ib_header *hdr)
{
struct ib_other_headers *ohdr;
u8 opcode;
@ -61,13 +61,69 @@ u8 ibhdr_exhdr_len(struct ib_header *hdr)
0 : hdr_len_by_opcode[opcode] - (12 + 8);
}
#define IMM_PRN "imm %d"
#define RETH_PRN "reth vaddr 0x%.16llx rkey 0x%.8x dlen 0x%.8x"
#define AETH_PRN "aeth syn 0x%.2x %s msn 0x%.8x"
#define DETH_PRN "deth qkey 0x%.8x sqpn 0x%.6x"
#define IETH_PRN "ieth rkey 0x%.8x"
#define ATOMICACKETH_PRN "origdata %llx"
#define ATOMICETH_PRN "vaddr 0x%llx rkey 0x%.8x sdata %llx cdata %llx"
static u8 __get_16b_hdr_len(struct hfi1_16b_header *hdr)
{
struct ib_other_headers *ohdr;
u8 opcode;
if (hfi1_16B_get_l4(hdr) == OPA_16B_L4_IB_LOCAL)
ohdr = &hdr->u.oth;
else
ohdr = &hdr->u.l.oth;
opcode = ib_bth_get_opcode(ohdr);
return hdr_len_by_opcode[opcode] == 0 ?
0 : hdr_len_by_opcode[opcode] - (12 + 8 + 8);
}
u8 hfi1_trace_packet_hdr_len(struct hfi1_packet *packet)
{
if (packet->etype != RHF_RCV_TYPE_BYPASS)
return __get_ib_hdr_len(packet->hdr);
else
return __get_16b_hdr_len(packet->hdr);
}
u8 hfi1_trace_opa_hdr_len(struct hfi1_opa_header *opa_hdr)
{
if (!opa_hdr->hdr_type)
return __get_ib_hdr_len(&opa_hdr->ibh);
else
return __get_16b_hdr_len(&opa_hdr->opah);
}
const char *hfi1_trace_get_packet_str(struct hfi1_packet *packet)
{
if (packet->etype != RHF_RCV_TYPE_BYPASS)
return "IB";
switch (hfi1_16B_get_l2(packet->hdr)) {
case 0:
return "0";
case 1:
return "1";
case 2:
return "16B";
case 3:
return "9B";
}
return "";
}
const char *hfi1_trace_get_packet_type_str(u8 l4)
{
if (l4)
return "16B";
else
return "9B";
}
#define IMM_PRN "imm:%d"
#define RETH_PRN "reth vaddr:0x%.16llx rkey:0x%.8x dlen:0x%.8x"
#define AETH_PRN "aeth syn:0x%.2x %s msn:0x%.8x"
#define DETH_PRN "deth qkey:0x%.8x sqpn:0x%.6x"
#define IETH_PRN "ieth rkey:0x%.8x"
#define ATOMICACKETH_PRN "origdata:%llx"
#define ATOMICETH_PRN "vaddr:0x%llx rkey:0x%.8x sdata:%llx cdata:%llx"
#define OP(transport, op) IB_OPCODE_## transport ## _ ## op
@ -84,6 +140,125 @@ static const char *parse_syndrome(u8 syndrome)
return "";
}
void hfi1_trace_parse_9b_bth(struct ib_other_headers *ohdr,
u8 *ack, u8 *becn, u8 *fecn, u8 *mig,
u8 *se, u8 *pad, u8 *opcode, u8 *tver,
u16 *pkey, u32 *psn, u32 *qpn)
{
*ack = ib_bth_get_ackreq(ohdr);
*becn = ib_bth_get_becn(ohdr);
*fecn = ib_bth_get_fecn(ohdr);
*mig = ib_bth_get_migreq(ohdr);
*se = ib_bth_get_se(ohdr);
*pad = ib_bth_get_pad(ohdr);
*opcode = ib_bth_get_opcode(ohdr);
*tver = ib_bth_get_tver(ohdr);
*pkey = ib_bth_get_pkey(ohdr);
*psn = ib_bth_get_psn(ohdr);
*qpn = ib_bth_get_qpn(ohdr);
}
void hfi1_trace_parse_16b_bth(struct ib_other_headers *ohdr,
u8 *ack, u8 *mig, u8 *opcode,
u8 *pad, u8 *se, u8 *tver,
u32 *psn, u32 *qpn)
{
*ack = ib_bth_get_ackreq(ohdr);
*mig = ib_bth_get_migreq(ohdr);
*opcode = ib_bth_get_opcode(ohdr);
*pad = ib_bth_get_pad(ohdr);
*se = ib_bth_get_se(ohdr);
*tver = ib_bth_get_tver(ohdr);
*psn = ib_bth_get_psn(ohdr);
*qpn = ib_bth_get_qpn(ohdr);
}
void hfi1_trace_parse_9b_hdr(struct ib_header *hdr, bool sc5,
u8 *lnh, u8 *lver, u8 *sl, u8 *sc,
u16 *len, u32 *dlid, u32 *slid)
{
*lnh = ib_get_lnh(hdr);
*lver = ib_get_lver(hdr);
*sl = ib_get_sl(hdr);
*sc = ib_get_sc(hdr) | (sc5 << 4);
*len = ib_get_len(hdr);
*dlid = ib_get_dlid(hdr);
*slid = ib_get_slid(hdr);
}
void hfi1_trace_parse_16b_hdr(struct hfi1_16b_header *hdr,
u8 *age, u8 *becn, u8 *fecn,
u8 *l4, u8 *rc, u8 *sc,
u16 *entropy, u16 *len, u16 *pkey,
u32 *dlid, u32 *slid)
{
*age = hfi1_16B_get_age(hdr);
*becn = hfi1_16B_get_becn(hdr);
*fecn = hfi1_16B_get_fecn(hdr);
*l4 = hfi1_16B_get_l4(hdr);
*rc = hfi1_16B_get_rc(hdr);
*sc = hfi1_16B_get_sc(hdr);
*entropy = hfi1_16B_get_entropy(hdr);
*len = hfi1_16B_get_len(hdr);
*pkey = hfi1_16B_get_pkey(hdr);
*dlid = hfi1_16B_get_dlid(hdr);
*slid = hfi1_16B_get_slid(hdr);
}
#define LRH_PRN "len:%d sc:%d dlid:0x%.4x slid:0x%.4x "
#define LRH_9B_PRN "lnh:%d,%s lver:%d sl:%d"
#define LRH_16B_PRN "age:%d becn:%d fecn:%d l4:%d " \
"rc:%d sc:%d pkey:0x%.4x entropy:0x%.4x"
const char *hfi1_trace_fmt_lrh(struct trace_seq *p, bool bypass,
u8 age, u8 becn, u8 fecn, u8 l4,
u8 lnh, const char *lnh_name, u8 lver,
u8 rc, u8 sc, u8 sl, u16 entropy,
u16 len, u16 pkey, u32 dlid, u32 slid)
{
const char *ret = trace_seq_buffer_ptr(p);
trace_seq_printf(p, LRH_PRN, len, sc, dlid, slid);
if (bypass)
trace_seq_printf(p, LRH_16B_PRN,
age, becn, fecn, l4, rc, sc, pkey, entropy);
else
trace_seq_printf(p, LRH_9B_PRN,
lnh, lnh_name, lver, sl);
trace_seq_putc(p, 0);
return ret;
}
#define BTH_9B_PRN \
"op:0x%.2x,%s se:%d m:%d pad:%d tver:%d pkey:0x%.4x " \
"f:%d b:%d qpn:0x%.6x a:%d psn:0x%.8x"
#define BTH_16B_PRN \
"op:0x%.2x,%s se:%d m:%d pad:%d tver:%d " \
"qpn:0x%.6x a:%d psn:0x%.8x"
const char *hfi1_trace_fmt_bth(struct trace_seq *p, bool bypass,
u8 ack, u8 becn, u8 fecn, u8 mig,
u8 se, u8 pad, u8 opcode, const char *opname,
u8 tver, u16 pkey, u32 psn, u32 qpn)
{
const char *ret = trace_seq_buffer_ptr(p);
if (bypass)
trace_seq_printf(p, BTH_16B_PRN,
opcode, opname,
se, mig, pad, tver, qpn, ack, psn);
else
trace_seq_printf(p, BTH_9B_PRN,
opcode, opname,
se, mig, pad, tver, pkey, fecn, becn,
qpn, ack, psn);
trace_seq_putc(p, 0);
return ret;
}
const char *parse_everbs_hdrs(
struct trace_seq *p,
u8 opcode,

View File

@ -1,5 +1,5 @@
/*
* Copyright(c) 2015, 2016 Intel Corporation.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -51,3 +51,4 @@
#include "trace_rc.h"
#include "trace_rx.h"
#include "trace_tx.h"
#include "trace_mmu.h"

View File

@ -55,8 +55,79 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM hfi1_ibhdrs
#define ib_opcode_name(opcode) { IB_OPCODE_##opcode, #opcode }
#define show_ib_opcode(opcode) \
__print_symbolic(opcode, \
ib_opcode_name(RC_SEND_FIRST), \
ib_opcode_name(RC_SEND_MIDDLE), \
ib_opcode_name(RC_SEND_LAST), \
ib_opcode_name(RC_SEND_LAST_WITH_IMMEDIATE), \
ib_opcode_name(RC_SEND_ONLY), \
ib_opcode_name(RC_SEND_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(RC_RDMA_WRITE_FIRST), \
ib_opcode_name(RC_RDMA_WRITE_MIDDLE), \
ib_opcode_name(RC_RDMA_WRITE_LAST), \
ib_opcode_name(RC_RDMA_WRITE_LAST_WITH_IMMEDIATE), \
ib_opcode_name(RC_RDMA_WRITE_ONLY), \
ib_opcode_name(RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(RC_RDMA_READ_REQUEST), \
ib_opcode_name(RC_RDMA_READ_RESPONSE_FIRST), \
ib_opcode_name(RC_RDMA_READ_RESPONSE_MIDDLE), \
ib_opcode_name(RC_RDMA_READ_RESPONSE_LAST), \
ib_opcode_name(RC_RDMA_READ_RESPONSE_ONLY), \
ib_opcode_name(RC_ACKNOWLEDGE), \
ib_opcode_name(RC_ATOMIC_ACKNOWLEDGE), \
ib_opcode_name(RC_COMPARE_SWAP), \
ib_opcode_name(RC_FETCH_ADD), \
ib_opcode_name(UC_SEND_FIRST), \
ib_opcode_name(UC_SEND_MIDDLE), \
ib_opcode_name(UC_SEND_LAST), \
ib_opcode_name(UC_SEND_LAST_WITH_IMMEDIATE), \
ib_opcode_name(UC_SEND_ONLY), \
ib_opcode_name(UC_SEND_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(UC_RDMA_WRITE_FIRST), \
ib_opcode_name(UC_RDMA_WRITE_MIDDLE), \
ib_opcode_name(UC_RDMA_WRITE_LAST), \
ib_opcode_name(UC_RDMA_WRITE_LAST_WITH_IMMEDIATE), \
ib_opcode_name(UC_RDMA_WRITE_ONLY), \
ib_opcode_name(UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(UD_SEND_ONLY), \
ib_opcode_name(UD_SEND_ONLY_WITH_IMMEDIATE), \
ib_opcode_name(CNP))
u8 ibhdr_exhdr_len(struct ib_header *hdr);
const char *parse_everbs_hdrs(struct trace_seq *p, u8 opcode, void *ehdrs);
u8 hfi1_trace_opa_hdr_len(struct hfi1_opa_header *opah);
u8 hfi1_trace_packet_hdr_len(struct hfi1_packet *packet);
const char *hfi1_trace_get_packet_type_str(u8 l4);
const char *hfi1_trace_get_packet_str(struct hfi1_packet *packet);
void hfi1_trace_parse_9b_bth(struct ib_other_headers *ohdr,
u8 *ack, u8 *becn, u8 *fecn, u8 *mig,
u8 *se, u8 *pad, u8 *opcode, u8 *tver,
u16 *pkey, u32 *psn, u32 *qpn);
void hfi1_trace_parse_9b_hdr(struct ib_header *hdr, bool sc5,
u8 *lnh, u8 *lver, u8 *sl, u8 *sc,
u16 *len, u32 *dlid, u32 *slid);
void hfi1_trace_parse_16b_bth(struct ib_other_headers *ohdr,
u8 *ack, u8 *mig, u8 *opcode,
u8 *pad, u8 *se, u8 *tver,
u32 *psn, u32 *qpn);
void hfi1_trace_parse_16b_hdr(struct hfi1_16b_header *hdr,
u8 *age, u8 *becn, u8 *fecn,
u8 *l4, u8 *rc, u8 *sc,
u16 *entropy, u16 *len, u16 *pkey,
u32 *dlid, u32 *slid);
const char *hfi1_trace_fmt_lrh(struct trace_seq *p, bool bypass,
u8 age, u8 becn, u8 fecn, u8 l4,
u8 lnh, const char *lnh_name, u8 lver,
u8 rc, u8 sc, u8 sl, u16 entropy,
u16 len, u16 pkey, u32 dlid, u32 slid);
const char *hfi1_trace_fmt_bth(struct trace_seq *p, bool bypass,
u8 ack, u8 becn, u8 fecn, u8 mig,
u8 se, u8 pad, u8 opcode, const char *opname,
u8 tver, u16 pkey, u32 psn, u32 qpn);
#define __parse_ib_ehdrs(op, ehdrs) parse_everbs_hdrs(p, op, ehdrs)
@ -65,140 +136,303 @@ const char *parse_everbs_hdrs(struct trace_seq *p, u8 opcode, void *ehdrs);
__print_symbolic(lrh, \
lrh_name(LRH_BTH), \
lrh_name(LRH_GRH))
#define PKT_ENTRY(pkt) __string(ptype, hfi1_trace_get_packet_str(packet))
#define PKT_ASSIGN(pkt) __assign_str(ptype, hfi1_trace_get_packet_str(packet))
#define LRH_PRN "vl %d lver %d sl %d lnh %d,%s dlid %.4x len %d slid %.4x"
#define BTH_PRN \
"op 0x%.2x,%s se %d m %d pad %d tver %d pkey 0x%.4x " \
"f %d b %d qpn 0x%.6x a %d psn 0x%.8x"
#define EHDR_PRN "%s"
DECLARE_EVENT_CLASS(hfi1_ibhdr_template,
DECLARE_EVENT_CLASS(hfi1_input_ibhdr_template,
TP_PROTO(struct hfi1_devdata *dd,
struct ib_header *hdr),
TP_ARGS(dd, hdr),
struct hfi1_packet *packet,
bool sc5),
TP_ARGS(dd, packet, sc5),
TP_STRUCT__entry(
DD_DEV_ENTRY(dd)
/* LRH */
__field(u8, vl)
__field(u8, lver)
__field(u8, sl)
PKT_ENTRY(packet)
__field(bool, bypass)
__field(u8, ack)
__field(u8, age)
__field(u8, becn)
__field(u8, fecn)
__field(u8, l4)
__field(u8, lnh)
__field(u16, dlid)
__field(u16, len)
__field(u16, slid)
/* BTH */
__field(u8, lver)
__field(u8, mig)
__field(u8, opcode)
__field(u8, se)
__field(u8, m)
__field(u8, pad)
__field(u8, rc)
__field(u8, sc)
__field(u8, se)
__field(u8, sl)
__field(u8, tver)
__field(u16, entropy)
__field(u16, len)
__field(u16, pkey)
__field(u8, f)
__field(u8, b)
__field(u32, qpn)
__field(u8, a)
__field(u32, dlid)
__field(u32, psn)
__field(u32, qpn)
__field(u32, slid)
/* extended headers */
__dynamic_array(u8, ehdrs, ibhdr_exhdr_len(hdr))
__dynamic_array(u8, ehdrs,
hfi1_trace_packet_hdr_len(packet))
),
TP_fast_assign(
TP_fast_assign(
DD_DEV_ASSIGN(dd);
PKT_ASSIGN(packet);
if (packet->etype == RHF_RCV_TYPE_BYPASS) {
__entry->bypass = true;
hfi1_trace_parse_16b_hdr(packet->hdr,
&__entry->age,
&__entry->becn,
&__entry->fecn,
&__entry->l4,
&__entry->rc,
&__entry->sc,
&__entry->entropy,
&__entry->len,
&__entry->pkey,
&__entry->dlid,
&__entry->slid);
hfi1_trace_parse_16b_bth(packet->ohdr,
&__entry->ack,
&__entry->mig,
&__entry->opcode,
&__entry->pad,
&__entry->se,
&__entry->tver,
&__entry->psn,
&__entry->qpn);
} else {
__entry->bypass = false;
hfi1_trace_parse_9b_hdr(packet->hdr, sc5,
&__entry->lnh,
&__entry->lver,
&__entry->sl,
&__entry->sc,
&__entry->len,
&__entry->dlid,
&__entry->slid);
hfi1_trace_parse_9b_bth(packet->ohdr,
&__entry->ack,
&__entry->becn,
&__entry->fecn,
&__entry->mig,
&__entry->se,
&__entry->pad,
&__entry->opcode,
&__entry->tver,
&__entry->pkey,
&__entry->psn,
&__entry->qpn);
}
/* extended headers */
memcpy(__get_dynamic_array(ehdrs),
&packet->ohdr->u,
__get_dynamic_array_len(ehdrs));
),
TP_printk("[%s] (%s) %s %s hlen:%d %s",
__get_str(dev),
__get_str(ptype),
hfi1_trace_fmt_lrh(p,
__entry->bypass,
__entry->age,
__entry->becn,
__entry->fecn,
__entry->l4,
__entry->lnh,
show_lnh(__entry->lnh),
__entry->lver,
__entry->rc,
__entry->sc,
__entry->sl,
__entry->entropy,
__entry->len,
__entry->pkey,
__entry->dlid,
__entry->slid),
hfi1_trace_fmt_bth(p,
__entry->bypass,
__entry->ack,
__entry->becn,
__entry->fecn,
__entry->mig,
__entry->se,
__entry->pad,
__entry->opcode,
show_ib_opcode(__entry->opcode),
__entry->tver,
__entry->pkey,
__entry->psn,
__entry->qpn),
/* extended headers */
__get_dynamic_array_len(ehdrs),
__parse_ib_ehdrs(
__entry->opcode,
(void *)__get_dynamic_array(ehdrs))
)
);
DEFINE_EVENT(hfi1_input_ibhdr_template, input_ibhdr,
TP_PROTO(struct hfi1_devdata *dd,
struct hfi1_packet *packet, bool sc5),
TP_ARGS(dd, packet, sc5));
DECLARE_EVENT_CLASS(hfi1_output_ibhdr_template,
TP_PROTO(struct hfi1_devdata *dd,
struct hfi1_opa_header *opah, bool sc5),
TP_ARGS(dd, opah, sc5),
TP_STRUCT__entry(
DD_DEV_ENTRY(dd)
__field(bool, bypass)
__field(u8, ack)
__field(u8, age)
__field(u8, becn)
__field(u8, fecn)
__field(u8, l4)
__field(u8, lnh)
__field(u8, lver)
__field(u8, mig)
__field(u8, opcode)
__field(u8, pad)
__field(u8, rc)
__field(u8, sc)
__field(u8, se)
__field(u8, sl)
__field(u8, tver)
__field(u16, entropy)
__field(u16, len)
__field(u16, pkey)
__field(u32, dlid)
__field(u32, psn)
__field(u32, qpn)
__field(u32, slid)
/* extended headers */
__dynamic_array(u8, ehdrs,
hfi1_trace_opa_hdr_len(opah))
),
TP_fast_assign(
struct ib_other_headers *ohdr;
DD_DEV_ASSIGN(dd);
/* LRH */
__entry->vl =
(u8)(be16_to_cpu(hdr->lrh[0]) >> 12);
__entry->lver =
(u8)(be16_to_cpu(hdr->lrh[0]) >> 8) & 0xf;
__entry->sl =
(u8)(be16_to_cpu(hdr->lrh[0]) >> 4) & 0xf;
__entry->lnh =
(u8)(be16_to_cpu(hdr->lrh[0]) & 3);
__entry->dlid =
be16_to_cpu(hdr->lrh[1]);
/* allow for larger len */
__entry->len =
be16_to_cpu(hdr->lrh[2]);
__entry->slid =
be16_to_cpu(hdr->lrh[3]);
/* BTH */
if (__entry->lnh == HFI1_LRH_BTH)
ohdr = &hdr->u.oth;
else
ohdr = &hdr->u.l.oth;
__entry->opcode =
(be32_to_cpu(ohdr->bth[0]) >> 24) & 0xff;
__entry->se =
(be32_to_cpu(ohdr->bth[0]) >> 23) & 1;
__entry->m =
(be32_to_cpu(ohdr->bth[0]) >> 22) & 1;
__entry->pad =
(be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
__entry->tver =
(be32_to_cpu(ohdr->bth[0]) >> 16) & 0xf;
__entry->pkey =
be32_to_cpu(ohdr->bth[0]) & 0xffff;
__entry->f =
(be32_to_cpu(ohdr->bth[1]) >> IB_FECN_SHIFT) &
IB_FECN_MASK;
__entry->b =
(be32_to_cpu(ohdr->bth[1]) >> IB_BECN_SHIFT) &
IB_BECN_MASK;
__entry->qpn =
be32_to_cpu(ohdr->bth[1]) & RVT_QPN_MASK;
__entry->a =
(be32_to_cpu(ohdr->bth[2]) >> 31) & 1;
/* allow for larger PSN */
__entry->psn =
be32_to_cpu(ohdr->bth[2]) & 0x7fffffff;
if (opah->hdr_type) {
__entry->bypass = true;
hfi1_trace_parse_16b_hdr(&opah->opah,
&__entry->age,
&__entry->becn,
&__entry->fecn,
&__entry->l4,
&__entry->rc,
&__entry->sc,
&__entry->entropy,
&__entry->len,
&__entry->pkey,
&__entry->dlid,
&__entry->slid);
if (entry->l4 == OPA_16B_L4_IB_LOCAL)
ohdr = &opah->opah.u.oth;
else
ohdr = &opah->opah.u.l.oth;
hfi1_trace_parse_16b_bth(ohdr,
&__entry->ack,
&__entry->mig,
&__entry->opcode,
&__entry->pad,
&__entry->se,
&__entry->tver,
&__entry->psn,
&__entry->qpn);
} else {
__entry->bypass = false;
hfi1_trace_parse_9b_hdr(&opah->ibh, sc5,
&__entry->lnh,
&__entry->lver,
&__entry->sl,
&__entry->sc,
&__entry->len,
&__entry->dlid,
&__entry->slid);
if (entry->lnh == HFI1_LRH_BTH)
ohdr = &opah->ibh.u.oth;
else
ohdr = &opah->ibh.u.l.oth;
hfi1_trace_parse_9b_bth(ohdr,
&__entry->ack,
&__entry->becn,
&__entry->fecn,
&__entry->mig,
&__entry->se,
&__entry->pad,
&__entry->opcode,
&__entry->tver,
&__entry->pkey,
&__entry->psn,
&__entry->qpn);
}
/* extended headers */
memcpy(__get_dynamic_array(ehdrs), &ohdr->u,
ibhdr_exhdr_len(hdr));
),
TP_printk("[%s] " LRH_PRN " " BTH_PRN " " EHDR_PRN,
__get_str(dev),
/* LRH */
__entry->vl,
__entry->lver,
__entry->sl,
__entry->lnh, show_lnh(__entry->lnh),
__entry->dlid,
__entry->len,
__entry->slid,
/* BTH */
__entry->opcode, show_ib_opcode(__entry->opcode),
__entry->se,
__entry->m,
__entry->pad,
__entry->tver,
__entry->pkey,
__entry->f,
__entry->b,
__entry->qpn,
__entry->a,
__entry->psn,
/* extended headers */
__parse_ib_ehdrs(
__entry->opcode,
(void *)__get_dynamic_array(ehdrs))
)
memcpy(__get_dynamic_array(ehdrs),
&ohdr->u, __get_dynamic_array_len(ehdrs));
),
TP_printk("[%s] (%s) %s %s hlen:%d %s",
__get_str(dev),
hfi1_trace_get_packet_type_str(__entry->l4),
hfi1_trace_fmt_lrh(p,
__entry->bypass,
__entry->age,
__entry->becn,
__entry->fecn,
__entry->l4,
__entry->lnh,
show_lnh(__entry->lnh),
__entry->lver,
__entry->rc,
__entry->sc,
__entry->sl,
__entry->entropy,
__entry->len,
__entry->pkey,
__entry->dlid,
__entry->slid),
hfi1_trace_fmt_bth(p,
__entry->bypass,
__entry->ack,
__entry->becn,
__entry->fecn,
__entry->mig,
__entry->se,
__entry->pad,
__entry->opcode,
show_ib_opcode(__entry->opcode),
__entry->tver,
__entry->pkey,
__entry->psn,
__entry->qpn),
/* extended headers */
__get_dynamic_array_len(ehdrs),
__parse_ib_ehdrs(
__entry->opcode,
(void *)__get_dynamic_array(ehdrs))
)
);
DEFINE_EVENT(hfi1_ibhdr_template, input_ibhdr,
TP_PROTO(struct hfi1_devdata *dd, struct ib_header *hdr),
TP_ARGS(dd, hdr));
DEFINE_EVENT(hfi1_output_ibhdr_template, pio_output_ibhdr,
TP_PROTO(struct hfi1_devdata *dd,
struct hfi1_opa_header *opah, bool sc5),
TP_ARGS(dd, opah, sc5));
DEFINE_EVENT(hfi1_ibhdr_template, pio_output_ibhdr,
TP_PROTO(struct hfi1_devdata *dd, struct ib_header *hdr),
TP_ARGS(dd, hdr));
DEFINE_EVENT(hfi1_output_ibhdr_template, ack_output_ibhdr,
TP_PROTO(struct hfi1_devdata *dd,
struct hfi1_opa_header *opah, bool sc5),
TP_ARGS(dd, opah, sc5));
DEFINE_EVENT(hfi1_ibhdr_template, ack_output_ibhdr,
TP_PROTO(struct hfi1_devdata *dd, struct ib_header *hdr),
TP_ARGS(dd, hdr));
DEFINE_EVENT(hfi1_output_ibhdr_template, sdma_output_ibhdr,
TP_PROTO(struct hfi1_devdata *dd,
struct hfi1_opa_header *opah, bool sc5),
TP_ARGS(dd, opah, sc5));
DEFINE_EVENT(hfi1_ibhdr_template, sdma_output_ibhdr,
TP_PROTO(struct hfi1_devdata *dd, struct ib_header *hdr),
TP_ARGS(dd, hdr));
#endif /* __HFI1_TRACE_IBHDRS_H */

View File

@ -72,6 +72,26 @@ TRACE_EVENT(hfi1_interrupt,
__entry->src)
);
DECLARE_EVENT_CLASS(
hfi1_csr_template,
TP_PROTO(void __iomem *addr, u64 value),
TP_ARGS(addr, value),
TP_STRUCT__entry(
__field(void __iomem *, addr)
__field(u64, value)
),
TP_fast_assign(
__entry->addr = addr;
__entry->value = value;
),
TP_printk("addr %p value %llx", __entry->addr, __entry->value)
);
DEFINE_EVENT(
hfi1_csr_template, hfi1_write_rcvarray,
TP_PROTO(void __iomem *addr, u64 value),
TP_ARGS(addr, value));
#ifdef CONFIG_FAULT_INJECTION
TRACE_EVENT(hfi1_fault_opcode,
TP_PROTO(struct rvt_qp *qp, u8 opcode),

View File

@ -0,0 +1,95 @@
/*
* Copyright(c) 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
*
* GPL LICENSE SUMMARY
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* BSD LICENSE
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* - Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* - Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/
#if !defined(__HFI1_TRACE_MMU_H) || defined(TRACE_HEADER_MULTI_READ)
#define __HFI1_TRACE_MMU_H
#include <linux/tracepoint.h>
#include <linux/trace_seq.h>
#include "hfi.h"
#undef TRACE_SYSTEM
#define TRACE_SYSTEM hfi1_mmu
DECLARE_EVENT_CLASS(hfi1_mmu_rb_template,
TP_PROTO(unsigned long addr, unsigned long len),
TP_ARGS(addr, len),
TP_STRUCT__entry(__field(unsigned long, addr)
__field(unsigned long, len)
),
TP_fast_assign(__entry->addr = addr;
__entry->len = len;
),
TP_printk("MMU node addr 0x%lx, len %lu",
__entry->addr,
__entry->len
)
);
DEFINE_EVENT(hfi1_mmu_rb_template, hfi1_mmu_rb_insert,
TP_PROTO(unsigned long addr, unsigned long len),
TP_ARGS(addr, len));
DEFINE_EVENT(hfi1_mmu_rb_template, hfi1_mmu_rb_search,
TP_PROTO(unsigned long addr, unsigned long len),
TP_ARGS(addr, len));
DEFINE_EVENT(hfi1_mmu_rb_template, hfi1_mmu_rb_remove,
TP_PROTO(unsigned long addr, unsigned long len),
TP_ARGS(addr, len));
DEFINE_EVENT(hfi1_mmu_rb_template, hfi1_mmu_mem_invalidate,
TP_PROTO(unsigned long addr, unsigned long len),
TP_ARGS(addr, len));
#endif /* __HFI1_TRACE_RC_H */
#undef TRACE_INCLUDE_PATH
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_PATH .
#define TRACE_INCLUDE_FILE trace_mmu
#include <trace/define_trace.h>

View File

@ -52,9 +52,25 @@
#include "hfi.h"
#define tidtype_name(type) { PT_##type, #type }
#define show_tidtype(type) \
__print_symbolic(type, \
tidtype_name(EXPECTED), \
tidtype_name(EAGER), \
tidtype_name(INVALID)) \
#undef TRACE_SYSTEM
#define TRACE_SYSTEM hfi1_rx
#define packettype_name(etype) { RHF_RCV_TYPE_##etype, #etype }
#define show_packettype(etype) \
__print_symbolic(etype, \
packettype_name(EXPECTED), \
packettype_name(EAGER), \
packettype_name(IB), \
packettype_name(ERROR), \
packettype_name(BYPASS))
TRACE_EVENT(hfi1_rcvhdr,
TP_PROTO(struct hfi1_devdata *dd,
u32 ctxt,
@ -98,24 +114,24 @@ TRACE_EVENT(hfi1_rcvhdr,
);
TRACE_EVENT(hfi1_receive_interrupt,
TP_PROTO(struct hfi1_devdata *dd, u32 ctxt),
TP_ARGS(dd, ctxt),
TP_PROTO(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd),
TP_ARGS(dd, rcd),
TP_STRUCT__entry(DD_DEV_ENTRY(dd)
__field(u32, ctxt)
__field(u8, slow_path)
__field(u8, dma_rtail)
),
TP_fast_assign(DD_DEV_ASSIGN(dd);
__entry->ctxt = ctxt;
if (dd->rcd[ctxt]->do_interrupt ==
__entry->ctxt = rcd->ctxt;
if (rcd->do_interrupt ==
&handle_receive_interrupt) {
__entry->slow_path = 1;
__entry->dma_rtail = 0xFF;
} else if (dd->rcd[ctxt]->do_interrupt ==
} else if (rcd->do_interrupt ==
&handle_receive_interrupt_dma_rtail){
__entry->dma_rtail = 1;
__entry->slow_path = 0;
} else if (dd->rcd[ctxt]->do_interrupt ==
} else if (rcd->do_interrupt ==
&handle_receive_interrupt_nodma_rtail) {
__entry->dma_rtail = 0;
__entry->slow_path = 0;
@ -129,7 +145,8 @@ TRACE_EVENT(hfi1_receive_interrupt,
)
);
TRACE_EVENT(hfi1_exp_tid_reg,
DECLARE_EVENT_CLASS(
hfi1_exp_tid_reg_unreg,
TP_PROTO(unsigned int ctxt, u16 subctxt, u32 rarr,
u32 npages, unsigned long va, unsigned long pa,
dma_addr_t dma),
@ -163,38 +180,45 @@ TRACE_EVENT(hfi1_exp_tid_reg,
)
);
TRACE_EVENT(hfi1_exp_tid_unreg,
TP_PROTO(unsigned int ctxt, u16 subctxt, u32 rarr, u32 npages,
unsigned long va, unsigned long pa, dma_addr_t dma),
TP_ARGS(ctxt, subctxt, rarr, npages, va, pa, dma),
TP_STRUCT__entry(
__field(unsigned int, ctxt)
__field(u16, subctxt)
__field(u32, rarr)
__field(u32, npages)
__field(unsigned long, va)
__field(unsigned long, pa)
__field(dma_addr_t, dma)
),
TP_fast_assign(
__entry->ctxt = ctxt;
__entry->subctxt = subctxt;
__entry->rarr = rarr;
__entry->npages = npages;
__entry->va = va;
__entry->pa = pa;
__entry->dma = dma;
),
TP_printk("[%u:%u] entry:%u, %u pages @ 0x%lx, va:0x%lx dma:0x%llx",
__entry->ctxt,
__entry->subctxt,
__entry->rarr,
__entry->npages,
__entry->pa,
__entry->va,
__entry->dma
)
);
DEFINE_EVENT(
hfi1_exp_tid_reg_unreg, hfi1_exp_tid_unreg,
TP_PROTO(unsigned int ctxt, u16 subctxt, u32 rarr, u32 npages,
unsigned long va, unsigned long pa, dma_addr_t dma),
TP_ARGS(ctxt, subctxt, rarr, npages, va, pa, dma));
DEFINE_EVENT(
hfi1_exp_tid_reg_unreg, hfi1_exp_tid_reg,
TP_PROTO(unsigned int ctxt, u16 subctxt, u32 rarr, u32 npages,
unsigned long va, unsigned long pa, dma_addr_t dma),
TP_ARGS(ctxt, subctxt, rarr, npages, va, pa, dma));
TRACE_EVENT(
hfi1_put_tid,
TP_PROTO(struct hfi1_devdata *dd,
u32 index, u32 type, unsigned long pa, u16 order),
TP_ARGS(dd, index, type, pa, order),
TP_STRUCT__entry(
DD_DEV_ENTRY(dd)
__field(unsigned long, pa);
__field(u32, index);
__field(u32, type);
__field(u16, order);
),
TP_fast_assign(
DD_DEV_ASSIGN(dd);
__entry->pa = pa;
__entry->index = index;
__entry->type = type;
__entry->order = order;
),
TP_printk("[%s] type %s pa %lx index %u order %u",
__get_str(dev),
show_tidtype(__entry->type),
__entry->pa,
__entry->index,
__entry->order
)
);
TRACE_EVENT(hfi1_exp_tid_inval,
TP_PROTO(unsigned int ctxt, u16 subctxt, unsigned long va, u32 rarr,

View File

@ -1,5 +1,5 @@
/*
* Copyright(c) 2015, 2016 Intel Corporation.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -198,6 +198,140 @@ TRACE_EVENT(hfi1_sdma_engine_select,
)
);
TRACE_EVENT(hfi1_sdma_user_free_queues,
TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u16 subctxt),
TP_ARGS(dd, ctxt, subctxt),
TP_STRUCT__entry(DD_DEV_ENTRY(dd)
__field(u16, ctxt)
__field(u16, subctxt)
),
TP_fast_assign(DD_DEV_ASSIGN(dd);
__entry->ctxt = ctxt;
__entry->subctxt = subctxt;
),
TP_printk("[%s] SDMA [%u:%u] Freeing user SDMA queues",
__get_str(dev),
__entry->ctxt,
__entry->subctxt
)
);
TRACE_EVENT(hfi1_sdma_user_process_request,
TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u16 subctxt,
u16 comp_idx),
TP_ARGS(dd, ctxt, subctxt, comp_idx),
TP_STRUCT__entry(DD_DEV_ENTRY(dd)
__field(u16, ctxt)
__field(u16, subctxt)
__field(u16, comp_idx)
),
TP_fast_assign(DD_DEV_ASSIGN(dd);
__entry->ctxt = ctxt;
__entry->subctxt = subctxt;
__entry->comp_idx = comp_idx;
),
TP_printk("[%s] SDMA [%u:%u] Using req/comp entry: %u",
__get_str(dev),
__entry->ctxt,
__entry->subctxt,
__entry->comp_idx
)
);
DECLARE_EVENT_CLASS(
hfi1_sdma_value_template,
TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u16 subctxt, u16 comp_idx,
u32 value),
TP_ARGS(dd, ctxt, subctxt, comp_idx, value),
TP_STRUCT__entry(DD_DEV_ENTRY(dd)
__field(u16, ctxt)
__field(u16, subctxt)
__field(u16, comp_idx)
__field(u32, value)
),
TP_fast_assign(DD_DEV_ASSIGN(dd);
__entry->ctxt = ctxt;
__entry->subctxt = subctxt;
__entry->comp_idx = comp_idx;
__entry->value = value;
),
TP_printk("[%s] SDMA [%u:%u:%u] value: %u",
__get_str(dev),
__entry->ctxt,
__entry->subctxt,
__entry->comp_idx,
__entry->value
)
);
DEFINE_EVENT(hfi1_sdma_value_template, hfi1_sdma_user_initial_tidoffset,
TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u16 subctxt,
u16 comp_idx, u32 tidoffset),
TP_ARGS(dd, ctxt, subctxt, comp_idx, tidoffset));
DEFINE_EVENT(hfi1_sdma_value_template, hfi1_sdma_user_data_length,
TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u16 subctxt,
u16 comp_idx, u32 data_len),
TP_ARGS(dd, ctxt, subctxt, comp_idx, data_len));
DEFINE_EVENT(hfi1_sdma_value_template, hfi1_sdma_user_compute_length,
TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u16 subctxt,
u16 comp_idx, u32 data_len),
TP_ARGS(dd, ctxt, subctxt, comp_idx, data_len));
TRACE_EVENT(hfi1_sdma_user_tid_info,
TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u16 subctxt,
u16 comp_idx, u32 tidoffset, u32 units, u8 shift),
TP_ARGS(dd, ctxt, subctxt, comp_idx, tidoffset, units, shift),
TP_STRUCT__entry(DD_DEV_ENTRY(dd)
__field(u16, ctxt)
__field(u16, subctxt)
__field(u16, comp_idx)
__field(u32, tidoffset)
__field(u32, units)
__field(u8, shift)
),
TP_fast_assign(DD_DEV_ASSIGN(dd);
__entry->ctxt = ctxt;
__entry->subctxt = subctxt;
__entry->comp_idx = comp_idx;
__entry->tidoffset = tidoffset;
__entry->units = units;
__entry->shift = shift;
),
TP_printk("[%s] SDMA [%u:%u:%u] TID offset %ubytes %uunits om %u",
__get_str(dev),
__entry->ctxt,
__entry->subctxt,
__entry->comp_idx,
__entry->tidoffset,
__entry->units,
__entry->shift
)
);
TRACE_EVENT(hfi1_sdma_request,
TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u16 subctxt,
unsigned long dim),
TP_ARGS(dd, ctxt, subctxt, dim),
TP_STRUCT__entry(DD_DEV_ENTRY(dd)
__field(u16, ctxt)
__field(u16, subctxt)
__field(unsigned long, dim)
),
TP_fast_assign(DD_DEV_ASSIGN(dd);
__entry->ctxt = ctxt;
__entry->subctxt = subctxt;
__entry->dim = dim;
),
TP_printk("[%s] SDMA from %u:%u (%lu)",
__get_str(dev),
__entry->ctxt,
__entry->subctxt,
__entry->dim
)
);
DECLARE_EVENT_CLASS(hfi1_sdma_engine_class,
TP_PROTO(struct sdma_engine *sde, u64 status),
TP_ARGS(sde, status),

View File

@ -65,7 +65,7 @@ int hfi1_make_uc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
struct hfi1_qp_priv *priv = qp->priv;
struct ib_other_headers *ohdr;
struct rvt_swqe *wqe;
u32 hwords = 5;
u32 hwords;
u32 bth0 = 0;
u32 len;
u32 pmtu = qp->pmtu;
@ -93,9 +93,23 @@ int hfi1_make_uc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
goto done_free_tx;
}
ohdr = &ps->s_txreq->phdr.hdr.u.oth;
if (rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH)
ohdr = &ps->s_txreq->phdr.hdr.u.l.oth;
ps->s_txreq->phdr.hdr.hdr_type = priv->hdr_type;
if (priv->hdr_type == HFI1_PKT_TYPE_9B) {
/* header size in 32-bit words LRH+BTH = (8+12)/4. */
hwords = 5;
if (rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH)
ohdr = &ps->s_txreq->phdr.hdr.ibh.u.l.oth;
else
ohdr = &ps->s_txreq->phdr.hdr.ibh.u.oth;
} else {
/* header size in 32-bit words 16B LRH+BTH = (16+12)/4. */
hwords = 7;
if ((rdma_ah_get_ah_flags(&qp->remote_ah_attr) & IB_AH_GRH) &&
(hfi1_check_mcast(rdma_ah_get_dlid(&qp->remote_ah_attr))))
ohdr = &ps->s_txreq->phdr.hdr.opah.u.l.oth;
else
ohdr = &ps->s_txreq->phdr.hdr.opah.u.oth;
}
/* Get the next send request. */
wqe = rvt_get_swqe_ptr(qp, qp->s_cur);
@ -297,31 +311,26 @@ bail_no_tx:
void hfi1_uc_rcv(struct hfi1_packet *packet)
{
struct hfi1_ibport *ibp = rcd_to_iport(packet->rcd);
struct ib_header *hdr = packet->hdr;
u32 rcv_flags = packet->rcv_flags;
void *data = packet->ebuf;
void *data = packet->payload;
u32 tlen = packet->tlen;
struct rvt_qp *qp = packet->qp;
struct ib_other_headers *ohdr = packet->ohdr;
u32 bth0, opcode;
u32 opcode = packet->opcode;
u32 hdrsize = packet->hlen;
u32 psn;
u32 pad;
u32 pad = packet->pad;
struct ib_wc wc;
u32 pmtu = qp->pmtu;
struct ib_reth *reth;
int has_grh = rcv_flags & HFI1_HAS_GRH;
int ret;
u8 extra_bytes = pad + packet->extra_byte + (SIZE_OF_CRC << 2);
bth0 = be32_to_cpu(ohdr->bth[0]);
if (hfi1_ruc_check_hdr(ibp, hdr, has_grh, qp, bth0))
if (hfi1_ruc_check_hdr(ibp, packet))
return;
process_ecn(qp, packet, true);
psn = be32_to_cpu(ohdr->bth[2]);
opcode = ib_bth_get_opcode(ohdr);
psn = ib_bth_get_psn(ohdr);
/* Compare the PSN verses the expected PSN. */
if (unlikely(cmp_psn(psn, qp->r_psn) != 0)) {
/*
@ -414,7 +423,12 @@ send_first:
/* FALLTHROUGH */
case OP(SEND_MIDDLE):
/* Check for invalid length PMTU or posted rwqe len. */
if (unlikely(tlen != (hdrsize + pmtu + 4)))
/*
* There will be no padding for 9B packet but 16B packets
* will come in with some padding since we always add
* CRC and LT bytes which will need to be flit aligned
*/
if (unlikely(tlen != (hdrsize + pmtu + extra_bytes)))
goto rewind;
qp->r_rcv_len += pmtu;
if (unlikely(qp->r_rcv_len > qp->r_len))
@ -432,14 +446,12 @@ no_immediate_data:
wc.ex.imm_data = 0;
wc.wc_flags = 0;
send_last:
/* Get the number of bytes the message was padded by. */
pad = ib_bth_get_pad(ohdr);
/* Check for invalid length. */
/* LAST len should be >= 1 */
if (unlikely(tlen < (hdrsize + pad + 4)))
if (unlikely(tlen < (hdrsize + extra_bytes)))
goto rewind;
/* Don't count the CRC. */
tlen -= (hdrsize + pad + 4);
tlen -= (hdrsize + extra_bytes);
wc.byte_len = tlen + qp->r_rcv_len;
if (unlikely(wc.byte_len > qp->r_len))
goto rewind;
@ -527,14 +539,12 @@ rdma_first:
rdma_last_imm:
wc.wc_flags = IB_WC_WITH_IMM;
/* Get the number of bytes the message was padded by. */
pad = ib_bth_get_pad(ohdr);
/* Check for invalid length. */
/* LAST len should be >= 1 */
if (unlikely(tlen < (hdrsize + pad + 4)))
goto drop;
/* Don't count the CRC. */
tlen -= (hdrsize + pad + 4);
tlen -= (hdrsize + extra_bytes);
if (unlikely(tlen + qp->r_rcv_len != qp->r_len))
goto drop;
if (test_and_clear_bit(RVT_R_REWIND_SGE, &qp->r_aflags)) {
@ -554,14 +564,12 @@ rdma_last_imm:
case OP(RDMA_WRITE_LAST):
rdma_last:
/* Get the number of bytes the message was padded by. */
pad = ib_bth_get_pad(ohdr);
/* Check for invalid length. */
/* LAST len should be >= 1 */
if (unlikely(tlen < (hdrsize + pad + 4)))
goto drop;
/* Don't count the CRC. */
tlen -= (hdrsize + pad + 4);
tlen -= (hdrsize + extra_bytes);
if (unlikely(tlen + qp->r_rcv_len != qp->r_len))
goto drop;
hfi1_copy_sge(&qp->r_sge, data, tlen, true, false);

View File

@ -53,6 +53,12 @@
#include "verbs_txreq.h"
#include "qp.h"
/* We support only two types - 9B and 16B for now */
static const hfi1_make_req hfi1_make_ud_req_tbl[2] = {
[HFI1_PKT_TYPE_9B] = &hfi1_make_ud_req_9B,
[HFI1_PKT_TYPE_16B] = &hfi1_make_ud_req_16B
};
/**
* ud_loopback - handle send on loopback QPs
* @sqp: the sending QP
@ -67,6 +73,7 @@ static void ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
{
struct hfi1_ibport *ibp = to_iport(sqp->ibqp.device, sqp->port_num);
struct hfi1_pportdata *ppd;
struct hfi1_qp_priv *priv = sqp->priv;
struct rvt_qp *qp;
struct rdma_ah_attr *ah_attr;
unsigned long flags;
@ -102,18 +109,19 @@ static void ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
if (qp->ibqp.qp_num > 1) {
u16 pkey;
u16 slid;
u32 slid;
u8 sc5 = ibp->sl_to_sc[rdma_ah_get_sl(ah_attr)];
pkey = hfi1_get_pkey(ibp, sqp->s_pkey_index);
slid = ppd->lid | (rdma_ah_get_path_bits(ah_attr) &
((1 << ppd->lmc) - 1));
if (unlikely(ingress_pkey_check(ppd, pkey, sc5,
qp->s_pkey_index, slid))) {
hfi1_bad_pqkey(ibp, OPA_TRAP_BAD_P_KEY, pkey,
rdma_ah_get_sl(ah_attr),
sqp->ibqp.qp_num, qp->ibqp.qp_num,
slid, rdma_ah_get_dlid(ah_attr));
qp->s_pkey_index,
slid, false))) {
hfi1_bad_pkey(ibp, pkey,
rdma_ah_get_sl(ah_attr),
sqp->ibqp.qp_num, qp->ibqp.qp_num,
slid, rdma_ah_get_dlid(ah_attr));
goto drop;
}
}
@ -128,18 +136,8 @@ static void ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
qkey = (int)swqe->ud_wr.remote_qkey < 0 ?
sqp->qkey : swqe->ud_wr.remote_qkey;
if (unlikely(qkey != qp->qkey)) {
u16 lid;
lid = ppd->lid | (rdma_ah_get_path_bits(ah_attr) &
((1 << ppd->lmc) - 1));
hfi1_bad_pqkey(ibp, OPA_TRAP_BAD_Q_KEY, qkey,
rdma_ah_get_sl(ah_attr),
sqp->ibqp.qp_num, qp->ibqp.qp_num,
lid,
rdma_ah_get_dlid(ah_attr));
goto drop;
}
if (unlikely(qkey != qp->qkey))
goto drop; /* silently drop per IBTA spec */
}
/*
@ -185,9 +183,33 @@ static void ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
if (rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH) {
struct ib_grh grh;
const struct ib_global_route *grd = rdma_ah_read_grh(ah_attr);
struct ib_global_route grd = *(rdma_ah_read_grh(ah_attr));
hfi1_make_grh(ibp, &grh, grd, 0, 0);
/*
* For loopback packets with extended LIDs, the
* sgid_index in the GRH is 0 and the dgid is
* OPA GID of the sender. While creating a response
* to the loopback packet, IB core creates the new
* sgid_index from the DGID and that will be the
* OPA_GID_INDEX. The new dgid is from the sgid
* index and that will be in the IB GID format.
*
* We now have a case where the sent packet had a
* different sgid_index and dgid compared to the
* one that was received in response.
*
* Fix this inconsistency.
*/
if (priv->hdr_type == HFI1_PKT_TYPE_16B) {
if (grd.sgid_index == 0)
grd.sgid_index = OPA_GID_INDEX;
if (ib_is_opa_gid(&grd.dgid))
grd.dgid.global.interface_id =
cpu_to_be64(ppd->guids[HFI1_PORT_GUID_INDEX]);
}
hfi1_make_grh(ibp, &grh, &grd, 0, 0);
hfi1_copy_sge(&qp->r_sge, &grh,
sizeof(grh), true, false);
wc.wc_flags |= IB_WC_GRH;
@ -244,7 +266,7 @@ static void ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
wc.pkey_index = 0;
}
wc.slid = ppd->lid | (rdma_ah_get_path_bits(ah_attr) &
((1 << ppd->lmc) - 1));
((1 << ppd->lmc) - 1));
/* Check for loopback when the port lid is not set */
if (wc.slid == 0 && sqp->ibqp.qp_type == IB_QPT_GSI)
wc.slid = be16_to_cpu(IB_LID_PERMISSIVE);
@ -261,6 +283,183 @@ drop:
rcu_read_unlock();
}
static void hfi1_make_bth_deth(struct rvt_qp *qp, struct rvt_swqe *wqe,
struct ib_other_headers *ohdr,
u16 *pkey, u32 extra_bytes, bool bypass)
{
u32 bth0;
struct hfi1_ibport *ibp;
ibp = to_iport(qp->ibqp.device, qp->port_num);
if (wqe->wr.opcode == IB_WR_SEND_WITH_IMM) {
ohdr->u.ud.imm_data = wqe->wr.ex.imm_data;
bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24;
} else {
bth0 = IB_OPCODE_UD_SEND_ONLY << 24;
}
if (wqe->wr.send_flags & IB_SEND_SOLICITED)
bth0 |= IB_BTH_SOLICITED;
bth0 |= extra_bytes << 20;
if (qp->ibqp.qp_type == IB_QPT_GSI || qp->ibqp.qp_type == IB_QPT_SMI)
*pkey = hfi1_get_pkey(ibp, wqe->ud_wr.pkey_index);
else
*pkey = hfi1_get_pkey(ibp, qp->s_pkey_index);
if (!bypass)
bth0 |= *pkey;
ohdr->bth[0] = cpu_to_be32(bth0);
ohdr->bth[1] = cpu_to_be32(wqe->ud_wr.remote_qpn);
ohdr->bth[2] = cpu_to_be32(mask_psn(wqe->psn));
/*
* Qkeys with the high order bit set mean use the
* qkey from the QP context instead of the WR (see 10.2.5).
*/
ohdr->u.ud.deth[0] = cpu_to_be32((int)wqe->ud_wr.remote_qkey < 0 ?
qp->qkey : wqe->ud_wr.remote_qkey);
ohdr->u.ud.deth[1] = cpu_to_be32(qp->ibqp.qp_num);
}
void hfi1_make_ud_req_9B(struct rvt_qp *qp, struct hfi1_pkt_state *ps,
struct rvt_swqe *wqe)
{
u32 nwords, extra_bytes;
u16 len, slid, dlid, pkey;
u16 lrh0 = 0;
u8 sc5;
struct hfi1_qp_priv *priv = qp->priv;
struct ib_other_headers *ohdr;
struct rdma_ah_attr *ah_attr;
struct hfi1_pportdata *ppd;
struct hfi1_ibport *ibp;
struct ib_grh *grh;
ibp = to_iport(qp->ibqp.device, qp->port_num);
ppd = ppd_from_ibp(ibp);
ah_attr = &ibah_to_rvtah(wqe->ud_wr.ah)->attr;
extra_bytes = -wqe->length & 3;
nwords = ((wqe->length + extra_bytes) >> 2) + SIZE_OF_CRC;
/* header size in dwords LRH+BTH+DETH = (8+12+8)/4. */
qp->s_hdrwords = 7;
if (wqe->wr.opcode == IB_WR_SEND_WITH_IMM)
qp->s_hdrwords++;
if (rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH) {
grh = &ps->s_txreq->phdr.hdr.ibh.u.l.grh;
qp->s_hdrwords += hfi1_make_grh(ibp, grh,
rdma_ah_read_grh(ah_attr),
qp->s_hdrwords - 2, nwords);
lrh0 = HFI1_LRH_GRH;
ohdr = &ps->s_txreq->phdr.hdr.ibh.u.l.oth;
} else {
lrh0 = HFI1_LRH_BTH;
ohdr = &ps->s_txreq->phdr.hdr.ibh.u.oth;
}
sc5 = ibp->sl_to_sc[rdma_ah_get_sl(ah_attr)];
lrh0 |= (rdma_ah_get_sl(ah_attr) & 0xf) << 4;
if (qp->ibqp.qp_type == IB_QPT_SMI) {
lrh0 |= 0xF000; /* Set VL (see ch. 13.5.3.1) */
priv->s_sc = 0xf;
} else {
lrh0 |= (sc5 & 0xf) << 12;
priv->s_sc = sc5;
}
dlid = opa_get_lid(rdma_ah_get_dlid(ah_attr), 9B);
if (dlid == be16_to_cpu(IB_LID_PERMISSIVE)) {
slid = be16_to_cpu(IB_LID_PERMISSIVE);
} else {
u16 lid = (u16)ppd->lid;
if (lid) {
lid |= rdma_ah_get_path_bits(ah_attr) &
((1 << ppd->lmc) - 1);
slid = lid;
} else {
slid = be16_to_cpu(IB_LID_PERMISSIVE);
}
}
hfi1_make_bth_deth(qp, wqe, ohdr, &pkey, extra_bytes, false);
len = qp->s_hdrwords + nwords;
/* Setup the packet */
ps->s_txreq->phdr.hdr.hdr_type = HFI1_PKT_TYPE_9B;
hfi1_make_ib_hdr(&ps->s_txreq->phdr.hdr.ibh,
lrh0, len, dlid, slid);
}
void hfi1_make_ud_req_16B(struct rvt_qp *qp, struct hfi1_pkt_state *ps,
struct rvt_swqe *wqe)
{
struct hfi1_qp_priv *priv = qp->priv;
struct ib_other_headers *ohdr;
struct rdma_ah_attr *ah_attr;
struct hfi1_pportdata *ppd;
struct hfi1_ibport *ibp;
u32 dlid, slid, nwords, extra_bytes;
u16 len, pkey;
u8 l4, sc5;
ibp = to_iport(qp->ibqp.device, qp->port_num);
ppd = ppd_from_ibp(ibp);
ah_attr = &ibah_to_rvtah(wqe->ud_wr.ah)->attr;
/* header size in dwords 16B LRH+BTH+DETH = (16+12+8)/4. */
qp->s_hdrwords = 9;
if (wqe->wr.opcode == IB_WR_SEND_WITH_IMM)
qp->s_hdrwords++;
/* SW provides space for CRC and LT for bypass packets. */
extra_bytes = hfi1_get_16b_padding((qp->s_hdrwords << 2),
wqe->length);
nwords = ((wqe->length + extra_bytes + SIZE_OF_LT) >> 2) + SIZE_OF_CRC;
if ((rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH) &&
hfi1_check_mcast(rdma_ah_get_dlid(ah_attr))) {
struct ib_grh *grh;
struct ib_global_route *grd = rdma_ah_retrieve_grh(ah_attr);
/*
* Ensure OPA GIDs are transformed to IB gids
* before creating the GRH.
*/
if (grd->sgid_index == OPA_GID_INDEX) {
dd_dev_warn(ppd->dd, "Bad sgid_index. sgid_index: %d\n",
grd->sgid_index);
grd->sgid_index = 0;
}
grh = &ps->s_txreq->phdr.hdr.opah.u.l.grh;
qp->s_hdrwords += hfi1_make_grh(ibp, grh, grd,
qp->s_hdrwords - 4, nwords);
ohdr = &ps->s_txreq->phdr.hdr.opah.u.l.oth;
l4 = OPA_16B_L4_IB_GLOBAL;
} else {
ohdr = &ps->s_txreq->phdr.hdr.opah.u.oth;
l4 = OPA_16B_L4_IB_LOCAL;
}
sc5 = ibp->sl_to_sc[rdma_ah_get_sl(ah_attr)];
if (qp->ibqp.qp_type == IB_QPT_SMI)
priv->s_sc = 0xf;
else
priv->s_sc = sc5;
dlid = opa_get_lid(rdma_ah_get_dlid(ah_attr), 16B);
if (!ppd->lid)
slid = be32_to_cpu(OPA_LID_PERMISSIVE);
else
slid = ppd->lid | (rdma_ah_get_path_bits(ah_attr) &
((1 << ppd->lmc) - 1));
hfi1_make_bth_deth(qp, wqe, ohdr, &pkey, extra_bytes, true);
/* Convert dwords to flits */
len = (qp->s_hdrwords + nwords) >> 1;
/* Setup the packet */
ps->s_txreq->phdr.hdr.hdr_type = HFI1_PKT_TYPE_16B;
hfi1_make_16b_hdr(&ps->s_txreq->phdr.hdr.opah,
slid, dlid, len, pkey, 0, 0, l4, priv->s_sc);
}
/**
* hfi1_make_ud_req - construct a UD request packet
* @qp: the QP
@ -272,18 +471,12 @@ drop:
int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
{
struct hfi1_qp_priv *priv = qp->priv;
struct ib_other_headers *ohdr;
struct rdma_ah_attr *ah_attr;
struct hfi1_pportdata *ppd;
struct hfi1_ibport *ibp;
struct rvt_swqe *wqe;
u32 nwords;
u32 extra_bytes;
u32 bth0;
u16 lrh0;
u16 lid;
int next_cur;
u8 sc5;
u32 lid;
ps->s_txreq = get_txreq(ps->dev, qp);
if (IS_ERR(ps->s_txreq))
@ -320,13 +513,14 @@ int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
ibp = to_iport(qp->ibqp.device, qp->port_num);
ppd = ppd_from_ibp(ibp);
ah_attr = &ibah_to_rvtah(wqe->ud_wr.ah)->attr;
if (rdma_ah_get_dlid(ah_attr) < be16_to_cpu(IB_MULTICAST_LID_BASE) ||
rdma_ah_get_dlid(ah_attr) == be16_to_cpu(IB_LID_PERMISSIVE)) {
priv->hdr_type = hfi1_get_hdr_type(ppd->lid, ah_attr);
if ((!hfi1_check_mcast(rdma_ah_get_dlid(ah_attr))) ||
(rdma_ah_get_dlid(ah_attr) == be32_to_cpu(OPA_LID_PERMISSIVE))) {
lid = rdma_ah_get_dlid(ah_attr) & ~((1 << ppd->lmc) - 1);
if (unlikely(!loopback &&
(lid == ppd->lid ||
(lid == be16_to_cpu(IB_LID_PERMISSIVE) &&
qp->ibqp.qp_type == IB_QPT_GSI)))) {
((lid == ppd->lid) ||
((lid == be32_to_cpu(OPA_LID_PERMISSIVE)) &&
(qp->ibqp.qp_type == IB_QPT_GSI))))) {
unsigned long tflags = ps->flags;
/*
* If DMAs are in progress, we can't generate
@ -350,11 +544,6 @@ int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
}
qp->s_cur = next_cur;
extra_bytes = -wqe->length & 3;
nwords = (wqe->length + extra_bytes) >> 2;
/* header size in 32-bit words LRH+BTH+DETH = (8+12+8)/4. */
qp->s_hdrwords = 7;
ps->s_txreq->s_cur_size = wqe->length;
ps->s_txreq->ss = &qp->s_sge;
qp->s_srate = rdma_ah_get_static_rate(ah_attr);
@ -365,77 +554,12 @@ int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
qp->s_sge.num_sge = wqe->wr.num_sge;
qp->s_sge.total_len = wqe->length;
if (rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH) {
/* Header size in 32-bit words. */
qp->s_hdrwords += hfi1_make_grh(ibp,
&ps->s_txreq->phdr.hdr.u.l.grh,
rdma_ah_read_grh(ah_attr),
qp->s_hdrwords, nwords);
lrh0 = HFI1_LRH_GRH;
ohdr = &ps->s_txreq->phdr.hdr.u.l.oth;
/*
* Don't worry about sending to locally attached multicast
* QPs. It is unspecified by the spec. what happens.
*/
} else {
/* Header size in 32-bit words. */
lrh0 = HFI1_LRH_BTH;
ohdr = &ps->s_txreq->phdr.hdr.u.oth;
}
if (wqe->wr.opcode == IB_WR_SEND_WITH_IMM) {
qp->s_hdrwords++;
ohdr->u.ud.imm_data = wqe->wr.ex.imm_data;
bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24;
} else {
bth0 = IB_OPCODE_UD_SEND_ONLY << 24;
}
sc5 = ibp->sl_to_sc[rdma_ah_get_sl(ah_attr)];
lrh0 |= (rdma_ah_get_sl(ah_attr) & 0xf) << 4;
if (qp->ibqp.qp_type == IB_QPT_SMI) {
lrh0 |= 0xF000; /* Set VL (see ch. 13.5.3.1) */
priv->s_sc = 0xf;
} else {
lrh0 |= (sc5 & 0xf) << 12;
priv->s_sc = sc5;
}
/* Make the appropriate header */
hfi1_make_ud_req_tbl[priv->hdr_type](qp, ps, qp->s_wqe);
priv->s_sde = qp_to_sdma_engine(qp, priv->s_sc);
ps->s_txreq->sde = priv->s_sde;
priv->s_sendcontext = qp_to_send_context(qp, priv->s_sc);
ps->s_txreq->psc = priv->s_sendcontext;
ps->s_txreq->phdr.hdr.lrh[0] = cpu_to_be16(lrh0);
ps->s_txreq->phdr.hdr.lrh[1] =
cpu_to_be16(rdma_ah_get_dlid(ah_attr));
ps->s_txreq->phdr.hdr.lrh[2] =
cpu_to_be16(qp->s_hdrwords + nwords + SIZE_OF_CRC);
if (rdma_ah_get_dlid(ah_attr) == be16_to_cpu(IB_LID_PERMISSIVE)) {
ps->s_txreq->phdr.hdr.lrh[3] = IB_LID_PERMISSIVE;
} else {
lid = ppd->lid;
if (lid) {
lid |= rdma_ah_get_path_bits(ah_attr) &
((1 << ppd->lmc) - 1);
ps->s_txreq->phdr.hdr.lrh[3] = cpu_to_be16(lid);
} else {
ps->s_txreq->phdr.hdr.lrh[3] = IB_LID_PERMISSIVE;
}
}
if (wqe->wr.send_flags & IB_SEND_SOLICITED)
bth0 |= IB_BTH_SOLICITED;
bth0 |= extra_bytes << 20;
if (qp->ibqp.qp_type == IB_QPT_GSI || qp->ibqp.qp_type == IB_QPT_SMI)
bth0 |= hfi1_get_pkey(ibp, wqe->ud_wr.pkey_index);
else
bth0 |= hfi1_get_pkey(ibp, qp->s_pkey_index);
ohdr->bth[0] = cpu_to_be32(bth0);
ohdr->bth[1] = cpu_to_be32(wqe->ud_wr.remote_qpn);
ohdr->bth[2] = cpu_to_be32(mask_psn(wqe->psn));
/*
* Qkeys with the high order bit set mean use the
* qkey from the QP context instead of the WR (see 10.2.5).
*/
ohdr->u.ud.deth[0] = cpu_to_be32((int)wqe->ud_wr.remote_qkey < 0 ?
qp->qkey : wqe->ud_wr.remote_qkey);
ohdr->u.ud.deth[1] = cpu_to_be32(qp->ibqp.qp_num);
/* disarm any ahg */
priv->s_ahg->ahgcount = 0;
priv->s_ahg->ahgidx = 0;
@ -505,6 +629,64 @@ int hfi1_lookup_pkey_idx(struct hfi1_ibport *ibp, u16 pkey)
return -1;
}
void return_cnp_16B(struct hfi1_ibport *ibp, struct rvt_qp *qp,
u32 remote_qpn, u32 pkey, u32 slid, u32 dlid,
u8 sc5, const struct ib_grh *old_grh)
{
u64 pbc, pbc_flags = 0;
u32 bth0, plen, vl, hwords = 7;
u16 len;
u8 l4;
struct hfi1_16b_header hdr;
struct ib_other_headers *ohdr;
struct pio_buf *pbuf;
struct send_context *ctxt = qp_to_send_context(qp, sc5);
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
u32 nwords;
/* Populate length */
nwords = ((hfi1_get_16b_padding(hwords << 2, 0) +
SIZE_OF_LT) >> 2) + SIZE_OF_CRC;
if (old_grh) {
struct ib_grh *grh = &hdr.u.l.grh;
grh->version_tclass_flow = old_grh->version_tclass_flow;
grh->paylen = cpu_to_be16((hwords - 4 + nwords) << 2);
grh->hop_limit = 0xff;
grh->sgid = old_grh->dgid;
grh->dgid = old_grh->sgid;
ohdr = &hdr.u.l.oth;
l4 = OPA_16B_L4_IB_GLOBAL;
hwords += sizeof(struct ib_grh) / sizeof(u32);
} else {
ohdr = &hdr.u.oth;
l4 = OPA_16B_L4_IB_LOCAL;
}
/* BIT 16 to 19 is TVER. Bit 20 to 22 is pad cnt */
bth0 = (IB_OPCODE_CNP << 24) | (1 << 16) |
(hfi1_get_16b_padding(hwords << 2, 0) << 20);
ohdr->bth[0] = cpu_to_be32(bth0);
ohdr->bth[1] = cpu_to_be32(remote_qpn);
ohdr->bth[2] = 0; /* PSN 0 */
/* Convert dwords to flits */
len = (hwords + nwords) >> 1;
hfi1_make_16b_hdr(&hdr, slid, dlid, len, pkey, 1, 0, l4, sc5);
plen = 2 /* PBC */ + hwords + nwords;
pbc_flags |= PBC_PACKET_BYPASS | PBC_INSERT_BYPASS_ICRC;
vl = sc_to_vlt(ppd->dd, sc5);
pbc = create_pbc(ppd, pbc_flags, qp->srate_mbps, vl, plen);
if (ctxt) {
pbuf = sc_buffer_alloc(ctxt, plen, NULL, NULL);
if (pbuf)
ppd->dd->pio_inline_send(ppd->dd, pbuf, pbc,
&hdr, hwords);
}
}
void return_cnp(struct hfi1_ibport *ibp, struct rvt_qp *qp, u32 remote_qpn,
u32 pkey, u32 slid, u32 dlid, u8 sc5,
const struct ib_grh *old_grh)
@ -543,13 +725,9 @@ void return_cnp(struct hfi1_ibport *ibp, struct rvt_qp *qp, u32 remote_qpn,
ohdr->bth[1] = cpu_to_be32(remote_qpn | (1 << IB_BECN_SHIFT));
ohdr->bth[2] = 0; /* PSN 0 */
hdr.lrh[0] = cpu_to_be16(lrh0);
hdr.lrh[1] = cpu_to_be16(dlid);
hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC);
hdr.lrh[3] = cpu_to_be16(slid);
hfi1_make_ib_hdr(&hdr, lrh0, hwords + SIZE_OF_CRC, dlid, slid);
plen = 2 /* PBC */ + hwords;
pbc_flags |= (!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT;
pbc_flags |= (ib_is_sc5(sc5) << PBC_DC_INFO_SHIFT);
vl = sc_to_vlt(ppd->dd, sc5);
pbc = create_pbc(ppd, pbc_flags, qp->srate_mbps, vl, plen);
if (ctxt) {
@ -668,37 +846,45 @@ static int opa_smp_check(struct hfi1_ibport *ibp, u16 pkey, u8 sc5,
void hfi1_ud_rcv(struct hfi1_packet *packet)
{
struct ib_other_headers *ohdr = packet->ohdr;
int opcode;
u32 hdrsize = packet->hlen;
struct ib_wc wc;
u32 qkey;
u32 src_qp;
u16 dlid, pkey;
u16 pkey;
int mgmt_pkey_idx = -1;
struct hfi1_ibport *ibp = rcd_to_iport(packet->rcd);
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
struct ib_header *hdr = packet->hdr;
u32 rcv_flags = packet->rcv_flags;
void *data = packet->ebuf;
void *data = packet->payload;
u32 tlen = packet->tlen;
struct rvt_qp *qp = packet->qp;
bool has_grh = rcv_flags & HFI1_HAS_GRH;
u8 sc5 = hfi1_9B_get_sc5(hdr, packet->rhf);
u32 bth1;
u8 sl_from_sc, sl;
u16 slid;
u8 sc5 = packet->sc;
u8 sl_from_sc;
u8 opcode = packet->opcode;
u8 sl = packet->sl;
u32 dlid = packet->dlid;
u32 slid = packet->slid;
u8 extra_bytes;
bool dlid_is_permissive;
bool slid_is_permissive;
qkey = be32_to_cpu(ohdr->u.ud.deth[0]);
src_qp = be32_to_cpu(ohdr->u.ud.deth[1]) & RVT_QPN_MASK;
dlid = ib_get_dlid(hdr);
bth1 = be32_to_cpu(ohdr->bth[1]);
slid = ib_get_slid(hdr);
pkey = ib_bth_get_pkey(ohdr);
opcode = ib_bth_get_opcode(ohdr);
sl = ib_get_sl(hdr);
extra_bytes = ib_bth_get_pad(ohdr);
extra_bytes += (SIZE_OF_CRC << 2);
extra_bytes = packet->pad + packet->extra_byte + (SIZE_OF_CRC << 2);
qkey = ib_get_qkey(ohdr);
src_qp = ib_get_sqpn(ohdr);
if (packet->etype == RHF_RCV_TYPE_BYPASS) {
u32 permissive_lid =
opa_get_lid(be32_to_cpu(OPA_LID_PERMISSIVE), 16B);
pkey = hfi1_16B_get_pkey(packet->hdr);
dlid_is_permissive = (dlid == permissive_lid);
slid_is_permissive = (slid == permissive_lid);
} else {
hdr = packet->hdr;
pkey = ib_bth_get_pkey(ohdr);
dlid_is_permissive = (dlid == be16_to_cpu(IB_LID_PERMISSIVE));
slid_is_permissive = (slid == be16_to_cpu(IB_LID_PERMISSIVE));
}
sl_from_sc = ibp->sc_to_sl[sc5];
process_ecn(qp, packet, (opcode != IB_OPCODE_CNP));
@ -716,8 +902,7 @@ void hfi1_ud_rcv(struct hfi1_packet *packet)
* and the QKEY matches (see 9.6.1.4.1 and 9.6.1.5.1).
*/
if (qp->ibqp.qp_num) {
if (unlikely(hdr->lrh[1] == IB_LID_PERMISSIVE ||
hdr->lrh[3] == IB_LID_PERMISSIVE))
if (unlikely(dlid_is_permissive || slid_is_permissive))
goto drop;
if (qp->ibqp.qp_num > 1) {
if (unlikely(rcv_pkey_check(ppd, pkey, sc5, slid))) {
@ -727,10 +912,10 @@ void hfi1_ud_rcv(struct hfi1_packet *packet)
* for invalid pkeys is optional according to
* IB spec (release 1.3, section 10.9.4)
*/
hfi1_bad_pqkey(ibp, OPA_TRAP_BAD_P_KEY,
pkey, sl,
src_qp, qp->ibqp.qp_num,
slid, dlid);
hfi1_bad_pkey(ibp,
pkey, sl,
src_qp, qp->ibqp.qp_num,
slid, dlid);
return;
}
} else {
@ -739,12 +924,9 @@ void hfi1_ud_rcv(struct hfi1_packet *packet)
if (mgmt_pkey_idx < 0)
goto drop;
}
if (unlikely(qkey != qp->qkey)) {
hfi1_bad_pqkey(ibp, OPA_TRAP_BAD_Q_KEY, qkey, sl,
src_qp, qp->ibqp.qp_num,
slid, dlid);
if (unlikely(qkey != qp->qkey)) /* Silent drop */
return;
}
/* Drop invalid MAD packets (see 13.5.3.1). */
if (unlikely(qp->ibqp.qp_num == 1 &&
(tlen > 2048 || (sc5 == 0xF))))
@ -758,8 +940,7 @@ void hfi1_ud_rcv(struct hfi1_packet *packet)
if (tlen > 2048)
goto drop;
if ((hdr->lrh[1] == IB_LID_PERMISSIVE ||
hdr->lrh[3] == IB_LID_PERMISSIVE) &&
if ((dlid_is_permissive || slid_is_permissive) &&
smp->mgmt_class != IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
goto drop;
@ -811,8 +992,19 @@ void hfi1_ud_rcv(struct hfi1_packet *packet)
qp->r_flags |= RVT_R_REUSE_SGE;
goto drop;
}
if (has_grh) {
hfi1_copy_sge(&qp->r_sge, &hdr->u.l.grh,
if (packet->grh) {
hfi1_copy_sge(&qp->r_sge, packet->grh,
sizeof(struct ib_grh), true, false);
wc.wc_flags |= IB_WC_GRH;
} else if (packet->etype == RHF_RCV_TYPE_BYPASS) {
struct ib_grh grh;
/*
* Assuming we only created 16B on the send side
* if we want to use large LIDs, since GRH was stripped
* out when creating 16B, add back the GRH here.
*/
hfi1_make_ext_grh(packet, &grh, slid, dlid);
hfi1_copy_sge(&qp->r_sge, &grh,
sizeof(struct ib_grh), true, false);
wc.wc_flags |= IB_WC_GRH;
} else {
@ -845,14 +1037,15 @@ void hfi1_ud_rcv(struct hfi1_packet *packet)
} else {
wc.pkey_index = 0;
}
if (slid_is_permissive)
slid = be32_to_cpu(OPA_LID_PERMISSIVE);
wc.slid = slid;
wc.sl = sl_from_sc;
/*
* Save the LMC lower bits if the destination LID is a unicast LID.
*/
wc.dlid_path_bits = dlid >= be16_to_cpu(IB_MULTICAST_LID_BASE) ? 0 :
wc.dlid_path_bits = hfi1_check_mcast(dlid) ? 0 :
dlid & ((1 << ppd_from_ibp(ibp)->lmc) - 1);
wc.port_num = qp->port_num;
/* Signal completion event if the solicited bit is set. */

View File

@ -47,58 +47,28 @@
#include <asm/page.h>
#include <linux/string.h>
#include "mmu_rb.h"
#include "user_exp_rcv.h"
#include "trace.h"
#include "mmu_rb.h"
struct tid_group {
struct list_head list;
u32 base;
u8 size;
u8 used;
u8 map;
};
struct tid_rb_node {
struct mmu_rb_node mmu;
unsigned long phys;
struct tid_group *grp;
u32 rcventry;
dma_addr_t dma_addr;
bool freed;
unsigned npages;
struct page *pages[0];
};
struct tid_pageset {
u16 idx;
u16 count;
};
#define EXP_TID_SET_EMPTY(set) (set.count == 0 && list_empty(&set.list))
#define num_user_pages(vaddr, len) \
(1 + (((((unsigned long)(vaddr) + \
(unsigned long)(len) - 1) & PAGE_MASK) - \
((unsigned long)vaddr & PAGE_MASK)) >> PAGE_SHIFT))
static void unlock_exp_tids(struct hfi1_ctxtdata *uctxt,
struct exp_tid_set *set,
struct hfi1_filedata *fd);
static u32 find_phys_blocks(struct page **pages, unsigned npages,
struct tid_pageset *list);
static int set_rcvarray_entry(struct hfi1_filedata *fd, unsigned long vaddr,
static u32 find_phys_blocks(struct tid_user_buf *tidbuf, unsigned int npages);
static int set_rcvarray_entry(struct hfi1_filedata *fd,
struct tid_user_buf *tbuf,
u32 rcventry, struct tid_group *grp,
struct page **pages, unsigned npages);
u16 pageidx, unsigned int npages);
static int tid_rb_insert(void *arg, struct mmu_rb_node *node);
static void cacheless_tid_rb_remove(struct hfi1_filedata *fdata,
struct tid_rb_node *tnode);
static void tid_rb_remove(void *arg, struct mmu_rb_node *node);
static int tid_rb_invalidate(void *arg, struct mmu_rb_node *mnode);
static int program_rcvarray(struct hfi1_filedata *fd, unsigned long vaddr,
struct tid_group *grp, struct tid_pageset *sets,
unsigned start, u16 count, struct page **pages,
u32 *tidlist, unsigned *tididx, unsigned *pmapped);
static int program_rcvarray(struct hfi1_filedata *fd, struct tid_user_buf *,
struct tid_group *grp,
unsigned int start, u16 count,
u32 *tidlist, unsigned int *tididx,
unsigned int *pmapped);
static int unprogram_rcvarray(struct hfi1_filedata *fd, u32 tidinfo,
struct tid_group **grp);
static void clear_tid_node(struct hfi1_filedata *fd, struct tid_rb_node *node);
@ -109,96 +79,14 @@ static struct mmu_rb_ops tid_rb_ops = {
.invalidate = tid_rb_invalidate
};
static inline u32 rcventry2tidinfo(u32 rcventry)
{
u32 pair = rcventry & ~0x1;
return EXP_TID_SET(IDX, pair >> 1) |
EXP_TID_SET(CTRL, 1 << (rcventry - pair));
}
static inline void exp_tid_group_init(struct exp_tid_set *set)
{
INIT_LIST_HEAD(&set->list);
set->count = 0;
}
static inline void tid_group_remove(struct tid_group *grp,
struct exp_tid_set *set)
{
list_del_init(&grp->list);
set->count--;
}
static inline void tid_group_add_tail(struct tid_group *grp,
struct exp_tid_set *set)
{
list_add_tail(&grp->list, &set->list);
set->count++;
}
static inline struct tid_group *tid_group_pop(struct exp_tid_set *set)
{
struct tid_group *grp =
list_first_entry(&set->list, struct tid_group, list);
list_del_init(&grp->list);
set->count--;
return grp;
}
static inline void tid_group_move(struct tid_group *group,
struct exp_tid_set *s1,
struct exp_tid_set *s2)
{
tid_group_remove(group, s1);
tid_group_add_tail(group, s2);
}
int hfi1_user_exp_rcv_grp_init(struct hfi1_filedata *fd)
{
struct hfi1_ctxtdata *uctxt = fd->uctxt;
struct hfi1_devdata *dd = fd->dd;
u32 tidbase;
u32 i;
struct tid_group *grp, *gptr;
exp_tid_group_init(&uctxt->tid_group_list);
exp_tid_group_init(&uctxt->tid_used_list);
exp_tid_group_init(&uctxt->tid_full_list);
tidbase = uctxt->expected_base;
for (i = 0; i < uctxt->expected_count /
dd->rcv_entries.group_size; i++) {
grp = kzalloc(sizeof(*grp), GFP_KERNEL);
if (!grp)
goto grp_failed;
grp->size = dd->rcv_entries.group_size;
grp->base = tidbase;
tid_group_add_tail(grp, &uctxt->tid_group_list);
tidbase += dd->rcv_entries.group_size;
}
return 0;
grp_failed:
list_for_each_entry_safe(grp, gptr, &uctxt->tid_group_list.list,
list) {
list_del_init(&grp->list);
kfree(grp);
}
return -ENOMEM;
}
/*
* Initialize context and file private data needed for Expected
* receive caching. This needs to be done after the context has
* been configured with the eager/expected RcvEntry counts.
*/
int hfi1_user_exp_rcv_init(struct hfi1_filedata *fd)
int hfi1_user_exp_rcv_init(struct hfi1_filedata *fd,
struct hfi1_ctxtdata *uctxt)
{
struct hfi1_ctxtdata *uctxt = fd->uctxt;
struct hfi1_devdata *dd = uctxt->dd;
int ret = 0;
@ -266,18 +154,6 @@ int hfi1_user_exp_rcv_init(struct hfi1_filedata *fd)
return ret;
}
void hfi1_user_exp_rcv_grp_free(struct hfi1_ctxtdata *uctxt)
{
struct tid_group *grp, *gptr;
list_for_each_entry_safe(grp, gptr, &uctxt->tid_group_list.list,
list) {
list_del_init(&grp->list);
kfree(grp);
}
hfi1_clear_tids(uctxt);
}
void hfi1_user_exp_rcv_free(struct hfi1_filedata *fd)
{
struct hfi1_ctxtdata *uctxt = fd->uctxt;
@ -302,21 +178,90 @@ void hfi1_user_exp_rcv_free(struct hfi1_filedata *fd)
fd->entry_to_rb = NULL;
}
/*
* Write an "empty" RcvArray entry.
* This function exists so the TID registaration code can use it
* to write to unused/unneeded entries and still take advantage
* of the WC performance improvements. The HFI will ignore this
* write to the RcvArray entry.
/**
* Release pinned receive buffer pages.
*
* @mapped - true if the pages have been DMA mapped. false otherwise.
* @idx - Index of the first page to unpin.
* @npages - No of pages to unpin.
*
* If the pages have been DMA mapped (indicated by mapped parameter), their
* info will be passed via a struct tid_rb_node. If they haven't been mapped,
* their info will be passed via a struct tid_user_buf.
*/
static inline void rcv_array_wc_fill(struct hfi1_devdata *dd, u32 index)
static void unpin_rcv_pages(struct hfi1_filedata *fd,
struct tid_user_buf *tidbuf,
struct tid_rb_node *node,
unsigned int idx,
unsigned int npages,
bool mapped)
{
struct page **pages;
struct hfi1_devdata *dd = fd->uctxt->dd;
if (mapped) {
pci_unmap_single(dd->pcidev, node->dma_addr,
node->mmu.len, PCI_DMA_FROMDEVICE);
pages = &node->pages[idx];
} else {
pages = &tidbuf->pages[idx];
}
hfi1_release_user_pages(fd->mm, pages, npages, mapped);
fd->tid_n_pinned -= npages;
}
/**
* Pin receive buffer pages.
*/
static int pin_rcv_pages(struct hfi1_filedata *fd, struct tid_user_buf *tidbuf)
{
int pinned;
unsigned int npages;
unsigned long vaddr = tidbuf->vaddr;
struct page **pages = NULL;
struct hfi1_devdata *dd = fd->uctxt->dd;
/* Get the number of pages the user buffer spans */
npages = num_user_pages(vaddr, tidbuf->length);
if (!npages)
return -EINVAL;
if (npages > fd->uctxt->expected_count) {
dd_dev_err(dd, "Expected buffer too big\n");
return -EINVAL;
}
/* Verify that access is OK for the user buffer */
if (!access_ok(VERIFY_WRITE, (void __user *)vaddr,
npages * PAGE_SIZE)) {
dd_dev_err(dd, "Fail vaddr %p, %u pages, !access_ok\n",
(void *)vaddr, npages);
return -EFAULT;
}
/* Allocate the array of struct page pointers needed for pinning */
pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
if (!pages)
return -ENOMEM;
/*
* Doing the WC fill writes only makes sense if the device is
* present and the RcvArray has been mapped as WC memory.
* Pin all the pages of the user buffer. If we can't pin all the
* pages, accept the amount pinned so far and program only that.
* User space knows how to deal with partially programmed buffers.
*/
if ((dd->flags & HFI1_PRESENT) && dd->rcvarray_wc)
writeq(0, dd->rcvarray_wc + (index * 8));
if (!hfi1_can_pin_pages(dd, fd->mm, fd->tid_n_pinned, npages)) {
kfree(pages);
return -ENOMEM;
}
pinned = hfi1_acquire_user_pages(fd->mm, vaddr, npages, true, pages);
if (pinned <= 0) {
kfree(pages);
return pinned;
}
tidbuf->pages = pages;
tidbuf->npages = npages;
fd->tid_n_pinned += pinned;
return pinned;
}
/*
@ -374,62 +319,33 @@ int hfi1_user_exp_rcv_setup(struct hfi1_filedata *fd,
int ret = 0, need_group = 0, pinned;
struct hfi1_ctxtdata *uctxt = fd->uctxt;
struct hfi1_devdata *dd = uctxt->dd;
unsigned npages, ngroups, pageidx = 0, pageset_count, npagesets,
unsigned int ngroups, pageidx = 0, pageset_count,
tididx = 0, mapped, mapped_pages = 0;
unsigned long vaddr = tinfo->vaddr;
struct page **pages = NULL;
u32 *tidlist = NULL;
struct tid_pageset *pagesets = NULL;
struct tid_user_buf *tidbuf;
/* Get the number of pages the user buffer spans */
npages = num_user_pages(vaddr, tinfo->length);
if (!npages)
return -EINVAL;
if (npages > uctxt->expected_count) {
dd_dev_err(dd, "Expected buffer too big\n");
return -EINVAL;
}
/* Verify that access is OK for the user buffer */
if (!access_ok(VERIFY_WRITE, (void __user *)vaddr,
npages * PAGE_SIZE)) {
dd_dev_err(dd, "Fail vaddr %p, %u pages, !access_ok\n",
(void *)vaddr, npages);
return -EFAULT;
}
pagesets = kcalloc(uctxt->expected_count, sizeof(*pagesets),
GFP_KERNEL);
if (!pagesets)
tidbuf = kzalloc(sizeof(*tidbuf), GFP_KERNEL);
if (!tidbuf)
return -ENOMEM;
/* Allocate the array of struct page pointers needed for pinning */
pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
if (!pages) {
ret = -ENOMEM;
goto bail;
tidbuf->vaddr = tinfo->vaddr;
tidbuf->length = tinfo->length;
tidbuf->psets = kcalloc(uctxt->expected_count, sizeof(*tidbuf->psets),
GFP_KERNEL);
if (!tidbuf->psets) {
kfree(tidbuf);
return -ENOMEM;
}
/*
* Pin all the pages of the user buffer. If we can't pin all the
* pages, accept the amount pinned so far and program only that.
* User space knows how to deal with partially programmed buffers.
*/
if (!hfi1_can_pin_pages(dd, fd->mm, fd->tid_n_pinned, npages)) {
ret = -ENOMEM;
goto bail;
}
pinned = hfi1_acquire_user_pages(fd->mm, vaddr, npages, true, pages);
pinned = pin_rcv_pages(fd, tidbuf);
if (pinned <= 0) {
ret = pinned;
goto bail;
kfree(tidbuf->psets);
kfree(tidbuf);
return pinned;
}
fd->tid_n_pinned += npages;
/* Find sets of physically contiguous pages */
npagesets = find_phys_blocks(pages, pinned, pagesets);
tidbuf->n_psets = find_phys_blocks(tidbuf, pinned);
/*
* We don't need to access this under a lock since tid_used is per
@ -437,10 +353,10 @@ int hfi1_user_exp_rcv_setup(struct hfi1_filedata *fd,
* and hfi1_user_exp_rcv_setup() at the same time.
*/
spin_lock(&fd->tid_lock);
if (fd->tid_used + npagesets > fd->tid_limit)
if (fd->tid_used + tidbuf->n_psets > fd->tid_limit)
pageset_count = fd->tid_limit - fd->tid_used;
else
pageset_count = npagesets;
pageset_count = tidbuf->n_psets;
spin_unlock(&fd->tid_lock);
if (!pageset_count)
@ -468,9 +384,9 @@ int hfi1_user_exp_rcv_setup(struct hfi1_filedata *fd,
struct tid_group *grp =
tid_group_pop(&uctxt->tid_group_list);
ret = program_rcvarray(fd, vaddr, grp, pagesets,
ret = program_rcvarray(fd, tidbuf, grp,
pageidx, dd->rcv_entries.group_size,
pages, tidlist, &tididx, &mapped);
tidlist, &tididx, &mapped);
/*
* If there was a failure to program the RcvArray
* entries for the entire group, reset the grp fields
@ -514,8 +430,8 @@ int hfi1_user_exp_rcv_setup(struct hfi1_filedata *fd,
unsigned use = min_t(unsigned, pageset_count - pageidx,
grp->size - grp->used);
ret = program_rcvarray(fd, vaddr, grp, pagesets,
pageidx, use, pages, tidlist,
ret = program_rcvarray(fd, tidbuf, grp,
pageidx, use, tidlist,
&tididx, &mapped);
if (ret < 0) {
hfi1_cdbg(TID,
@ -575,16 +491,14 @@ nomem:
* If not everything was mapped (due to insufficient RcvArray entries,
* for example), unpin all unmapped pages so we can pin them nex time.
*/
if (mapped_pages != pinned) {
hfi1_release_user_pages(fd->mm, &pages[mapped_pages],
pinned - mapped_pages,
false);
fd->tid_n_pinned -= pinned - mapped_pages;
}
if (mapped_pages != pinned)
unpin_rcv_pages(fd, tidbuf, NULL, mapped_pages,
(pinned - mapped_pages), false);
bail:
kfree(pagesets);
kfree(pages);
kfree(tidbuf->psets);
kfree(tidlist);
kfree(tidbuf->pages);
kfree(tidbuf);
return ret > 0 ? 0 : ret;
}
@ -674,11 +588,12 @@ int hfi1_user_exp_rcv_invalid(struct hfi1_filedata *fd,
return ret;
}
static u32 find_phys_blocks(struct page **pages, unsigned npages,
struct tid_pageset *list)
static u32 find_phys_blocks(struct tid_user_buf *tidbuf, unsigned int npages)
{
unsigned pagecount, pageidx, setcount = 0, i;
unsigned long pfn, this_pfn;
struct page **pages = tidbuf->pages;
struct tid_pageset *list = tidbuf->psets;
if (!npages)
return 0;
@ -741,13 +656,13 @@ static u32 find_phys_blocks(struct page **pages, unsigned npages,
/**
* program_rcvarray() - program an RcvArray group with receive buffers
* @fd: filedata pointer
* @vaddr: starting user virtual address
* @tbuf: pointer to struct tid_user_buf that has the user buffer starting
* virtual address, buffer length, page pointers, pagesets (array of
* struct tid_pageset holding information on physically contiguous
* chunks from the user buffer), and other fields.
* @grp: RcvArray group
* @sets: array of struct tid_pageset holding information on physically
* contiguous chunks from the user buffer
* @start: starting index into sets array
* @count: number of struct tid_pageset's to program
* @pages: an array of struct page * for the user buffer
* @tidlist: the array of u32 elements when the information about the
* programmed RcvArray entries is to be encoded.
* @tididx: starting offset into tidlist
@ -765,11 +680,11 @@ static u32 find_phys_blocks(struct page **pages, unsigned npages,
* -ENOMEM or -EFAULT on error from set_rcvarray_entry(), or
* number of RcvArray entries programmed.
*/
static int program_rcvarray(struct hfi1_filedata *fd, unsigned long vaddr,
static int program_rcvarray(struct hfi1_filedata *fd, struct tid_user_buf *tbuf,
struct tid_group *grp,
struct tid_pageset *sets,
unsigned start, u16 count, struct page **pages,
u32 *tidlist, unsigned *tididx, unsigned *pmapped)
unsigned int start, u16 count,
u32 *tidlist, unsigned int *tididx,
unsigned int *pmapped)
{
struct hfi1_ctxtdata *uctxt = fd->uctxt;
struct hfi1_devdata *dd = uctxt->dd;
@ -808,11 +723,11 @@ static int program_rcvarray(struct hfi1_filedata *fd, unsigned long vaddr,
}
rcventry = grp->base + useidx;
npages = sets[setidx].count;
pageidx = sets[setidx].idx;
npages = tbuf->psets[setidx].count;
pageidx = tbuf->psets[setidx].idx;
ret = set_rcvarray_entry(fd, vaddr + (pageidx * PAGE_SIZE),
rcventry, grp, pages + pageidx,
ret = set_rcvarray_entry(fd, tbuf,
rcventry, grp, pageidx,
npages);
if (ret)
return ret;
@ -833,15 +748,17 @@ static int program_rcvarray(struct hfi1_filedata *fd, unsigned long vaddr,
return idx;
}
static int set_rcvarray_entry(struct hfi1_filedata *fd, unsigned long vaddr,
static int set_rcvarray_entry(struct hfi1_filedata *fd,
struct tid_user_buf *tbuf,
u32 rcventry, struct tid_group *grp,
struct page **pages, unsigned npages)
u16 pageidx, unsigned int npages)
{
int ret;
struct hfi1_ctxtdata *uctxt = fd->uctxt;
struct tid_rb_node *node;
struct hfi1_devdata *dd = uctxt->dd;
dma_addr_t phys;
struct page **pages = tbuf->pages + pageidx;
/*
* Allocate the node first so we can handle a potential
@ -862,7 +779,7 @@ static int set_rcvarray_entry(struct hfi1_filedata *fd, unsigned long vaddr,
return -EFAULT;
}
node->mmu.addr = vaddr;
node->mmu.addr = tbuf->vaddr + (pageidx * PAGE_SIZE);
node->mmu.len = npages * PAGE_SIZE;
node->phys = page_to_phys(pages[0]);
node->npages = npages;
@ -935,17 +852,13 @@ static void clear_tid_node(struct hfi1_filedata *fd, struct tid_rb_node *node)
node->npages, node->mmu.addr, node->phys,
node->dma_addr);
hfi1_put_tid(dd, node->rcventry, PT_INVALID, 0, 0);
/*
* Make sure device has seen the write before we unpin the
* pages.
*/
flush_wc();
hfi1_put_tid(dd, node->rcventry, PT_INVALID_FLUSH, 0, 0);
pci_unmap_single(dd->pcidev, node->dma_addr, node->mmu.len,
PCI_DMA_FROMDEVICE);
hfi1_release_user_pages(fd->mm, node->pages, node->npages, true);
fd->tid_n_pinned -= node->npages;
unpin_rcv_pages(fd, NULL, node, 0, node->npages, true);
node->grp->used--;
node->grp->map &= ~(1 << (node->rcventry - node->grp->base));

View File

@ -49,30 +49,44 @@
#include "hfi.h"
#define EXP_TID_TIDLEN_MASK 0x7FFULL
#define EXP_TID_TIDLEN_SHIFT 0
#define EXP_TID_TIDCTRL_MASK 0x3ULL
#define EXP_TID_TIDCTRL_SHIFT 20
#define EXP_TID_TIDIDX_MASK 0x3FFULL
#define EXP_TID_TIDIDX_SHIFT 22
#define EXP_TID_GET(tid, field) \
(((tid) >> EXP_TID_TID##field##_SHIFT) & EXP_TID_TID##field##_MASK)
#include "exp_rcv.h"
#define EXP_TID_SET(field, value) \
(((value) & EXP_TID_TID##field##_MASK) << \
EXP_TID_TID##field##_SHIFT)
#define EXP_TID_CLEAR(tid, field) ({ \
(tid) &= ~(EXP_TID_TID##field##_MASK << \
EXP_TID_TID##field##_SHIFT); \
})
#define EXP_TID_RESET(tid, field, value) do { \
EXP_TID_CLEAR(tid, field); \
(tid) |= EXP_TID_SET(field, (value)); \
} while (0)
struct tid_pageset {
u16 idx;
u16 count;
};
void hfi1_user_exp_rcv_grp_free(struct hfi1_ctxtdata *uctxt);
int hfi1_user_exp_rcv_grp_init(struct hfi1_filedata *fd);
int hfi1_user_exp_rcv_init(struct hfi1_filedata *fd);
struct tid_user_buf {
unsigned long vaddr;
unsigned long length;
unsigned int npages;
struct page **pages;
struct tid_pageset *psets;
unsigned int n_psets;
};
struct tid_rb_node {
struct mmu_rb_node mmu;
unsigned long phys;
struct tid_group *grp;
u32 rcventry;
dma_addr_t dma_addr;
bool freed;
unsigned int npages;
struct page *pages[0];
};
static inline int num_user_pages(unsigned long addr,
unsigned long len)
{
const unsigned long spage = addr & PAGE_MASK;
const unsigned long epage = (addr + len - 1) & PAGE_MASK;
return 1 + ((epage - spage) >> PAGE_SHIFT);
}
int hfi1_user_exp_rcv_init(struct hfi1_filedata *fd,
struct hfi1_ctxtdata *uctxt);
void hfi1_user_exp_rcv_free(struct hfi1_filedata *fd);
int hfi1_user_exp_rcv_setup(struct hfi1_filedata *fd,
struct hfi1_tid_info *tinfo);

Some files were not shown because too many files have changed in this diff Show More