3965 Commits

Author SHA1 Message Date
Haggai Eran
8605933a22 IB/mlx5: Add MR to radix tree in reg_mr_callback
For memory regions that are allocated using reg_umr, the suffix of
mlx5_core_create_mkey isn't being called.  Instead the creation is
completed in a callback function (reg_mr_callback).  This means that
these MRs aren't being added to the MR radix tree.  Add them in the
callback.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:05 -07:00
Haggai Eran
096f7e72c6 IB/mlx5: Fix error handling in reg_umr
If ib_post_send fails when posting the UMR work request in reg_umr,
the code doesn't release the temporary pas buffer allocated, and
doesn't dma_unmap it.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:05 -07:00
Sagi Grimberg
c7f44fbda6 mlx5_core: Copy DIF fields only when input and output space values match
Some DIF implementations (SCSI initiator/target) may want to use different
input/output values for application tag and/or reference tag. So in
case memory/wire domain values don't match HW must not copy them.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:02 -07:00
Sagi Grimberg
5c273b1677 mlx5_core: Simplify signature handover wqe for interleaved buffers
No need for repetition format pattern in case the data and protection
are already interleaved in the memory domain since the pattern
already exists. A single key entry is sufficient and may save some
extra fetch ops.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:52:58 -07:00
Sagi Grimberg
8524867b9c mlx5_core: Fix signature handover operation for interleaved buffers
When the data and protection are interleaved in the memory domain, no
need to expand the mkey total length.

At the moment no Linux user works (iSER initiator & target) in
interleaved mode. This may change in the future as for SCSI
pass-through devices there is no real point in target performing
de-interleaving and re-interleaving of the protection data in the PT
stage. Regardless, signature verbs support this mode.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:52:54 -07:00
Or Gerlitz
c7ca4b69d9 IB/iser: Bump version to 1.4
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Roi Dayan
e7eeffa4a0 IB/iser: Add missing newlines to logging messages
Logging messages need terminating newlines to avoid possible message
interleaving.  Add them.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Ariel Nahum
66d4e62d27 IB/iser: Fix a possible race in iser connection states transition
In some circumstances (multiple targets), RDMA_CM ESTABLISHED event
and ep_disconnect may race. In this case, the iser connection state
may transition to UP (after ep_disconnect transitioned it to
TERMINATING), while the connection is being torn down.

Upon RDMA_CM event ESTABLISHED we allow iser connection state to
transition to UP only from PENDING. We also make sure to protect this
state change (done under the connection lock).

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Ariel Nahum
b73c3adabd IB/iser: Simplify connection management
iSER relies on refcounting to manage iser connections establishment
and teardown.

Following commit 39ff05dbbbdb ("IB/iser: Enhance disconnection logic
for multi-pathing"), iser connection maintain 3 references:

 - iscsi_endpoint (at creation stage)
 - cma_id (at connection request stage)
 - iscsi_conn (at bind stage)

We can avoid taking explicit refcounts by correctly serializing iser
teardown flows (graceful and non-graceful).

Our approach is to trigger a scheduled work to handle ordered teardown
by gracefully waiting for 2 cleanup stages to complete:

 1. Cleanup of live pending tasks indicated by iscsi_conn_stop completion
 2. Flush errors processing

Each completed stage will notify a waiting worker thread when it is
done to allow teardwon continuation.

Since iSCSI connection establishment may trigger endpoint disconnect
without a successful endpoint connect, we rely on the iscsi <-> iser
binding (.conn_bind) to learn about the teardown policy we should take
wrt cleanup stages.

Since all cleanup worker threads are scheduled (release_wq) in
.ep_disconnect it is safe to assume that when module_exit is called,
all cleanup workers are already scheduled. Thus proper module unload
shall flush all scheduled works before allowing safe exit, to
guarantee no resources got left behind.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
David S. Miller
54e5c4def0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_alb.c
	drivers/net/ethernet/altera/altera_msgdma.c
	drivers/net/ethernet/altera/altera_sgdma.c
	net/ipv6/xfrm6_output.c

Several cases of overlapping changes.

The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.

In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.

Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 00:32:30 -04:00
Linus Torvalds
5fa6a683c0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "It looks like a sizeble collection but this is nearly 3 weeks of bug
  fixing while you were away.

   1) Fix crashes over IPSEC tunnels with NAT, the latter can reroute
      the packet through a non-IPSEC protected path and the code has to
      be able to handle SKBs attached to routes lacking an attached xfrm
      state.  From Steffen Klassert.

   2) Fix OOPSs in ipv4 and ipv6 ipsec layers for unsupported
      sub-protocols, also from Steffen Klassert.

   3) Set local_df on fragmented netfilter skbs otherwise we won't be
      able to forward successfully, from Florian Westphal.

   4) cdc_mbim ipv6 neighbour code does __vlan_find_dev_deep without
      holding RCU lock, from Bjorn Mork.

   5) local_df test in ip_may_fragment is inverted, from Florian
      Westphal.

   6) jme driver doesn't check for DMA mapping failures, from Neil
      Horman.

   7) qlogic driver doesn't calculate number of TX queues properly, from
      Shahed Shaikh.

   8) fib_info_cnt can drift irreversibly positive if we fail to
      allocate the fi->fib_metrics array, from Sergey Popovich.

   9) Fix use after free in ip6_route_me_harder(), also from Sergey
      Popovich.

  10) When SYSCTL is disabled, we don't handle local_port_range and
      ping_group_range defaults properly at all, from Cong Wang.

  11) Unaccelerated VLAN tagged frames improperly handled by cdc_mbim
      driver, fix from Bjorn Mork.

  12) cassini driver needs nested lock annotations for TX locking, from
      Emil Goode.

  13) On init error ipv6 VTI driver can unregister pernet ops twice,
      oops.  Fix from Mahtias Krause.

  14) If macvlan device is down, don't propagate IFF_ALLMULTI changes,
      from Peter Christensen.

  15) Missing NULL pointer check while parsing netlink config options in
      ip6_tnl_validate().  From Susant Sahani.

  16) Fix handling of neighbour entries during ipv6 router reachability
      probing, from Duan Jiong.

  17) x86 and s390 JIT address randomization has some address
      calculation bugs leading to crashes, from Alexei Starovoitov and
      Heiko Carstens.

  18) Clear up those uglies with nop patching and net_get_random_once(),
      from Hannes Frederic Sowa.

  19) Option length miscalculated in ip6_append_data(), fix also from
      Hannes Frederic Sowa.

  20) A while ago we fixed a race during device unregistry when a
      namespace went down, turns out there is a second place that needs
      similar protection.  From Cong Wang.

  21) In the new Altera TSE driver multicast filtering isn't working,
      disable it and just use promisc mode until the cause is found.
      From Vince Bridgers.

  22) When we disable router enabling in ipv6 we have to flush the
      cached routes explicitly, from Duan Jiong.

  23) NBMA tunnels should not cache routes on the tunnel object because
      the key is variable, from Timo Teräs.

  24) With stacked devices GRO information in skb->cb[] can be not setup
      properly, make sure it is in all code paths.  From Eric Dumazet.

  25) Really fix stacked vlan locking, multiple levels of nesting with
      intervening non-vlan devices are possible.  From Vlad Yasevich.

  26) Fallback ipip tunnel device's mtu is not setup properly, from
      Steffen Klassert.

  27) The packet scheduler's tcindex filter can crash because we
      structure copy objects with list_head's inside, oops.  From Cong
      Wang.

  28) Fix CHECKSUM_COMPLETE handling for ipv6 GRE tunnels, from Eric
      Dumazet.

  29) In some configurations 'itag' in __mkroute_input() can end up
      being used uninitialized because of how fib_validate_source()
      works.  Fix it by explitly initializing itag to zero like all the
      other fib_validate_source() callers do, from Li RongQing"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  batman: fix a bogus warning from batadv_is_on_batman_iface()
  ipv4: initialise the itag variable in __mkroute_input
  bonding: Send ALB learning packets using the right source
  bonding: Don't assume 802.1Q when sending alb learning packets.
  net: doc: Update references to skb->rxhash
  stmmac: Remove unbalanced clk_disable call
  ipv6: gro: fix CHECKSUM_COMPLETE support
  net_sched: fix an oops in tcindex filter
  can: peak_pci: prevent use after free at netdev removal
  ip_tunnel: Initialize the fallback device properly
  vlan: Fix build error wth vlan_get_encap_level()
  can: c_can: remove obsolete STRICT_FRAME_ORDERING Kconfig option
  MAINTAINERS: Pravin Shelar is Open vSwitch maintainer.
  bnx2x: Convert return 0 to return rc
  bonding: Fix alb mode to only use first level vlans.
  bonding: Fix stacked device detection in arp monitoring
  macvlan: Fix lockdep warnings with stacked macvlan devices
  vlan: Fix lockdep warning with stacked vlan devices.
  net: Allow for more then a single subclass for netif_addr_lock
  net: Find the nesting level of a given device by type.
  ...
2014-05-23 15:29:43 -07:00
Linus Torvalds
a7aa96a92e Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull scsi target fixes from Nicholas Bellinger:
 "This series include:

   - Close race between iser-target network portal shutdown + accepting
     new connection logins (sagi)
   - Fix free-after-use regression in tcm_fc post conversion to
     percpu-ida pre-allocation (nab)
   - Explicitly disable Immediate + Unsolicited Data for iser-target
     connections when T10-PI is enabled (sagi + nab)
   - Allow pi_prot_type + emulate_write_cache attributes to be set to
     zero regardless of backend support (andy)
   - memory leak fix (mikulas)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: fix memory leak on XCOPY
  target: Don't allow setting WC emulation if device doesn't support
  iscsi-target: Disable Immediate + Unsolicited Data with ISER Protection
  tcm_fc: Fix free-after-use regression in ft_free_cmd
  iscsi-target: Change BUG_ON to REJECT in iscsit_process_nop_out
  Target/iscsi,iser: Avoid accepting transport connections during stop stage
  Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
  Target/iser: Fix wrong connection requests list addition
  target: Allow non-supporting backends to set pi_prot_type to 0
2014-05-21 18:03:14 +09:00
Sagi Grimberg
5f80ff8ecc Target/iser: Gracefully reject T10-PI enabled connect request if not supported
In case user chose to set T10-PI enable on the target while
the IB device does not support it, gracefully reject the request.

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-20 11:18:44 -07:00
Sagi Grimberg
f5ebec9629 Target/iser: Wait for proper cleanup before unloading
disconnected_handler works are scheduled on system_wq.
When attempting to unload, first make sure all works
have cleaned up.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-20 11:18:43 -07:00
Sagi Grimberg
88c4015fda Target/iser: Improve cm events handling
There are 4 RDMA_CM events that all basically mean that
the user should teardown the IB connection:
- DISCONNECTED
- ADDR_CHANGE
- DEVICE_REMOVAL
- TIMEWAIT_EXIT

Only in DISCONNECTED/ADDR_CHANGE it makes sense to
call rdma_disconnect (send DREQ/DREP to our initiator).
So we keep the same teardown handler for all of them
but only indicate calling rdma_disconnect for the relevant
events.

This patch also removes redundant debug prints for each single
event.

v2 changes:
 - Call isert_disconnected_handler() for DEVICE_REMOVAL (Or + Sag)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-20 11:17:40 -07:00
Bart Van Assche
5cfb17828d IB/srp: Add fast registration support
Certain HCA types (e.g. Connect-IB) and certain configurations (e.g.
ConnectX VF) support fast registration but not FMR. Hence add fast
registration support.

In function srp_rport_reconnect(), move the the srp_finish_req()
loop from after to before the srp_create_target_ib() call. This is
needed to avoid that srp_finish_req() tries to queue any
invalidation requests for rkeys associated with the old queue pair
on the newly allocated queue pair. Invoking srp_finish_req() before
the queue pair has been reallocated is safe since srp_claim_req()
handles completions correctly that arrive after srp_finish_req()
has been invoked.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
52ede08f00 IB/srp: Rename FMR-related variables
The next patch will cause the renamed variables to be shared between
the code for FMR and for FR memory registration. Make the names of
these variables independent of the memory registration mode. This
patch does not change any functionality. The start of this patch was
the changes applied via the following shell command:

sed -i.orig 's/SRP_FMR_SIZE/SRP_MAX_PAGES_PER_MR/g; \
    s/fmr_page_mask/mr_page_mask/g;s/fmr_page_size/mr_page_size/g; \
    s/fmr_page_shift/mr_page_shift/g;s/fmr_max_size/mr_max_size/g; \
    s/max_pages_per_fmr/max_pages_per_mr/g;s/nfmr/nmdesc/g; \
    s/fmr_len/dma_len/g' drivers/infiniband/ulp/srp/ib_srp.[ch]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
d1b4289e16 IB/srp: One FMR pool per SRP connection
Allocate one FMR pool per SRP connection instead of one SRP pool
per HCA. This improves scalability of the SRP initiator.

Only request the SCSI mid-layer to retry a SCSI command after a
temporary mapping failure (-ENOMEM) but not after a permanent
mapping failure. This avoids that SCSI commands are retried
indefinitely if a permanent memory mapping failure occurs.

Tell the SCSI mid-layer to reduce queue depth temporarily in the
unlikely case where an application is queuing many requests with
more than max_pages_per_fmr sg-list elements.

For FMR pool allocation, base the max_pages_per_fmr parameter on
the HCA memory registration limit. Only try to allocate an FMR
pool if FMR is supported.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
b1b8854d16 IB/srp: Introduce the 'register_always' kernel module parameter
Add a kernel module parameter that enables memory registration also for SG-lists
that can be processed without memory registration. This makes it easier for kernel
developers to test the memory registration code.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
539dde6fc5 IB/srp: Introduce srp_finish_mapping()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
76bc1e1ddd IB/srp: Introduce srp_map_fmr()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Bart Van Assche
62154b2e89 IB/srp: Introduce an additional local variable
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Bart Van Assche
af24663bc8 IB/srp: Fix kernel-doc warnings
Avoid that the kernel-doc tool warns about missing argument
descriptions for the ib_srp.[ch] source files.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Bart Van Assche
024ca90151 IB/srp: Fix a sporadic crash triggered by cable pulling
Avoid that the loops that iterate over the request ring can encounter
a pointer to a SCSI command in req->scmnd that is no longer associated
with that request. If the function srp_unmap_data() is invoked twice
for a SCSI command that is not in flight then that would cause
ib_fmr_pool_unmap() to be invoked with an invalid pointer as argument,
resulting in a kernel oops.

Reported-by: Sagi Grimberg <sagig@mellanox.com>
Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Steve Wise
11b8e22d4d RDMA/cxgb4: Fix vlan support
RDMA connections over a vlan interface don't work due to
import_ep() not using the correct egress device.

 - use the real device in import_ep()
 - use rdma_vlan_dev_real_dev() in get_real_dev().

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-19 18:00:32 -07:00
Duan Jiong
0cc65dd691 RDMA/ocrdma: Convert to use simple_open()
This removes an open-coded duplicate of simple_open().

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-19 17:55:54 -07:00
Christoph Jaeger
65b302ad31 RDMA/cxgb4: Fix memory leaks in c4iw_alloc() error paths
c4iw_alloc() bails out without freeing the storage that 'devp' points to.

Picked up by Coverity - CID 1204241.

Fixes: fa658a98a2 ("RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices")
Signed-off-by: Christoph Jaeger <christophjaeger@linux.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-19 17:55:43 -07:00
Sagi Grimberg
9d49f5e284 Target/iser: Fix hangs in connection teardown
In ungraceful teardowns isert close flows seem racy such that
isert_wait_conn hangs as RDMA_CM_EVENT_DISCONNECTED never
gets invoked (no one called rdma_disconnect).

Both graceful and ungraceful teardowns will have rx flush errors
(isert posts a batch once connection is established). Once all
flush errors are consumed we invoke isert_wait_conn and it will
be responsible for calling rdma_disconnect. This way it can be
sure that rdma_disconnect was called and it won't wait forever.

This patch also removes the logout_posted indicator. either the
logout completion was consumed and no problem decrementing the
post_send_buf_count, or it was consumed as a flush error. no point
of keeping it for isert_wait_conn as there is no danger that
isert_conn will be accidentally removed while it is running.

(Drop unnecessary sleep_on_conn_wait_comp check in
 isert_cq_rx_comp_err - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-19 14:38:49 -07:00
Sagi Grimberg
e346ab343f Target/iser: Bail from accept_np if np_thread is trying to close
In case np_thread state is in RESET/SHUTDOWN/EXIT states,
no point for isert to stall there as we may get a hang in
case no one will wake it up later.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-19 14:32:58 -07:00
Matan Barak
9433c18891 IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes
When we receive a netdev event indicating a netdev change and/or
a netdev address change, we must change the MAC index used by the
proxy QP1 (in the QP context), otherwise RoCE CM packets sent by the
VF will not carry the same source MAC address as the non-CM packets.

We use the UPDATE_QP command to perform this change.

In order to avoid modifying a QP context based on netdev event,
while the driver attempts to destroy this QP (e.g either the mlx4_ib
or ib_mad modules are unloaded), we use mutex locking in both flows.

Since the relevant mlx4 proxy GSI QP is created indirectly by the
mad module when they create their GSI QP, the mlx4 didn't need to
keep track on that QP prior to this change.

Now, when QP modifications are needed to this QP from within the
driver, we added refernece to it.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 15:12:45 -04:00
Sagi Grimberg
14f4b54fe3 Target/iscsi,iser: Avoid accepting transport connections during stop stage
When the target is in stop stage, iSER transport initiates RDMA disconnects.
The iSER initiator may wish to establish a new connection over the
still existing network portal. In this case iSER transport should not
accept and resume new RDMA connections. In order to learn that, iscsi_np
is added with enabled flag so the iSER transport can check when deciding
weather to accept and resume a new connection request.

The iscsi_np is enabled after successful transport setup, and disabled
before iscsi_np login threads are cleaned up.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:11 -07:00
Sagi Grimberg
531b7bf4bd Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
RDMA CM and iSCSI target flows are asynchronous and completely
uncorrelated. Relying on the fact that iscsi_accept_np will be called
after CM connection request event and will wait for it is a mistake.

When attempting to login to a few targets this flow is racy and
unpredictable, but for parallel login to dozens of targets will
race and hang every time.

The correct synchronizing mechanism in this case is pending on
a semaphore rather than a wait_for_event. We keep the pending
interruptible for iscsi_np cleanup stage.

(Squash patch to remove dead code into parent - nab)

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:10 -07:00
Sagi Grimberg
9fe63c88b1 Target/iser: Fix wrong connection requests list addition
Should be adding list_add_tail($new, $head) and not
the other way around.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:10 -07:00
Wilfried Klaebe
7ad24ea4bf net: get rid of SET_ETHTOOL_OPS
net: get rid of SET_ETHTOOL_OPS

Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.

Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
-       SET_ETHTOOL_OPS(dev, ops);
+       dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.

Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:43:20 -04:00
Hariprasad S
7d0a73a40c RDMA/cxgb4: Update Kconfig to include Chelsio T5 adapter
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Steve Wise
c2f9da92f2 RDMA/cxgb4: Only allow kernel db ringing for T4 devs
The whole db drop avoidance stuff is for T4 only.  So we cannot allow
that to be enabled for T5 devices.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Steve Wise
92e5011ab0 RDMA/cxgb4: Force T5 connections to use TAHOE congestion control
This is required to work around a T5 HW issue.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Steve Wise
cc18b939e1 RDMA/cxgb4: Fix endpoint mutex deadlocks
In cases where the cm calls c4iw_modify_rc_qp() with the endpoint
mutex held, they must be called with internal == 1.  rx_data() and
process_mpa_reply() are not doing this.  This causes a deadlock
because c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some
!internal cases, and c4iw_ep_disconnect() acquires the endpoint mutex.
The design was intended to only do the disconnect for !internal calls.

Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with
internal == 1, and then disconnect only after releasing the mutex.

Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with
internal == 1 and set a new attr flag telling it to send a TERMINATE
message.  Previously this was implied by !internal.

Change process_mpa_reply() to return whether the caller should
disconnect after releasing the endpoint mutex.  Now rx_data() will do
the disconnect in the cases where process_mpa_reply() wants to
disconnect after the TERMINATE is sent.

Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal,
and to send a TERMINATE message if attrs->send_term is 1.

Change abort_connection() to not aquire the ep mutex for setting the
state, and make all calls to abort_connection() do so with the mutex
held.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Linus Torvalds
38137a5188 InfiniBand/RDMA updates for 3.15-rc2:
- Mostly cxgb4 fixes unblocked by the merge of some prerequisites via
    the net tree.
 
  - Drop deprecated MSI-X API use.
 
  - A couple other miscellaneous things.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTUXCpAAoJEENa44ZhAt0h4tEP/1P1TqA/90sO7WeChze6H2ph
 kkGJPsCj8i1AWpef/Ax2hvoH1pjxK/VQbpwgkGJElv+rRaP99pandYGPBFluUoJK
 jI6D/mZSAsl6BXIZZCTcz9ttQNYacDdDL1I22WIdntkkCUsoP1awuHnXS4WLAiPA
 3D1XmiSHTGv9nbqd2wKA3sHWFaUN/s+RqsvIXUcNrN4qEUL2KXPOyt7i4Pz8eUq7
 /lxG1S854QUrZsW7qFR6Jc3clhNupUDWBBRrII3MOV98W3upHOVsFAdtcLZAkcTA
 FBCk90VVi0hsPeHqqCZjKUW/JE414vEDzOZfmYOpMj0xzc9sF1POxlvqMfRjvKTD
 fwYAkNLT7AW74jSU38KZl9pdLsQx187c75pcYUZ4ph6m6bozbtlPGkJaa3M3yyUb
 DmWUZL1dX8i69dV+w/ABz/4uK0dbCwMeAM9cCCUiedqI5wIuEV4Ld3NOLe7HnAXz
 O81UrVVnwrOVl8uPmGzY3VliU0OtFqhMBCxTgsZRxuBr9DSx/dUzb4Tzfw7e3Sc+
 +uxMMnbuCDysqEfzKsLG0+qu9P3hcwyafTJ9u8pq0D+BCB3tA8ZIKiqrKeMBua2h
 IJAy8M+zVG2HZCQFreH2oMnXmzVoyp+5Yl28izMDEfDptJ6hel4FLBiLLlYnU1DO
 zxUN5X1yOHxFkK4YYetE
 =/PTZ
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:

 - mostly cxgb4 fixes unblocked by the merge of some prerequisites via
   the net tree

 - drop deprecated MSI-X API use.

 - a couple other miscellaneous things.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cxgb4: Fix over-dereference when terminating
  RDMA/cxgb4: Use uninitialized_var()
  RDMA/cxgb4: Add missing debug stats
  RDMA/cxgb4: Initialize reserved fields in a FW work request
  RDMA/cxgb4: Use pr_warn_ratelimited
  RDMA/cxgb4: Max fastreg depth depends on DSGL support
  RDMA/cxgb4: SQ flush fix
  RDMA/cxgb4: rmb() after reading valid gen bit
  RDMA/cxgb4: Endpoint timeout fixes
  RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices
  IB/mlx5: Add block multicast loopback support
  IB/mthca: Use pci_enable_msix_exact() instead of pci_enable_msix()
  IB/qib: Use pci_enable_msix_range() instead of pci_enable_msix()
2014-04-18 13:49:42 -07:00
Linus Torvalds
141eaccd01 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Here are the target pending updates for v3.15-rc1.  Apologies in
  advance for waiting until the second to last day of the merge window
  to send these out.

  The highlights this round include:

   - iser-target support for T10 PI (DIF) offloads (Sagi + Or)
   - Fix Task Aborted Status (TAS) handling in target-core (Alex Leung)
   - Pass in transport supported PI at session initialization (Sagi + MKP + nab)
   - Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi)
   - Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab)
   - Fix tcm_fc use-after-free of ft_tpg (Andy Grover)
   - Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn)

  Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI
  metadata into KVM guest have been left-out for now, as there where a
  few comments from MST + Paolo that where not able to be addressed in
  time for v3.15.  Please expect this feature for v3.16-rc1"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits)
  ib_srpt: Use correct ib_sg_dma primitives
  target/tcm_fc: Rename ft_tport_create to ft_tport_get
  target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn
  target/tcm_fc: Rename structs and list members for clarity
  target/tcm_fc: Limit to 1 TPG per wwn
  target/tcm_fc: Don't export ft_lport_list
  target/tcm_fc: Fix use-after-free of ft_tpg
  target: Add check to prevent Abort Task from aborting itself
  target: Enable READ_STRIP emulation in target_complete_ok_work
  target/sbc: Add sbc_dif_read_strip software emulation
  target: Enable WRITE_INSERT emulation in target_execute_cmd
  target/sbc: Add sbc_dif_generate software emulation
  target/sbc: Only expose PI read_cap16 bits when supported by fabric
  target/spc: Only expose PI mode page bits when supported by fabric
  target/spc: Only expose PI inquiry bits when supported by fabric
  target: Pass in transport supported PI at session initialization
  target/iblock: Fix double bioset_integrity_free bug
  Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist
  target/rd: T10-Dif: RAM disk is allocating more space than required.
  iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug
  ...
2014-04-12 16:51:08 -07:00
Mike Marciniszyn
b076808051 ib_srpt: Use correct ib_sg_dma primitives
The code was incorrectly using sg_dma_address() and
sg_dma_len() instead of ib_sg_dma_address() and
ib_sg_dma_len().

This prevents srpt from functioning with the
Intel HCA and indeed will corrupt memory
badly.

Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Vinod Kumar <vinod.kumar@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-11 15:31:20 -07:00
Roland Dreier
5ae2866f52 Merge branches 'cxgb4', 'misc', 'mlx5' and 'qib' into for-next 2014-04-11 11:36:15 -07:00
Steve Wise
1d1ca9b4fd RDMA/cxgb4: Fix over-dereference when terminating
Need to get the endpoint reference before calling rdma_fini(), which
might fail causing us to not get the reference.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:10 -07:00
Steve Wise
97df1c6736 RDMA/cxgb4: Use uninitialized_var()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:10 -07:00
Steve Wise
98a3e87990 RDMA/cxgb4: Add missing debug stats
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:09 -07:00
Steve Wise
c3f98fa291 RDMA/cxgb4: Initialize reserved fields in a FW work request
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:09 -07:00
Hariprasad Shenai
aec844df10 RDMA/cxgb4: Use pr_warn_ratelimited
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:08 -07:00
Steve Wise
a03d9f94cc RDMA/cxgb4: Max fastreg depth depends on DSGL support
The max depth of a fastreg mr depends on whether the device supports
DSGL or not.  So compute it dynamically based on the device support
and the module use_dsgl option.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:08 -07:00
Steve Wise
b4e2901c52 RDMA/cxgb4: SQ flush fix
There is a race when moving a QP from RTS->CLOSING where a SQ work
request could be posted after the FW receives the RDMA_RI/FINI WR.
The SQ work request will never get processed, and should be completed
with FLUSHED status.  Function c4iw_flush_sq(), however was dropping
the oldest SQ work request when in CLOSING or IDLE states, instead of
completing the pending work request. If that oldest pending work
request was actually complete and has a CQE in the CQ, then when that
CQE is proceessed in poll_cq, we'll BUG_ON() due to the inconsistent
SQ/CQ state.

This is a very small timing hole and has only been hit once so far.

The fix is two-fold:

1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED
   status regardless of the QP state.

2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue
   before moving the state out of RTS.  This ensures that the state
   transition will not happen while another thread is in
   post_rc_send(), because set_state() and post_rc_send() both aquire
   the qp spinlock.  Also, once we transition the state out of RTS,
   subsequent calls to post_rc_send() will fail because the "in error"
   bit is set.  I don't think this fully closes the race where the FW
   can get a FINI followed a SQ work request being posted (because
   they are posted to differente EQs), but the #1 fix will handle the
   issue by flushing the SQ work request.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:08 -07:00
Steve Wise
def4771f4b RDMA/cxgb4: rmb() after reading valid gen bit
Some HW platforms can reorder read operations, so we must rmb() after
we see a valid gen bit in a CQE but before we read any other fields
from the CQE.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:07 -07:00