/*
 * net/tipc/bearer.h: Include file for TIPC bearer code
 *
tipc: add neighbor monitoring framework
TIPC based clusters are by default set up with full-mesh link
connectivity between all nodes. Those links are expected to provide
a short failure detection time, by default set to 1500 ms. Because
of this, the background load for neighbor monitoring in an N-node
cluster increases with a factor N on each node, while the overall
monitoring traffic through the network infrastructure increases at
a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
scale well beyond ~100 nodes unless we significantly increase failure
discovery tolerance.
This commit introduces a framework and an algorithm that drastically
reduces this background load, while basically maintaining the original
failure detection times across the whole cluster. Using this algorithm,
background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
now have to actively monitor 38 neighbors in a 400-node cluster, instead
of 399 as before.
This "Overlapping Ring Supervision Algorithm" is completely distributed
and employs no centralized or coordinated state. It goes as follows:
- Each node makes up a linearly ascending, circular list of all its N
known neighbors, based on their TIPC node identity. This algorithm
must be the same on all nodes.
- The node then selects the next M = sqrt(N) - 1 nodes downstream from
itself in the list, and chooses to actively monitor those. This is
called its "local monitoring domain".
- It creates a domain record describing the monitoring domain, and
piggy-backs this in the data area of all neighbor monitoring messages
(LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
the cluster eventually (default within 400 ms) will learn about
its monitoring domain.
- Whenever a node discovers a change in its local domain, e.g., a node
has been added or has gone down, it creates and sends out a new
version of its domain record to inform all neighbors about the change.
- A node receiving a domain record from anybody outside its local domain
matches this against its own list (which may not look the same), and
chooses to not actively monitor those members of the received domain
record that are also present in its own list. Instead, it relies on
indications from the direct monitoring nodes if an indirectly
monitored node has gone up or down. If a node is indicated lost, the
receiving node temporarily activates its own direct monitoring towards
that node in order to confirm, or not, that it is actually gone.
- Since each node is actively monitoring sqrt(N) downstream neighbors,
each node is also actively monitored by the same number of upstream
neighbors. This means that all non-direct monitoring nodes normally
will receive sqrt(N) indications that a node is gone.
- A major drawback with ring monitoring is how it handles failures that
cause massive network partitionings. If both a lost node and all its
direct monitoring neighbors are inside the lost partition, the nodes in
the remaining partition will never receive indications about the loss.
To overcome this, each node also chooses to actively monitor some
nodes outside its local domain. Those nodes are called remote domain
"heads", and are selected in such a way that no node in the cluster
will be more than two direct monitoring hops away. Because of this,
each node, apart from monitoring the members of its local domain, will
also typically monitor sqrt(N) remote head nodes.
- As an optimization, local list status, domain status and domain
records are marked with a generation number. This saves senders from
unnecessarily conveying unaltered domain records, and receivers from
performing unneeded re-adaptations of their node monitoring list, such
as re-assigning domain heads.
- As a measure of caution we have added the possibility to disable the
new algorithm through configuration. We do this by keeping a threshold
value for the cluster size; a cluster that grows beyond this value
will switch from full-mesh to ring monitoring, and vice versa when
it shrinks below the value. This means that if the threshold is set to
a value larger than any anticipated cluster size (default size is 32)
the new algorithm is effectively disabled. A patch set for altering the
threshold value and for listing the table contents will follow shortly.
- This change is fully backwards compatible.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-13 20:46:22 -04:00
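The first two steps of the algorithm described in the commit message above (build a sorted circular list, then pick the next M = sqrt(N) - 1 peers downstream) can be sketched in user-space C as follows. This is only an illustration under stated assumptions: the names isqrt_u32, local_domain_size and pick_local_domain are hypothetical and are not the kernel's monitoring code, and peers are represented as plain 32-bit node addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Integer square root: largest r such that r * r <= n. */
static uint32_t isqrt_u32(uint32_t n)
{
	uint32_t r = 0;

	while ((uint64_t)(r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/* Size of the local monitoring domain, M = sqrt(N) - 1, for N known peers. */
static uint32_t local_domain_size(uint32_t n_peers)
{
	uint32_t r = isqrt_u32(n_peers);

	return r ? r - 1 : 0;
}

/* Ascending comparison of two 32-bit node addresses for qsort(). */
static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/*
 * Sort the peer addresses in place, then copy up to M peers that follow
 * 'self' on the ascending circular list into 'out'.
 * Returns the number of entries written.
 */
static size_t pick_local_domain(uint32_t self, uint32_t *peers, size_t n,
				uint32_t *out, size_t out_len)
{
	size_t m = local_domain_size((uint32_t)n);
	size_t start = 0, i;

	assert(peers && out);
	qsort(peers, n, sizeof(*peers), cmp_u32);
	/* First peer with an address above ours; wraps to index 0 if none. */
	while (start < n && peers[start] <= self)
		start++;
	if (m > out_len)
		m = out_len;
	for (i = 0; i < m; i++)
		out[i] = peers[(start + i) % n];
	return m;
}
```

With N = 400 this gives a local domain of 19 peers; adding roughly as many remote domain heads yields the 38 actively monitored neighbors quoted in the commit message.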
 * Copyright (c) 1996-2006, 2013-2016, Ericsson AB
 * Copyright (c) 2005, 2010-2011, Wind River Systems
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the names of the copyright holders nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
 *
 * Alternatively, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") version 2 as published by the Free
 * Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

#ifndef _TIPC_BEARER_H
#define _TIPC_BEARER_H

#include "netlink.h"
#include "core.h"
#include "msg.h"
#include <net/genetlink.h>

#define MAX_MEDIA	3

tipc: improve and extend media address conversion functions
TIPC currently handles two media specific addresses: Ethernet MAC
addresses and InfiniBand addresses. Those are kept in three different
formats:
1) A "raw" format as obtained from the device. This format is known
only by the media specific adapter code in eth_media.c and
ib_media.c.
2) A "generic" internal format, in the form of struct tipc_media_addr,
which can be referenced and passed around by the generic media-
unaware code.
3) A serialized version of the latter, to be conveyed in neighbor
discovery messages.
Conversion between the three formats can only be done by the media
specific code, so we have function pointers for this purpose in
struct tipc_media. Here, the media adapters can install their own
conversion functions at startup.
We now introduce a new such function, 'raw2addr()', whose purpose
is to convert from format 1 to format 2 above. We also try, as far as
possible, to make the commenting, variable names and usage of these
functions uniform, with the purpose of making them more comprehensible.
We can now also remove the function tipc_l2_media_addr_set(), whose
job is done better by the new function.
Finally, we expand the field for serialized addresses (format 3)
in discovery messages from 20 to 32 bytes. This is permitted
according to the spec, and reduces the risk of problems when we
add new media in the future.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 05:39:13 -04:00
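The three address formats described in the commit message above can be illustrated with a small user-space sketch for an Ethernet-like medium. Everything prefixed with demo_ or DEMO_ is a hypothetical stand-in that only mirrors the constants and struct layout shown in this header; the real conversion callbacks live in the media adapters and take additional arguments such as the bearer pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_INFO_SIZE   32	/* mirrors TIPC_MEDIA_INFO_SIZE */
#define DEMO_TYPE_OFFSET 3	/* mirrors TIPC_MEDIA_TYPE_OFFSET */
#define DEMO_ADDR_OFFSET 4	/* mirrors TIPC_MEDIA_ADDR_OFFSET */
#define DEMO_TYPE_ETH    1	/* mirrors TIPC_MEDIA_TYPE_ETH */
#define DEMO_MAC_LEN     6	/* length of a raw Ethernet MAC address */

/* Format 2: generic, media-unaware address. */
struct demo_media_addr {
	uint8_t value[DEMO_INFO_SIZE];
	uint8_t media_id;
	uint8_t broadcast;
};

/* Format 1 -> format 2: wrap a raw MAC address in the generic struct. */
static int demo_raw2addr(struct demo_media_addr *addr, const uint8_t *raw)
{
	static const uint8_t bcast[DEMO_MAC_LEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	assert(addr && raw);
	memset(addr, 0, sizeof(*addr));
	memcpy(addr->value, raw, DEMO_MAC_LEN);
	addr->media_id = DEMO_TYPE_ETH;
	addr->broadcast = !memcmp(raw, bcast, DEMO_MAC_LEN);
	return 0;
}

/* Format 2 -> format 3: serialize into a zero-padded 32-byte field. */
static int demo_addr2msg(uint8_t *msg, const struct demo_media_addr *addr)
{
	assert(msg && addr);
	memset(msg, 0, DEMO_INFO_SIZE);
	msg[DEMO_TYPE_OFFSET] = addr->media_id;
	memcpy(msg + DEMO_ADDR_OFFSET, addr->value, DEMO_MAC_LEN);
	return 0;
}

/* Format 3 -> format 2: rebuild the generic address from the field. */
static int demo_msg2addr(struct demo_media_addr *addr, const uint8_t *msg)
{
	assert(addr && msg);
	if (msg[DEMO_TYPE_OFFSET] != DEMO_TYPE_ETH)
		return -1;
	return demo_raw2addr(addr, msg + DEMO_ADDR_OFFSET);
}
```

A round trip through demo_raw2addr(), demo_addr2msg() and demo_msg2addr() should reproduce the same generic address, which is the property the generic bearer code relies on when it treats the address as opaque.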
/* Identifiers associated with TIPC message header media address info
 * - address info field is 32 bytes long
 * - the field's actual content and length is defined per media
 * - remaining unused bytes in the field are set to zero
 */
#define TIPC_MEDIA_INFO_SIZE	32
#define TIPC_MEDIA_TYPE_OFFSET	3
#define TIPC_MEDIA_ADDR_OFFSET	4

/*
 * Identifiers of supported TIPC media types
 */
#define TIPC_MEDIA_TYPE_ETH	1
#define TIPC_MEDIA_TYPE_IB	2
#define TIPC_MEDIA_TYPE_UDP	3

/* Minimum bearer MTU */
#define TIPC_MIN_BEARER_MTU	(MAX_H_SIZE + INT_H_SIZE)

/* Identifiers for distinguishing between broadcast/multicast and replicast
 */
#define TIPC_BROADCAST_SUPPORT	1
#define TIPC_REPLICAST_SUPPORT	2

/**
 * struct tipc_media_addr - destination address used by TIPC bearers
 * @value: address info (format defined by media)
 * @media_id: TIPC media type identifier
 * @broadcast: non-zero if address is a broadcast address
 */
struct tipc_media_addr {
	u8 value[TIPC_MEDIA_INFO_SIZE];
	u8 media_id;
	u8 broadcast;
};

struct tipc_bearer;

/**
 * struct tipc_media - Media specific info exposed to generic bearer layer
 * @send_msg: routine which handles buffer transmission
 * @enable_media: routine which enables a media
 * @disable_media: routine which disables a media
 * @addr2str: convert media address format to string
 * @addr2msg: convert from media addr format to discovery msg addr format
 * @msg2addr: convert from discovery msg addr format to media addr format
 * @raw2addr: convert from raw addr format to media addr format
 * @priority: default link (and bearer) priority
 * @tolerance: default time (in ms) before declaring link failure
 * @min_win: minimum window (in packets) before declaring link congestion
 * @max_win: maximum window (in packets) before declaring link congestion
 * @mtu: max packet size bearer can support for media type not dependent on
 * underlying device MTU
 * @type_id: TIPC media identifier
 * @hwaddr_len: TIPC media address len
 * @name: media name
 */
struct tipc_media {
	int (*send_msg)(struct net *net, struct sk_buff *buf,
			struct tipc_bearer *b,
			struct tipc_media_addr *dest);
	int (*enable_media)(struct net *net, struct tipc_bearer *b,
			    struct nlattr *attr[]);
	void (*disable_media)(struct tipc_bearer *b);
	int (*addr2str)(struct tipc_media_addr *addr,
			char *strbuf,
			int bufsz);
	int (*addr2msg)(char *msg, struct tipc_media_addr *addr);
	int (*msg2addr)(struct tipc_bearer *b,
			struct tipc_media_addr *addr,
			char *msg);
	int (*raw2addr)(struct tipc_bearer *b,
			struct tipc_media_addr *addr,
			const char *raw);
	u32 priority;
	u32 tolerance;
tipc: introduce variable window congestion control
We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, covering both 'slow
start', 'congestion avoidance', and 'fast recovery' modes.
- We introduce hard lower and upper window limits per link, still
different and configurable per bearer type.
- We introduce a 'slow start threshold' variable, initially set to
the maximum window size.
- We let a link start at the minimum congestion window, i.e. in slow
start mode, and then let it grow rapidly (+1 per received ACK) until
it reaches the slow start threshold and enters congestion avoidance
mode.
- In congestion avoidance mode we increment the congestion window for
each window-size number of acked packets, up to a possible maximum
equal to the configured maximum window.
- For each non-duplicate NACK received, we drop back to fast recovery
mode, by setting both the slow start threshold and the
congestion window to (current_congestion_window / 2).
- If the timeout handler finds that the transmit queue has not moved
since the previous timeout, it drops the link back to slow start
and forces a probe containing the last sent sequence number to be
sent to the peer, so that the peer can discover the stale situation.
This change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.
This algorithm seems to give a 50-100% throughput improvement for
messages larger than MTU.
Suggested-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10 00:52:46 +01:00
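The window rules listed in the commit message above can be sketched as a small state machine. This is a hedged user-space illustration, not the kernel's link code: struct demo_link and the demo_link_*() helpers are hypothetical names, and clamping the halved window to min_win is an assumption derived from the "hard lower limit" wording.

```c
#include <assert.h>
#include <stdint.h>

struct demo_link {
	uint32_t min_win;	/* hard lower window limit, in packets */
	uint32_t max_win;	/* hard upper window limit, in packets */
	uint32_t window;	/* current congestion window */
	uint32_t ssthresh;	/* slow start threshold */
	uint32_t acked;		/* acks counted since last avoidance step */
};

/* Start in slow start: minimum window, threshold at the maximum. */
static void demo_link_init(struct demo_link *l, uint32_t min_win,
			   uint32_t max_win)
{
	assert(l && min_win > 0 && min_win <= max_win);
	l->min_win = min_win;
	l->max_win = max_win;
	l->window = min_win;
	l->ssthresh = max_win;
	l->acked = 0;
}

/* Account for one acknowledged packet. */
static void demo_link_ack(struct demo_link *l)
{
	if (l->window >= l->max_win)
		return;
	if (l->window < l->ssthresh) {
		/* Slow start: grow by one per received ACK. */
		l->window++;
		return;
	}
	/* Congestion avoidance: grow by one per window of acked packets. */
	if (++l->acked >= l->window) {
		l->acked = 0;
		l->window++;
	}
}

/* Non-duplicate NACK: halve both window and threshold (fast recovery). */
static void demo_link_nack(struct demo_link *l)
{
	uint32_t half = l->window / 2;

	if (half < l->min_win)	/* assumption: never drop below min_win */
		half = l->min_win;
	l->ssthresh = half;
	l->window = half;
	l->acked = 0;
}

/* Timeout with a stalled transmit queue: fall back to slow start. */
static void demo_link_stall(struct demo_link *l)
{
	l->window = l->min_win;
	l->acked = 0;
}
```

For example, with min_win = 4 and max_win = 16, twelve ACKs take the window from 4 to 16, one NACK drops it to 8, and from there eight further ACKs are needed before it grows to 9.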
	u32 min_win;
	u32 max_win;
	u32 mtu;
	u32 type_id;
	u32 hwaddr_len;
	char name[TIPC_MAX_MEDIA_NAME];
};

/**
 * struct tipc_bearer - Generic TIPC bearer structure
tipc: remove bearer_lock from tipc_bearer struct
After the earlier commits ("tipc: remove 'links' list from
tipc_bearer struct") and ("tipc: introduce new spinlock to protect
struct link_req"), there is no longer any need to protect struct
link_req or any link list by use of bearer_lock. Furthermore,
we have eliminated the need for using bearer_lock during downcalls
(send) from the link to the bearer, since we have ensured that
bearers always have a longer life cycle than their associated links,
and always contain valid data.
So, the only need now for a lock protecting bearers is for guaranteeing
consistency of the bearer list itself. For this, it is sufficient, at
least for the time being, to continue applying 'net_lock' in write mode.
By removing bearer_lock we also pre-empt introduction of issue b) described
in the previous commit "tipc: remove 'links' list from tipc_bearer struct":
"b) When the outer protection from net_lock is gone, taking
bearer_lock and node_lock in opposite order of method 1) and 2)
will become an obvious deadlock hazard".
Therefore, we now eliminate the bearer_lock spinlock.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:29:17 -05:00
 * @media_ptr: pointer to additional media-specific information about bearer
 * @mtu: max packet size bearer can support
 * @addr: media-specific address associated with bearer
 * @name: bearer name (format = media:interface)
 * @media: ptr to media structure associated with bearer
 * @bcast_addr: media address used in broadcasting
 * @pt: packet type for bearer
 * @rcu: rcu struct for tipc_bearer
 * @priority: default link priority for bearer
 * @min_win: minimum window (in packets) before declaring link congestion
 * @max_win: maximum window (in packets) before declaring link congestion
 * @tolerance: default link tolerance for bearer
 * @domain: network domain to which links can be established
 * @identity: array index of this bearer within TIPC bearer array
 * @disc: ptr to link setup request
 * @net_plane: network plane ('A' through 'H') currently associated with bearer
 * @encap_hlen: encap headers length
 * @up: bearer up flag (bit 0)
 * @refcnt: tipc_bearer reference counter
 *
 * Note: media-specific code is responsible for initialization of the fields
 * indicated below when a bearer is enabled; TIPC's generic bearer code takes
 * care of initializing all other fields.
 */
struct tipc_bearer {
	void __rcu *media_ptr;			/* initialized by media */
	u32 mtu;				/* initialized by media */
	struct tipc_media_addr addr;		/* initialized by media */
	char name[TIPC_MAX_BEARER_NAME];
	struct tipc_media *media;
	struct tipc_media_addr bcast_addr;
	struct packet_type pt;
	struct rcu_head rcu;
	u32 priority;
	u32 min_win;
	u32 max_win;
	u32 tolerance;
	u32 domain;
	u32 identity;
	struct tipc_discoverer *disc;
	char net_plane;
	u16 encap_hlen;
	unsigned long up;
	refcount_t refcnt;
};

struct tipc_bearer_names {
	char media_name[TIPC_MAX_MEDIA_NAME];
	char if_name[TIPC_MAX_IF_NAME];
};

/*
 * TIPC routines available to supported media types
 */

void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b);

/*
 * Routines made available to TIPC by supported media types
 */
extern struct tipc_media eth_media_info;

#ifdef CONFIG_TIPC_MEDIA_IB
extern struct tipc_media ib_media_info;
#endif
#ifdef CONFIG_TIPC_MEDIA_UDP
extern struct tipc_media udp_media_info;
#endif

int tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info);
int __tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info);
int tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info);
int __tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info);
int tipc_nl_bearer_dump(struct sk_buff *skb, struct netlink_callback *cb);
int tipc_nl_bearer_get(struct sk_buff *skb, struct genl_info *info);
int tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info);
int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info);
int tipc_nl_bearer_add(struct sk_buff *skb, struct genl_info *info);

int tipc_nl_media_dump(struct sk_buff *skb, struct netlink_callback *cb);
int tipc_nl_media_get(struct sk_buff *skb, struct genl_info *info);
int tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info);
int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info);

tipc: enable tracepoints in tipc
For debugging and tracing purposes, this commit enables tracepoints in
TIPC along with some general trace_events, as shown below. It also
defines some 'tipc_*_dump()' functions that dump TIPC object data
whenever needed, that is, for general debug purposes, i.e. not just
for the trace_events.
The following trace_events are now available:
- trace_tipc_skb_dump(): traces and dumps TIPC msg & skb data,
  e.g. message type, user, droppable, skb truesize, cloned skb, etc.
- trace_tipc_list_dump(): traces and dumps any TIPC buffers or
  queues, e.g. TIPC link transmq, socket receive queue, etc.
- trace_tipc_sk_dump(): traces and dumps TIPC socket data, e.g.
  sk state, sk type, connection type, rmem_alloc, socket queues, etc.
- trace_tipc_link_dump(): traces and dumps TIPC link data, e.g.
  link state, silent_intv_cnt, gap, bc_gap, link queues, etc.
- trace_tipc_node_dump(): traces and dumps TIPC node data, e.g.
  node state, active links, capabilities, link entries, etc.
How to use:
Put the trace functions at any places where we want to dump TIPC data
or events.
Note:
a) The dump functions generate raw data only, to offload processing
from the trace event. A tool or script may be needed to parse the
data, but this should be simple.
b) The trace_tipc_*_dump() events should be reserved for failure cases
only (e.g. the retransmission failure case) or for conditions we do not
expect to happen too often. We can then consider enabling these events
by default, since they will have almost no effect under normal
conditions, but once the rare condition or failure occurs, we get the
full dumped data for post-analysis.
For other trace purposes, we can reuse these trace classes as templates
for different events.
c) A trace_event is only effective when we enable it. To enable the
TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
directory in the 'debugfs' file system. Normally, they are located at:
/sys/kernel/debug/tracing/events/tipc/
For example:
To enable the tipc_link_dump event:
    echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable
To enable all the TIPC trace_events:
    echo 1 > /sys/kernel/debug/tracing/events/tipc/enable
To collect the trace data:
    cat trace
or
    cat trace_pipe > /trace.out &
To disable all the TIPC trace_events:
    echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
To clear the trace buffer:
    echo > trace
d) As with other trace_events, features such as 'filter' or 'trigger'
can also be used with the TIPC trace_events.
For more details, have a look at:
Documentation/trace/ftrace.txt
MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
Acked-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 09:17:56 +07:00
int tipc_media_addr_printf(char *buf, int len, struct tipc_media_addr *a);
int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
			 struct nlattr *attrs[]);
bool tipc_bearer_hold(struct tipc_bearer *b);
void tipc_bearer_put(struct tipc_bearer *b);
void tipc_disable_l2_media(struct tipc_bearer *b);
int tipc_l2_send_msg(struct net *net, struct sk_buff *buf,
		     struct tipc_bearer *b, struct tipc_media_addr *dest);

void tipc_bearer_add_dest(struct net *net, u32 bearer_id, u32 dest);
void tipc_bearer_remove_dest(struct net *net, u32 bearer_id, u32 dest);
struct tipc_bearer *tipc_bearer_find(struct net *net, const char *name);
int tipc_bearer_get_name(struct net *net, char *name, u32 bearer_id);
struct tipc_media *tipc_media_find(const char *name);
tipc: relocate common functions from media to bearer
Currently, registering a TIPC stack handler in the network device layer
is done twice, once for Ethernet (eth_media) and Infiniband (ib_media)
respectively. But, as this registration is not media specific, we can
avoid some code duplication by moving the registering function to
the generic bearer layer, to the file bearer.c, and call it only once.
The same is true for the network device event notifier.
As a side effect, the two workqueues we are using for setting up/
cleaning up media can now be eliminated. Furthermore, the array for
storing the specific media type structs, media_array[], can be entirely
deleted.
Note that the eth_started and ib_started flags were removed during the
code relocation. There is now only one call to bearer_setup and
bearer_cleanup, and these can logically not race against each other.
Despite its size, this cleanup work incurs no functional changes in TIPC.
In particular, it should be noted that the sequence ordering of received
packets is unaffected by this change, since packet reception never was
subject to any work queue handling in the first place.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 20:45:42 -08:00
int tipc_bearer_setup(void);
void tipc_bearer_cleanup(void);
void tipc_bearer_stop(struct net *net);
int tipc_bearer_mtu(struct net *net, u32 bearer_id);
int tipc_bearer_min_mtu(struct net *net, u32 bearer_id);
bool tipc_bearer_bcast_support(struct net *net, u32 bearer_id);
void tipc_bearer_xmit_skb(struct net *net, u32 bearer_id,
			  struct sk_buff *skb,
			  struct tipc_media_addr *dest);
void tipc_bearer_xmit(struct net *net, u32 bearer_id,
		      struct sk_buff_head *xmitq,
		      struct tipc_media_addr *dst,
		      struct tipc_node *__dnode);
tipc: simplify bearer level broadcast
Until now, we have been keeping track of the exact set of broadcast
destinations through the helper structure tipc_node_map. This leads us to
have to maintain a whole infrastructure for supporting this, including
a pseudo-bearer and a number of functions to manipulate both the bearers
and the node map correctly. Apart from the complexity, this approach is
also limiting, as struct tipc_node_map only can support cluster local
broadcast if we want to avoid it becoming excessively large. We want to
eliminate this limitation, in order to enable introduction of scoped
multicast in the future.
A closer analysis reveals that it is unnecessary to maintain this "full
set" overview; it is sufficient to keep a counter per bearer, indicating
how many nodes can be reached via this bearer at the moment. The protocol
is now robust enough to handle transitional discrepancies between the
nominal number of reachable destinations, as expected by the broadcast
protocol itself, and the number which is actually reachable at the
moment. The initial broadcast synchronization, in conjunction with the
retransmission mechanism, ensures that all packets will eventually be
acknowledged by the correct set of destinations.
This commit introduces these changes.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 08:51:42 -04:00
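The idea in the commit message above, replacing the full destination set with one counter per bearer, can be sketched as follows. This is an assumption-laden illustration only: struct demo_bcast, DEMO_MAX_BEARERS and the demo_*() helpers are hypothetical names and do not reproduce the kernel's broadcast bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_BEARERS 3	/* arbitrary size chosen for this sketch */

/* One counter per bearer: how many nodes are reachable through it now. */
struct demo_bcast {
	uint32_t dests[DEMO_MAX_BEARERS];
};

/* A node became reachable via this bearer. */
static void demo_add_dest(struct demo_bcast *bc, uint32_t bearer_id)
{
	assert(bc && bearer_id < DEMO_MAX_BEARERS);
	bc->dests[bearer_id]++;
}

/* A node is no longer reachable via this bearer; never underflows. */
static void demo_remove_dest(struct demo_bcast *bc, uint32_t bearer_id)
{
	assert(bc && bearer_id < DEMO_MAX_BEARERS);
	if (bc->dests[bearer_id])
		bc->dests[bearer_id]--;
}

/* Broadcasting on a bearer is only useful if somebody can be reached. */
static bool demo_bearer_has_dests(const struct demo_bcast *bc,
				  uint32_t bearer_id)
{
	assert(bc && bearer_id < DEMO_MAX_BEARERS);
	return bc->dests[bearer_id] > 0;
}
```

The counter does not record which nodes are reachable, only how many; per the commit message, transient mismatches are tolerated because broadcast synchronization and retransmission ensure that packets are eventually acknowledged by the correct set of destinations.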
void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
			 struct sk_buff_head *xmitq);
void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts);
int tipc_attach_loopback(struct net *net);
void tipc_detach_loopback(struct net *net);

static inline void tipc_loopback_trace(struct net *net,
				       struct sk_buff_head *pkts)
{
	if (unlikely(dev_nit_active(net->loopback_dev)))
		tipc_clone_to_loopback(net, pkts);
}

/* check if device MTU is too low for tipc headers */
static inline bool tipc_mtu_bad(struct net_device *dev)
{
	if (dev->mtu >= TIPC_MIN_BEARER_MTU)
		return false;
	netdev_warn(dev, "MTU too low for tipc bearer\n");
	return true;
}

#endif	/* _TIPC_BEARER_H */