linux-stable/include/net
Eric Dumazet 90c337da15 inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).

As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge to find an available
port.
But kernel does not know yet if this socket is going to be a listener or
be connected.
It has very limited choices (no full knowledge of final 4-tuple for a
connect())

With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.

This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.

The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.

This new feature is available for both IPv4 and IPv6 (Thanks Neal)

Tested:

Wrote a test program and checked its behavior on IPv4 and IPv6.

strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() show that the port is still 0 right after bind()
but properly allocated after connect().

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

IPv6 test :

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.

lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets

lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

Check that a given source port is indeed used by many different
connections :

lpaa23:~# ss -t src :40000 | head -10
State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-06 23:57:12 -07:00
..
9p 9p: switch p9_client_read() to passing struct iov_iter * 2015-04-11 22:28:27 -04:00
bluetooth Bluetooth: Read LE remote features during connection establishment 2015-04-09 08:36:54 +03:00
caif caif: fix a signedness bug in cfpkt_iterate() 2015-02-20 17:35:14 -05:00
irda irda: Convert function pointer arrays and uses to const 2014-12-10 15:33:16 -05:00
iucv af_iucv: fix recvmsg by replacing skb_pull() function 2013-04-08 17:16:57 -04:00
netfilter netfilter: nf_tables: allow to bind table to net_device 2015-05-26 18:41:17 +02:00
netns Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2015-05-31 00:02:30 -07:00
nfc nfc: nci: Add comment to explain NCI_HCI_MAX_PIPES 2015-04-06 00:19:05 +02:00
phonet net: remove my future former mail address 2012-06-17 16:29:38 -07:00
sctp sctp: Fix mangled IPv4 addresses on a IPv6 listening socket 2015-05-27 14:15:26 -04:00
tc_act act_bpf: add initial eBPF support for actions 2015-03-20 19:10:44 -04:00
6lowpan.h ieee802154: 6lowpan: rename process_data and lowpan_process_data 2014-10-27 15:51:16 +01:00
act_api.h net_sched: act: refuse to remove bound action outside 2014-02-12 19:23:32 -05:00
addrconf.h net: Export IGMP/MLD message validation code 2015-05-04 14:49:23 -04:00
af_ieee802154.h ieee802154: mac802154: remove FSF address 2014-10-25 08:07:30 +02:00
af_rxrpc.h af_rxrpc.h: Remove extern from function prototypes 2013-07-31 17:50:01 -07:00
af_unix.h af_unix: improve STREAM behavior with fragmented memory 2013-08-10 01:16:44 -07:00
af_vsock.h net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
ah.h ipsec: Remove obsolete MAX_AH_AUTH_LEN 2014-09-18 10:54:36 +02:00
arp.h neigh: Factor out ___neigh_lookup_noref 2015-03-04 00:23:23 -05:00
atmclip.h atm: clip: Use device neigh support on top of "arp_tbl". 2011-11-30 18:51:03 -05:00
ax25.h ax25: Stop using magic neighbour cache operations. 2015-03-03 14:44:41 -05:00
ax88796.h
bond_3ad.h bonding: Implement port churn-machine (AD standard 43.4.17). 2015-02-24 16:05:48 -05:00
bond_alb.h net: Move bonding headers under include/net 2014-11-10 13:27:49 -05:00
bond_options.h bonding: Implement user key part of port_key in an AD system. 2015-05-11 10:59:32 -04:00
bonding.h bonding: Implement user key part of port_key in an AD system. 2015-05-11 10:59:32 -04:00
busy_poll.h sched, net: Fixup busy_loop_us_clock() 2014-01-13 17:39:11 +01:00
cfg80211-wext.h cfg80211: remove unused wext handler exports 2011-08-08 14:26:29 -04:00
cfg80211.h cfg80211: properly send NL80211_ATTR_DISCONNECTED_BY_AP in disconnect 2015-05-26 15:21:27 +02:00
cfg802154.h nl802154: add support to set cca ed level 2015-05-27 19:29:42 +02:00
checksum.h net: fix sparse error in csum_replace4() 2015-05-17 13:08:29 -04:00
cipso_ipv4.h cipso: don't use IPCB() to locate the CIPSO IP option 2015-02-11 14:46:37 -05:00
cls_cgroup.h cgroup: clean up cgroup_subsys names and initialization 2014-02-08 10:36:58 -05:00
codel.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-05-13 14:31:43 -04:00
compat.h net: switch importing msghdr from userland to {compat_,}import_iovec() 2015-04-09 00:02:26 -04:00
datalink.h net: Move prototype declaration to header file include/net/datalink.h from net/ipx/af_ipx.c 2014-02-09 17:32:50 -08:00
dcbevent.h include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
dcbnl.h net/dcb: Add IEEE QCN attribute 2015-03-06 21:50:02 -05:00
dn_dev.h dn_dev: add support for IFA_FLAGS nl attribute 2013-12-10 21:50:00 -05:00
dn_fib.h decnet (dn*.h): Remove extern from function prototypes 2013-09-20 14:49:32 -04:00
dn_neigh.h netfilter: Pass socket pointer down through okfn(). 2015-04-07 15:25:55 -04:00
dn_nsp.h decnet (dn*.h): Remove extern from function prototypes 2013-09-20 14:49:32 -04:00
dn_route.h net: Move prototype declaration to appropriate header file from decnet/af_decnet.c 2014-02-09 17:32:49 -08:00
dn.h net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c 2014-02-09 17:32:49 -08:00
dsa.h net: dsa: Add basic framework to support ndo_fdb functions 2015-03-29 13:23:54 -07:00
dsfield.h ipv6: Optimize ipv6_change_dsfield(). 2013-01-09 23:59:53 -08:00
dst_ops.h net: Remove protocol from struct dst_ops 2015-03-09 16:06:10 -04:00
dst.h net: make skb_dst_pop routine static 2015-05-12 23:19:49 -04:00
esp.h net: move pskb_put() to core code 2013-11-07 19:28:58 -05:00
ethoc.h net: ethoc: set up MII management bus clock 2014-02-04 20:19:51 -08:00
fib_rules.h net: Kill hold_net release_net 2015-03-12 14:39:40 -04:00
firewire.h firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection. 2013-03-26 12:32:13 -04:00
flow_dissector.h mpls: Add MPLS entropy label in flow_keys 2015-06-04 15:44:31 -07:00
flow.h ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif 2014-04-16 15:05:11 -04:00
flowcache.h flowcache: Make flow cache name space aware 2014-02-12 07:02:11 +01:00
fou.h ip_tunnel: Ops registration for secondary encap (fou, gue) 2014-11-12 15:01:35 -05:00
garp.h garp.h: Remove extern from function prototypes 2013-09-20 14:49:33 -04:00
gen_stats.h net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
genetlink.h net: Introduce possible_net_t 2015-03-12 14:39:40 -04:00
geneve.h geneve: move definition of geneve_hdr() to geneve.h 2015-05-13 15:59:13 -04:00
gre.h gre: Call gso_make_checksum 2014-06-04 22:46:38 -07:00
gro_cells.h ip_tunnel: Create percpu gro_cell 2015-01-18 01:56:32 -05:00
gue.h gue: Protocol constants for remote checksum offload 2014-11-05 16:30:03 -05:00
icmp.h icmp.h: Remove extern from function prototypes 2013-09-20 14:49:33 -04:00
ieee80211_radiotap.h mac80211: propagate STBC / LDPC flags to radiotap 2014-02-06 09:34:58 +01:00
ieee802154_netdev.h ieee802154: Remove ieee802154_reduced_mlme_ops references. 2015-05-26 20:26:09 +02:00
if_inet6.h ipv6: do retries on stable privacy addresses 2015-03-23 22:12:09 -04:00
inet6_connection_sock.h inet: get rid of central tcp/dccp listener timer 2015-03-20 12:40:25 -04:00
inet6_hashtables.h ipv6: get rid of __inet6_hash() 2015-03-18 22:00:35 -04:00
inet_common.h net: Modify sk_alloc to not reference count the netns of kernel sockets. 2015-05-11 10:50:18 -04:00
inet_connection_sock.h tcp: fix child sockets to use system default congestion control if not set 2015-05-31 21:49:14 -07:00
inet_ecn.h tunnel: fix RFC number in comment for INET_ECN_decapsulate() 2014-05-07 15:30:52 -04:00
inet_frag.h ip_fragment: don't forward defragmented DF packet 2015-05-27 13:03:31 -04:00
inet_hashtables.h tcp: fix/cleanup inet_ehash_locks_alloc() 2015-05-26 19:48:46 -04:00
inet_sock.h inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations 2015-06-06 23:57:12 -07:00
inet_timewait_sock.h tcp/dccp: get rid of central timewait timer 2015-04-13 16:40:05 -04:00
inetpeer.h tcp: simplify inetpeer_addr_base use 2015-03-31 13:58:35 -04:00
ip6_checksum.h net: add gro_compute_pseudo functions 2014-08-24 18:09:23 -07:00
ip6_fib.h ipv6: Create percpu rt6_info 2015-05-25 13:25:35 -04:00
ip6_route.h ipv6: Add rt6_get_cookie() function 2015-05-25 13:25:34 -04:00
ip6_tunnel.h udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). 2015-04-07 15:29:08 -04:00
ip_fib.h ipv4: FIB Local/MAIN table collapse 2015-03-11 16:22:14 -04:00
ip_tunnels.h ipip,gre,vti,sit: implement ndo_get_iflink 2015-04-02 14:05:00 -04:00
ip_vs.h net: Introduce possible_net_t 2015-03-12 14:39:40 -04:00
ip.h net: Add full IPv6 addresses to flow_keys 2015-06-04 15:44:30 -07:00
ipcomp.h percpu: add __percpu sparse annotations to net 2010-02-16 23:05:38 -08:00
ipconfig.h
ipv6.h net: Add full IPv6 addresses to flow_keys 2015-06-04 15:44:30 -07:00
ipx.h switch ipxrtr_route_packet() from iovec to msghdr 2014-11-24 04:28:49 -05:00
iw_handler.h wext: add checked wrappers for adding events/points to streams 2015-02-28 21:31:12 +01:00
lapb.h lapb.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
lib80211.h lib80211: remove unused print_ssid() 2014-10-14 02:18:27 +02:00
llc_c_ac.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_c_ev.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_c_st.h llc: Make llc_conn_ev_qfyr_t function pointer arrays const 2014-12-10 15:21:24 -05:00
llc_conn.h net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
llc_if.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_pdu.h net: llc: fix order of evaluation in llc_conn_ac_inc_vr_by_1 2014-01-01 22:22:43 -05:00
llc_s_ac.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_s_ev.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc_s_st.h llc: Make llc_sap_action_t function pointer arrays const 2014-12-10 15:21:24 -05:00
llc_sap.h llc*.h: Remove extern from function prototypes 2013-09-21 14:01:38 -04:00
llc.h llc: make lock static 2014-01-03 20:56:48 -05:00
mac80211.h mac80211: Fix mac80211.h docbook comments 2015-05-28 14:37:43 +02:00
mac802154.h cfg802154: introduce wpan phy flags 2015-05-19 11:44:42 +02:00
mip6.h include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
mld.h ipv6: mld: answer mldv2 queries with mldv1 reports in mldv1 fallback 2014-09-22 16:23:15 -04:00
mpls.h openvswitch: Add basic MPLS support to kernel 2014-11-05 23:52:33 -08:00
mrp.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-10-01 17:06:14 -04:00
ndisc.h neigh: Factor out ___neigh_lookup_noref 2015-03-04 00:23:23 -05:00
neighbour.h net: neighbour: Add mcast_resolicit to configure the number of multicast resolicitations in PROBE state. 2015-03-20 21:47:40 -04:00
net_namespace.h netns: make nsid_lock per net 2015-05-17 23:41:11 -04:00
net_ratelimit.h net: Kill ratelimit.h dependency in linux/net.h 2011-05-27 13:41:33 -04:00
netevent.h netevent/netlink.h: Remove extern from function prototypes 2013-09-21 14:01:39 -04:00
netlabel.h netlabel: fix the netlbl_catmap_setlong() dummy function 2014-08-07 20:55:21 -04:00
netlink.h netlink: implement nla_get_in_addr and nla_get_in6_addr 2015-03-31 13:58:35 -04:00
netprio_cgroup.h cgroup: clean up cgroup_subsys names and initialization 2014-02-08 10:36:58 -05:00
netrom.h netrom.h: Remove extern from function prototypes 2013-09-21 14:01:39 -04:00
nexthop.h
nl802154.h nl802154: add support for dump phy capabilities 2015-05-19 11:44:44 +02:00
p8022.h p8022.h: Remove extern from function prototypes 2013-09-21 14:01:39 -04:00
ping.h net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
pkt_cls.h net: sched: remove tcf_proto from ematch calls 2014-10-06 18:02:32 -04:00
pkt_sched.h net: rename vlan_tx_* helpers since "tx" is misleading there 2015-01-13 17:51:08 -05:00
protocol.h net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
psnap.h psnap.h: Remove extern from function prototypes 2013-09-23 01:51:08 -04:00
raw.h raw/rawv6.h: Remove extern from function prototypes 2013-09-23 01:51:08 -04:00
rawv6.h raw/rawv6.h: Remove extern from function prototypes 2013-09-23 01:51:08 -04:00
red.h reciprocal_divide: update/correction of the algorithm 2014-01-21 23:17:20 -08:00
regulatory.h cfg80211: allow wiphy specific regdomain management 2014-12-17 11:49:55 +01:00
request_sock.h tcp: provide SYN headers for passive connections 2015-05-05 16:02:34 -04:00
rose.h rose.h: Remove extern from function prototypes 2013-09-23 01:51:08 -04:00
route.h ipv4: per cpu uncached list 2015-01-15 18:26:16 -05:00
rtnetlink.h rtnetlink: Mark name argument of rtnl_create_link() const 2015-04-10 12:42:40 -07:00
sch_generic.h net: sched: use counter to break reclassify loops 2015-05-13 15:08:14 -04:00
scm.h scm.h: Remove extern from function prototypes 2013-09-23 01:51:09 -04:00
secure_seq.h inetpeer: get rid of ip_id_count 2014-06-02 11:00:41 -07:00
slhc_vj.h
snmp.h Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
sock.h tcp: add a force_schedule argument to sk_stream_alloc_skb() 2015-05-21 16:56:40 -04:00
Space.h drivers: net: Include new header file in sbni.c 2013-12-19 18:51:20 -05:00
stp.h stp.h: Remove extern from function prototypes 2013-09-23 01:51:09 -04:00
switchdev.h switchdev: add support for fdb add/del/dump via switchdev_port_obj ops. 2015-05-17 22:49:09 -04:00
tcp_memcontrol.h tcp_memcontrol: Kill struct tcp_memcontrol 2013-10-21 18:43:02 -04:00
tcp_states.h inet: add TCP_NEW_SYN_RECV state 2015-03-12 22:58:12 -04:00
tcp.h tcp: add rfc3168, section 6.1.1.1. fallback 2015-05-19 16:53:37 -04:00
timewait_sock.h [PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary. 2012-06-09 14:56:12 -07:00
transp_v6.h ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams 2014-01-19 19:53:18 -08:00
tso.h net: Add a software TSO helper API 2014-05-22 14:57:15 -04:00
udp_tunnel.h udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). 2015-04-07 15:29:08 -04:00
udp.h net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
udplite.h net: switch memcpy_fromiovec()/memcpy_fromiovecend() users to copy_from_iter() 2015-02-04 01:34:15 -05:00
vsock_addr.h VSOCK: Move af_vsock.h and vsock_addr.h to include/net 2013-07-27 22:14:06 -07:00
vxlan.h udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). 2015-04-07 15:29:08 -04:00
wext.h wext.h: Remove extern from function prototypes 2013-09-23 16:29:40 -04:00
wimax.h net: treewide: Fix typo found in DocBook/networking.xml 2014-09-05 17:35:28 -07:00
x25.h x25.h: Remove extern from function prototypes 2013-09-23 16:29:41 -04:00
x25device.h X25: Add if_x25.h and x25 to device identifiers 2010-04-22 16:12:36 -07:00
xfrm.h netfilter: Pass socket pointer down through okfn(). 2015-04-07 15:25:55 -04:00