Julian Wiedmann says:
====================
s390/net: updates 2017-12-20
Please apply the following patch series for 4.16.
Nothing too exciting, mostly just beating the qeth L3 code into shape.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a common helper for parsing an IP address string, let's use it.
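For reference, a minimal sketch of what such parsing looks like with the generic
in4_pton()/in6_pton() helpers (function name and parameters below are
illustrative, not the actual qeth symbols):

#include <linux/inet.h>
#include <linux/errno.h>

/* Illustrative only: convert an ASCII IP address string into binary form. */
static int example_parse_ipaddr(const char *buf, bool is_ipv4, u8 *addr)
{
        int rc;

        if (is_ipv4)
                rc = in4_pton(buf, -1, addr, -1, NULL);
        else
                rc = in6_pton(buf, -1, addr, -1, NULL);

        return rc ? 0 : -EINVAL;
}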
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TSO and IQD paths already need to fix up the current values, and
OSA will require more flexibility in the future as well. So just let
the caller specify the data length.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate the cast type translation, move the passthru path out of
the RCU-guarded section, and use the appropriate rtable helpers when
determining the next-hop address.
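As a rough illustration of the last point, the rtable helpers hide the
gateway-vs-direct distinction (sketch with illustrative names, not the actual
qeth code):

#include <net/route.h>
#include <linux/ip.h>

/* Illustrative only: determine the IPv4 next-hop for a routed skb. */
static __be32 example_nexthop_v4(struct rtable *rt, struct sk_buff *skb)
{
        if (rt)
                return rt_nexthop(rt, ip_hdr(skb)->daddr);
        return ip_hdr(skb)->daddr;
}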
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The L3 packet descriptor's 'dest_addr' field is used for a different
purpose in RX descriptors. Clean up the hard-coded byte accesses and
try to be more self-documenting.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When
1. an skb has no neighbour, and
2. skb->protocol is not IP[V6],
we select the skb's cast type based on its destination MAC address.
The multicast check is currently restricted to Multicast IP-mapped MACs.
Extend it to also cover non-IP Multicast MACs.
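A sketch of the intended classification (illustrative names, not the actual
qeth helpers):

#include <linux/etherdevice.h>
#include <linux/rtnetlink.h>
#include <linux/skbuff.h>

/* Illustrative only: classify by destination MAC when neither a neighbour
 * nor an IP header is available. */
static int example_cast_type_from_mac(struct sk_buff *skb)
{
        const u8 *dst = eth_hdr(skb)->h_dest;

        if (is_broadcast_ether_addr(dst))
                return RTN_BROADCAST;
        if (is_multicast_ether_addr(dst))       /* IP-mapped and non-IP multicast */
                return RTN_MULTICAST;
        return RTN_UNICAST;
}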
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the proper helpers to check for multicast IP addressing, and remove
some ancient Token Ring code.
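The helpers in question are the generic predicates, roughly (sketch; wrapper
names are illustrative):

#include <linux/in.h>
#include <net/ipv6.h>

static bool example_is_mcast_v4(__be32 addr)
{
        return ipv4_is_multicast(addr);         /* 224.0.0.0/4 */
}

static bool example_is_mcast_v6(const struct in6_addr *addr)
{
        return ipv6_addr_is_multicast(addr);    /* ff00::/8 */
}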
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of assuming that skb->data points to the Ethernet header, use
the right helper and struct to access the Ethertype field.
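In sketch form (illustrative, not the actual qeth hunk):

#include <linux/etherdevice.h>
#include <linux/if_ether.h>
#include <linux/skbuff.h>

/* Illustrative only: read the Ethertype through the mac header accessor
 * instead of assuming skb->data points at the Ethernet header. */
static __be16 example_ethertype(const struct sk_buff *skb)
{
        return eth_hdr(skb)->h_proto;   /* e.g. htons(ETH_P_IP) */
}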
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once all of qeth_l3_set_rx_mode()'s single-use helpers are folded back
in, the two implementations actually look quite similar. So improve the
readability by converting both set_rx_mode() routines to a common
format.
This also allows us to walk ip_mc_htable just once, instead of three
times.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Be a little more self-documenting, and get rid of OSA_ADDR_LEN.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For adding/removing a MAC address, use just one helper each that
handles both unicast and multicast.
Saves one level of indirection for multicast addresses, while improving
the error reporting for unicast addresses.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of tracking the uc/mc state in each MAC address object, just
check the multicast bit in the address itself.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit "s390/qeth: use ip*_eth_mc_map helpers" removed the last
occurrence of CONFIG_IPV6-dependent code.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of some wrapper indirection, and stop accessing the skb at
hard-coded offsets.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable qeth_reply.refcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
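The conversion follows the usual pattern, roughly (sketch with a simplified
stand-in struct; not the literal qeth diff):

#include <linux/refcount.h>
#include <linux/slab.h>

struct example_reply {                          /* stand-in for qeth_reply */
        refcount_t refcnt;
        /* ... */
};

static void example_get(struct example_reply *r)
{
        refcount_inc(&r->refcnt);               /* was: atomic_inc() */
}

static void example_put(struct example_reply *r)
{
        if (refcount_dec_and_test(&r->refcnt))  /* was: atomic_dec_and_test() */
                kfree(r);
}

/* initialization: refcount_set(&r->refcnt, 1);   was: atomic_set(&r->refcnt, 1) */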
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
[jwi: removed the WARN_ONs. Use CONFIG_REFCOUNT_FULL if you care.]
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable lcs_reply.refcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
[jwi: removed the WARN_ONs. Use CONFIG_REFCOUNT_FULL if you care.]
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 966a967116e6 randomly added alignment to this structure, but
it's actually detrimental to performance of null_blk. Test case:
Running on both the home and remote node shows a ~5% degradation
in performance.
While in there, move blk_status_t to the hole after the integer tag
in the nullb_cmd structure. After this patch, we shrink the size
from 192 to 152 bytes.
Fixes: 966a967116e69 ("smp: Avoid using two cache lines for struct call_single_data")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A previous change blindly added massive alignment to the
call_single_data structure in struct request. This ballooned it in size
from 296 to 320 bytes on my setup, for no valid reason at all.
Use the unaligned struct __call_single_data variant instead.
Fixes: 966a967116e69 ("smp: Avoid using two cache lines for struct call_single_data")
Cc: stable@vger.kernel.org # v4.14
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the
local table uses the same trie allocated for the main table when custom
rules are not in use.
When a net namespace is dismantled, the main table is flushed and freed
(via an RCU callback) before the local table. In case the callback is
invoked before the local table is iterated, a use-after-free can occur.
Fix this by iterating over the FIB tables in reverse order, so that the
main table is always freed after the local table.
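A sketch of the resulting teardown loop (assumed shape, not the literal
upstream hunk; the constant and helpers are the existing FIB ones):

#include <net/ip_fib.h>
#include <net/net_namespace.h>

static void example_fib_net_exit(struct net *net)
{
        int i;

        /* Reverse order: the main table, which may share its trie with the
         * local table, must be flushed/freed after the local table. */
        for (i = FIB_TABLE_HASHSZ - 1; i >= 0; i--) {
                struct hlist_head *head = &net->ipv4.fib_table_hash[i];
                struct hlist_node *tmp;
                struct fib_table *tb;

                hlist_for_each_entry_safe(tb, tmp, head, tb_hlist) {
                        hlist_del(&tb->tb_hlist);
                        fib_table_flush(net, tb);
                        fib_free_table(tb);
                }
        }
}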
v3: Reworded comment according to Alex's suggestion.
v2: Add a comment to make the fix more explicit per Dave's and Alex's
feedback.
Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure to check both return code fields before processing the
response. Otherwise we risk operating on invalid data.
Fixes: c9475369bd2b ("s390/qeth: rework RX/TX checksum offload")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we receive a JOIN message from a peer member, the message may
contain an advertised window value ADV_IDLE that permits removing the
member in question from the tipc_group::congested list. However, since
the removal has been made conditional on that the advertised window is
*not* ADV_IDLE, we miss this case. This has the effect that a sender
sometimes may enter a state of permanent, false, broadcast congestion.
We fix this by unconditinally removing the member from the congested
list before calling tipc_member_update(), which might potentially sort
it into the list again.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'batadv-next-for-davem-20171220' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- de-inline hash functions to save memory footprint, by Denys Vlasenko
- Add License information to various files, by Sven Eckelmann (3 patches)
- Change batman_adv.h from ISC to MIT, by Sven Eckelmann
- Improve various includes, by Sven Eckelmann (5 patches)
- Lots of kernel-doc work by Sven Eckelmann (8 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel config fragment CONFIG_NUMA=y is needed for reuseport_bpf_numa.
Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yafang Shao says:
====================
replace tcp_set_state tracepoint with inet_sock_set_state
According to the discussion in the mail thread
https://patchwork.kernel.org/patch/10099243/,
tcp_set_state tracepoint is renamed to inet_sock_set_state tracepoint and is
moved to include/trace/events/sock.h.
With this new tracepoint, we can trace AF_INET/AF_INET6 sock state transitions.
As there's only a single tracepoint for inet, I didn't create a new trace
file named trace/events/inet_sock.h and just placed it in
include/trace/events/sock.h.
Currently TCP/DCCP/SCTP state transitions are traced with this tracepoint.
- Why not more protocols?
If we really think that another protocol should be traced, I will modify the
code to trace it.
I just want to keep the code simple and not output useless information.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With the changes in the inet_* files, SCTP state transitions are now traced
with the inet_sock_set_state tracepoint.
As the SCTP state names (e.g. SCTP_SS_CLOSED, SCTP_SS_ESTABLISHED) have the
same values as the TCP state names, the output still prints the TCP state
names, which keeps the code simple.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the changes in the inet_* files, DCCP state transitions are now traced
with the inet_sock_set_state tracepoint.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_state_load is only used by AF_INET/AF_INET6, so rename it to
inet_sk_state_load and move it into inet_sock.h.
sk_state_store is removed as it is not used any more.
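For reference, the load helper is essentially the following (sketch; the
function name carries an example_ prefix only to mark it as illustrative):

#include <net/sock.h>

/* Essentially what the renamed helper does; paired with smp_store_release()
 * in the corresponding store/set helpers. */
static inline int example_inet_sk_state_load(const struct sock *sk)
{
        return smp_load_acquire(&sk->sk_state);
}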
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_state is a common field of struct sock, so the state transition
tracepoint should not be a TCP-specific feature.
Currently it traces all AF_INET state transitions, so rename this
tracepoint to inet_sock_set_state, with some minor changes, and move it
into trace/events/sock.h.
We don't need to create a file named trace/events/inet_sock.h for this one
single tracepoint.
Two helpers are introduced to trace sk_state transitions:
- void inet_sk_state_store(struct sock *sk, int newstate);
- void inet_sk_set_state(struct sock *sk, int state);
As trace headers should not be included in other header files,
these helpers are defined in sock.c.
Protocols such as SCTP may be compiled as modules, hence export
inet_sk_set_state().
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TCP trace events (specifically tcp_set_state) map enums to symbol
names via __print_symbolic(). But this only works for reading trace events
from the tracefs trace files. If perf or trace-cmd were to record these
events, the event format file does not convert the enum names into numbers,
and you get something like:
__print_symbolic(REC->oldstate,
{ TCP_ESTABLISHED, "TCP_ESTABLISHED" },
{ TCP_SYN_SENT, "TCP_SYN_SENT" },
{ TCP_SYN_RECV, "TCP_SYN_RECV" },
{ TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
{ TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
{ TCP_TIME_WAIT, "TCP_TIME_WAIT" },
{ TCP_CLOSE, "TCP_CLOSE" },
{ TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
{ TCP_LAST_ACK, "TCP_LAST_ACK" },
{ TCP_LISTEN, "TCP_LISTEN" },
{ TCP_CLOSING, "TCP_CLOSING" },
{ TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })
Where trace-cmd and perf do not know the values of those enums.
Use the TRACE_DEFINE_ENUM() macros that will have the trace events convert
the enum strings into their values at system boot. This will allow perf and
trace-cmd to see actual numbers and not enums:
__print_symbolic(REC->oldstate,
{ 1, "TCP_ESTABLISHED" },
{ 2, "TCP_SYN_SENT" },
{ 3, "TCP_SYN_RECV" },
{ 4, "TCP_FIN_WAIT1" },
{ 5, "TCP_FIN_WAIT2" },
{ 6, "TCP_TIME_WAIT" },
{ 7, "TCP_CLOSE" },
{ 8, "TCP_CLOSE_WAIT" },
{ 9, "TCP_LAST_ACK" },
{ 10, "TCP_LISTEN" },
{ 11, "TCP_CLOSING" },
{ 12, "TCP_NEW_SYN_RECV" })
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Check the return value with IS_ERR_OR_NULL()
- Add error handling where it was missing
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If md is NULL, tun_dst must be freed, otherwise it will cause a memory
leak.
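The fixed error path looks roughly like this (assumed shape; not the literal
diff, and the function name is illustrative):

#include <linux/skbuff.h>
#include <net/dst_metadata.h>
#include <net/erspan.h>
#include <net/ip_tunnels.h>

/* Illustrative only: release tun_dst before bailing out when no metadata
 * options are attached. */
static int example_collect_md_path(struct sk_buff *skb,
                                   struct metadata_dst *tun_dst)
{
        struct erspan_metadata *md;

        md = ip_tunnel_info_opts(&tun_dst->u.tun_info);
        if (!md) {
                dst_release((struct dst_entry *)tun_dst);  /* avoid the leak */
                return -EINVAL;
        }

        skb_dst_set(skb, (struct dst_entry *)tun_dst);
        return 0;
}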
Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If md is NULL, tun_dst must be freed, otherwise it will cause a memory
leak.
Fixes: 1a66a836da6 ("gre: add collect_md mode to ERSPAN tunnel")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as the IPv4 code: when the ip6erspan_rcv call returns PACKET_REJECT, we
should call icmpv6_send to send an ICMP unreachable message in the error path.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Acked-by: William Tu <u9012063@gmail.com>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the erspan_rcv call returns PACKET_REJECT, we shouldn't call ipgre_rcv
to process the packet again; instead, send an ICMP unreachable message in
the error path.
Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Acked-by: William Tu <u9012063@gmail.com>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull() can change skb->data, so we need to load ipv6h/ershdr at
the right place.
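In other words (sketch; names and the exact pull length are illustrative):

#include <linux/errno.h>
#include <linux/ipv6.h>
#include <linux/skbuff.h>

/* pskb_may_pull() may reallocate the skb head, so header pointers must be
 * loaded only after the pull has succeeded. */
static int example_pull_then_parse(struct sk_buff *skb, unsigned int pull_len)
{
        const struct ipv6hdr *ipv6h;

        if (!pskb_may_pull(skb, pull_len))
                return -ENOMEM;

        ipv6h = ipv6_hdr(skb);          /* safe only after the pull */
        /* ... parse the ERSPAN header that follows ... */
        return ipv6h->version == 6 ? 0 : -EINVAL;
}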
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Cc: William Tu <u9012063@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Misc fixes for mlx5 core and mlx5 netdev driver.
Merge tag 'mlx5-fixes-2017-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
===================
Mellanox, mlx5 fixes 2017-12-19
The following series includes some fixes for the mlx5 core and Ethernet
driver.
Please pull and let me know if there is any problem.
This series doesn't introduce any conflict with the ongoing mlx5 for-next
submission.
For -stable:
kernels >= v4.7.y
("net/mlx5e: Fix possible deadlock of VXLAN lock")
("net/mlx5e: Add refcount to VXLAN structure")
("net/mlx5e: Prevent possible races in VXLAN control flow")
("net/mlx5e: Fix features check of IPv6 traffic")
kernels >= v4.9.y
("net/mlx5: Fix error flow in CREATE_QP command")
("net/mlx5: Fix rate limit packet pacing naming and struct")
kernels >= v4.13.y
("net/mlx5: FPGA, return -EINVAL if size is zero")
kernels >= v4.14.y
("Revert "mlx5: move affinity hints assignments to generic code")
All above patches apply and compile with no issues on corresponding -stable.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
If a bio is throttled and split after throttling, the bio could be
resubmitted and enter throttling again. This will cause part of the
bio to be charged multiple times. If the cgroup has an IO limit, the
double charge will significantly harm the performance. The bio split
has become quite common after the arbitrary bio size change.
To fix this, we always set the BIO_THROTTLED flag if a bio is throttled.
If the bio is cloned/split, we copy the flag to the new bio too to avoid a
double charge. However, a cloned bio could be directed to a new disk, where
keeping the flag would be a problem. The observation is that we always set a
new disk for the bio in this case, so we can clear the flag in bio_set_dev().
This issue has existed for a long time; the arbitrary bio size change just
makes it worse, so this should go into stable at least since v4.2.
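Sketched out, the flag handling amounts to the following (assumed shape; not
the literal hunks):

#include <linux/bio.h>
#include <linux/blk_types.h>

/* On throttling: charge the bio once, then mark it so that a re-entry into
 * the throttling path skips the charge. */
static void example_charge_and_mark(struct bio *bio)
{
        bio_set_flag(bio, BIO_THROTTLED);
}

/* On clone/split the flag is copied to the new bio so it is not charged a
 * second time; bio_set_dev() clears it again, because a new target disk
 * means the bio legitimately needs a new charge:
 *
 *      bio_clear_flag(bio, BIO_THROTTLED);
 */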
V1 -> V2: Do not add an extra field to the bio, based on discussion with Tejun.
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jakub Kicinski says:
===================
cls_bpf: fix offload state tracking with block callbacks
After introduction of block callbacks classifiers can no longer track
offload state. cls_bpf used to do that in an attempt to move common
code from drivers to the core. Remove that functionality and fix
drivers.
The user-visible bug this is fixing is that trying to offload a second
filter would trigger a spurious DESTROY and in turn disable the already
installed one.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
After TC offloads were converted to callbacks we have no choice
but to keep track of the offloaded filter in the driver.
The check for nn->dp.bpf_offload_xdp was a stop-gap solution
to make sure a failed TC offload wouldn't disable XDP; it's no longer
necessary. nfp_net_bpf_offload() will return -EBUSY on
TC vs XDP conflicts.
Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cls_bpf used to take care of tracking what offload state a filter
is in, i.e. it would track if offload request succeeded or not.
This information would then be used to issue correct requests to
the driver, e.g. requests for statistics only on offloaded filters,
removing only filters which were offloaded, using add instead of
replace if the previous filter was not added, etc.
This tracking of offload state no longer functions with the new
callback infrastructure. There could be multiple entities trying
to offload the same filter.
Throw out all the tracking and corresponding commands and simply
pass to the drivers both old and new bpf program. Drivers will
have to deal with offload state tracking by themselves.
Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of yet another custom hex_dump_to_buffer().
The output is slightly changed, i.e. each byte is followed by a space.
Note, we don't use print_hex_dump() here since the original code uses
netdev_dbg().
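The replacement boils down to something like this (sketch; the buffer size
and function name are illustrative):

#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/printk.h>

/* Illustrative only: format up to one 16-byte row as "xx xx xx ..." and
 * emit it through netdev_dbg(). */
static void example_dump_row(struct net_device *ndev, const u8 *buf, size_t len)
{
        char line[16 * 3 + 1];

        hex_dump_to_buffer(buf, min_t(size_t, len, 16), 16, 1,
                           line, sizeof(line), false);
        netdev_dbg(ndev, "%s\n", line);
}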
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
netdevsim: couple of build warning fixes
This series fixes two harmless build warning about a symbol which
should be static and an unused variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
skip_sw is set but no longer used.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct device_type nsim_dev_type created for SR-IOV support
should be static.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add test cases for gretap and ip6gretap, in both native mode
and external (collect metadata) mode.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace sscanf() with mac_pton().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace sscanf() with mac_pton().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use
- %pM to print a MAC address,
- mac_pton() to convert it from ASCII to binary format, and
- ether_addr_copy() to copy it,
as sketched below.
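A minimal sketch of the three helpers together (names and context are
illustrative, not the driver's actual code):

#include <linux/etherdevice.h>
#include <linux/if_ether.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>

/* Illustrative only. */
static int example_set_mac(struct net_device *ndev, const char *str)
{
        u8 mac[ETH_ALEN];

        if (!mac_pton(str, mac))                 /* ASCII -> binary */
                return -EINVAL;

        ether_addr_copy(ndev->dev_addr, mac);    /* copy all ETH_ALEN bytes */
        netdev_info(ndev, "MAC address set to %pM\n", ndev->dev_addr);
        return 0;
}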
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(I can trivially verify that the idr_remove in cleanup_net happens
after the network namespace count has dropped to zero --EWB)
Function get_net_ns_by_id() does not check for net::count
after it has found a peer in the netns_ids idr.
It may dereference a peer after its count has already been
finally decremented. This leads to a double free and memory
corruption:
put_net(peer)                                 rtnl_lock()
atomic_dec_and_test(&peer->count) [count=0]   ...
__put_net(peer)                               get_net_ns_by_id(net, id)
  spin_lock(&cleanup_list_lock)
  list_add(&net->cleanup_list, &cleanup_list)
  spin_unlock(&cleanup_list_lock)
queue_work()                                  peer = idr_find(&net->netns_ids, id)
  |                                           get_net(peer)    [count=1]
  |                                           ...
  |                                           (use after final put)
  v                                           ...
cleanup_net()                                 ...
  spin_lock(&cleanup_list_lock)               ...
  list_replace_init(&cleanup_list, ..)        ...
  spin_unlock(&cleanup_list_lock)             ...
  ...                                         ...
  ...                                         put_net(peer)
  ...                                           atomic_dec_and_test(&peer->count) [count=0]
  ...                                           spin_lock(&cleanup_list_lock)
  ...                                           list_add(&net->cleanup_list, &cleanup_list)
  ...                                           spin_unlock(&cleanup_list_lock)
  ...                                         queue_work()
  ...                                         rtnl_unlock()
rtnl_lock()                                   ...
for_each_net(tmp) {                           ...
  id = __peernet2id(tmp, peer)                ...
  spin_lock_irq(&tmp->nsid_lock)              ...
  idr_remove(&tmp->netns_ids, id)             ...
  ...                                         ...
  net_drop_ns()                               ...
    net_free(peer)                            ...
}                                             ...
                                                |
                                                v
                                              cleanup_net()
                                                ...
                                                (Second free of peer)
Also, put_net() on the right CPU may reorder with the left CPU's
list_replace_init(&cleanup_list, ..), and then cleanup_list
will be corrupted.
Since cleanup_net() is executed in a worker thread, while
put_net(peer) can happen anywhere, there should be
enough time for a concurrent get_net_ns_by_id() to pick
the peer up, and the race does not seem to be unlikely.
The patch fixes the problem in the standard way.
(Also, there is a possible problem in peernet2id_alloc(), which requires
a check for net::count under nsid_lock and maybe_get_net(peer), but
in current stable kernels it's used under rtnl_lock() and it has to be
safe. Open vSwitch has begun to use peernet2id_alloc(), and possibly it
should be fixed too. Since this is not in a stable kernel yet, I'll send
a separate message to netdev@ later.)
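The standard way, roughly, is to replace the unconditional get_net() with
maybe_get_net() so the lookup fails gracefully once the final put has run
(sketch, assumed shape; not the literal upstream diff):

#include <linux/idr.h>
#include <linux/rcupdate.h>
#include <net/net_namespace.h>

static struct net *example_get_net_ns_by_id(struct net *net, int id)
{
        struct net *peer;

        rcu_read_lock();
        peer = idr_find(&net->netns_ids, id);
        if (peer)
                peer = maybe_get_net(peer);  /* NULL if count already hit zero */
        rcu_read_unlock();

        return peer;
}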
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>