66837 Commits

Author SHA1 Message Date
Tariq Toukan
e2869fb206 net/mlx5: Kconfig, Better organize compilation flags
Always contain all acceleration functions declarations in
'accel' files, independent to the flags setting.
For this, introduce new flags CONFIG_FPGA_{IPSEC/TLS} and use stubs
where needed.

This obsoletes the need for stubs in 'fpga' files. Remove them.

Also use the new flags in Makefile, to decide whether to compile
TLS-specific or IPSEC-specific objects, or not.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05 16:29:19 -07:00
David S. Miller
61c2491db7 mlx5-update-2019-07-04
This series adds mlx5 support for devlink fw versions query.
 
 1) Implement the required low level firmware commands
 2) Implement the devlink knobs and callbacks for fw versions query.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0eZOQACgkQSD+KveBX
 +j6a0gf/SbSY2yDz6iyDZy7Zt0e7onD395UyWIoXbXJAI/aMggvdNE233dh369xa
 d4Zo8zvFijiFpPKmP0p+WZfvR5YPPWgtOYDv1MbaTz5lqsGYUh2kN9FaTyCNZW3O
 jeNidxjsne70VBrpmOZJX2mxduKwRdxchRg2iQPdeZQGO5Uq3nEdOfJVx/mhsIGw
 iRczF4IAQpArZEyKXgqMSrtJ99FJ7JSJXnsqjVJyUC1zyCZ+Cq3VV55+u5VCBokd
 qr/VSYqFd38SEHR5l4nXDXLX23pOwWNhUdPgGopjak8bN3k19vk6GfQ2bzLFnaOW
 4fx1k4uTixKOrbLkgzay/YSsoF2TDw==
 =Bwl7
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-07-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-update-2019-07-04

This series adds mlx5 support for devlink fw versions query.

1) Implement the required low level firmware commands
2) Implement the devlink knobs and callbacks for fw versions query.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05 16:24:27 -07:00
Li RongQing
e4aa33ad59 net: remove unused parameter from skb_checksum_try_convert
the check parameter is never used

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05 15:30:36 -07:00
Saeed Mahameed
e08a976a16 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch:

1) Add the required HW definitions and structures for upcoming TLS
   support.
2) Add support for MCQI and MCQS hardware registers for fw version query.
3) Added hardware bits and structures definitions for sub-functions
4) Small code cleanup and improvement for PF pci driver.
5) Bluefield (ECPF) updates and refactoring for better E-Switch
   management on ECPF embedded CPU NIC:
   5.1) Consolidate querying eswitch number of VFs
   5.2) Register event handler at the correct E-Switch init stage
   5.3) Setup PF's inline mode and vlan pop when the ECPF is the
        E-Swtich manager ( the host PF is basically a VF ).
   5.4) Handle Vport UC address changes in switchdev mode.

6) Cleanup the rep and netdev reference when unloading IB rep.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

i# All conflicts fixed but you are still merging.
2019-07-04 16:42:59 -04:00
David S. Miller
c4cde5804d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-07-03

The following pull-request contains BPF updates for your *net-next* tree.

There is a minor merge conflict in mlx5 due to 8960b38932be ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but resolution seems not that bad ... getting current
bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:

** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
  static int mlx5e_open_cq(struct mlx5e_channel *c,
                           struct dim_cq_moder moder,
                           struct mlx5e_cq_param *param,
                           struct mlx5e_cq *cq)
  =======
  int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
                    struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
  >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497

Resolution is to take the second chunk and rename net_dim_cq_moder into
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...

  drivers/net/ethernet/mellanox/mlx5/core/en.h +977

... and in mlx5e_open_xsk() ...

  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64

... needs the same rename from net_dim_cq_moder into dim_cq_moder.

** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
          int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
          struct dim_cq_moder icocq_moder = {0, 0};
          struct net_device *netdev = priv->netdev;
          struct mlx5e_channel *c;
          unsigned int irq;
  =======
          struct net_dim_cq_moder icocq_moder = {0, 0};
  >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497

Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
as well.

Let me know if you run into any issues. Anyway, the main changes are:

1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.

2) Addition of two new per-cgroup BPF hooks for getsockopt and
   setsockopt along with a new sockopt program type which allows more
   fine-grained pass/reject settings for containers. Also add a sock_ops
   callback that can be selectively enabled on a per-socket basis and is
   executed for every RTT to help tracking TCP statistics, both features
   from Stanislav.

3) Follow-up fix from loops in precision tracking which was not propagating
   precision marks and as a result verifier assumed that some branches were
   not taken and therefore wrongly removed as dead code, from Alexei.

4) Fix BPF cgroup release synchronization race which could lead to a
   double-free if a leaf's cgroup_bpf object is released and a new BPF
   program is attached to the one of ancestor cgroups in parallel, from Roman.

5) Support for bulking XDP_TX on veth devices which improves performance
   in some cases by around 9%, from Toshiaki.

6) Allow for lookups into BPF devmap and improve feedback when calling into
   bpf_redirect_map() as lookup is now performed right away in the helper
   itself, from Toke.

7) Add support for fq's Earliest Departure Time to the Host Bandwidth
   Manager (HBM) sample BPF program, from Lawrence.

8) Various cleanups and minor fixes all over the place from many others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04 12:48:21 -07:00
Mark Zhang
f8efee08dd net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
Add rts2rts_qp_counters_set_id field in hca cap so that RTS2RTS
qp modification can be used to change the counter of a QP.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-04 21:36:33 +03:00
Tariq Toukan
0718edf528 net/mlx5: Properly name the generic WQE control field
A generic WQE control field is used for different purposes
in different cases.
Use union to allow using the proper name in each case.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03 12:50:42 -07:00
Eran Ben Elisha
a12ff35e0f net/mlx5: Introduce TLS TX offload hardware bits and structures
Add TLS offload related IFC structs, layouts and enumerations.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03 12:50:42 -07:00
Parav Pandit
2752b82316 net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.

Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03 12:50:42 -07:00
Yishai Hadas
e4075c4428 net/mlx5: Expose device definitions for object events
Expose an extra device definitions for objects events.

It includes: object_type values for legacy objects and generic data
header for any other object.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 21:02:45 +03:00
Yishai Hadas
4e0e2ea188 net/mlx5: Report EQE data upon CQ completion
Report EQE data upon CQ completion to let upper layers use this data.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 21:00:20 +03:00
Yishai Hadas
38164b7719 net/mlx5: mlx5_core_create_cq() enhancements
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:59:32 +03:00
Yishai Hadas
c0670781f5 net/mlx5: Expose the API to register for ANY event
Expose the API to register for ANY event, mlx5_ib will be able to use
this functionality for its needs.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:56:29 +03:00
Yishai Hadas
b9a7ba5562 net/mlx5: Use event mask based on device capabilities
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.

As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:55:45 +03:00
Mahesh Bandewar
4de83b88c6 loopback: create blackhole net device similar to loopack.
Create a blackhole net device that can be used for "dead"
dst entries instead of loopback device. This blackhole device differs
from loopback in few aspects: (a) It's not per-ns. (b)  MTU on this
device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb().
and (d) since it's not registered it won't have ifindex.

Lower MTU effectively make the device not pass the MTU check during
the route check when a dst associated with the skb is dead.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Jason A. Donenfeld
362b87f5b1 netlink: use 48 byte ctx instead of 6 signed longs for callback
People are inclined to stuff random things into cb->args[n] because it
looks like an array of integers. Sometimes people even put u64s in there
with comments noting that a certain member takes up two slots. The
horror! Really this should mirror the usage of skb->cb, which are just
48 opaque bytes suitable for casting a struct. Then people can create
their usual casting macros for accessing strongly typed members of a
struct.

As a plus, this also gives us the same amount of space on 32bit and 64bit.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:12:10 -07:00
Bodong Wang
411ec9e0b4 net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop
When ECPF is the eswitch manager, host PF is treated like other VFs.
Driver should do the same for inline mode and vlan pop.

Add new iterators to include host PF if ECPF is the eswitch manager.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:31 -07:00
Bodong Wang
f6455de0b0 net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF
can be at offload mode when SR-IOV is not enabled.

Rename the interface and eswitch mode names to decouple from SR-IOV,
and cleanup eswitch messages accordingly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang
e1d974d03e net/mlx5: Handle host PF vport mac/guid for ECPF
When ECPF is eswitch manager, it has the privilege to query and
configure the mac and node guid of host PF.

While vport number of host PF is 0, the vport command should be
issued with other_vport set in this case as the cmd is issued by
ECPF vport(0xfffe).

Add a specific function to query own vport mac. Low level functions
are used by vport manager to query/modify any vport mac and node guid.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Parav Pandit
d886aba677 net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs
While enabling SR-IOV, PCI core already checks that if SR-IOV is already
enabled, it returns failure error code.
Hence, remove such duplicate check from mlx5_core driver.

While at it, make mlx5_device_disable_sriov() to perform cleanup of VFs in
reverse order of mlx5_device_enable_sriov().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang
5ccf2770e8 net/mlx5: Don't handle VF func change if host PF is disabled
When ECPF eswitch manager is at offloads mode, it monitors functions
changed event from host PF side and acts according to the number of
VFs enabled/disabled.

As ECPF and host PF work in two independent hosts, it's possible that
host PF OS reboots but ECPF system is still kept on and continues
monitoring events from host PF. When kernel from host PF side is
booting, PCI iov driver does sriov_init and compute_max_vf_buses by
iterating over all valid num of VFs. This triggers FLR and generates
functions changed events, even though host PF HCA is not enabled at
this time. However, ECPF is not aware of this information, and still
handles these events as usual. ECPF system will see massive number of
reps are created, but destroyed immediately once creation finished.

To eliminate this noise, a bit is added to host parameter context to
indicate host PF is disabled. ECPF will not handle the VF changed
event if this bit is set.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Huy Nguyen
386e75af99 net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type
Rename mlx5_pci_dev_type to mlx5_coredev_type to distinguish different mlx5
device types.

mlx5_coredev_type represents mlx5_core_dev instance type. Hence keep
mlx5_coredev_type in mlx5_core_dev structure.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang
2f69e591e4 {IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mapping
In the single IB device mode, the mapping between vport number and
rep relies on a counter. However for dynamic vport allocation, it is
desired to keep consistent map of eswitch vport and IB port.

Hence, simplify code to remove the free running counter and instead
use the available vport index during load/unload sequence from the
eswitch.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Shay Agroskin
a82e0b5bda net/mlx5: Added MCQI and MCQS registers' description to ifc
Given a fw component index, the MCQI register allows us to query
this component's information (e.g. its version and capabilities).

Given a fw component index, the MCQS register allows us to query the
status of a fw component, including its type and state
(e.g. PRESET/IN_USE).
It can be used to find the index of a component of a specific type, by
sequentially increasing the component index, and querying each time the
type of the returned component.
If max component index is reached, 'last_index_flag' is set by the HCA.

These registers' description was added to query the running and pending
fw version of the HCA.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Parav Pandit
1759d322f4 net/mlx5: Add hardware definitions for sub functions
Update mlx5 device interface data structures for:
1. New command definitions for allocating, deallocating SF
2. Query SF partition
3. Eswitch SF fields
4. HCA CAP SF fields
5. Extend Eswitch functions command for SF

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Toke Høiland-Jørgensen
43e74c0267 bpf_xdp_redirect_map: Perform map lookup in eBPF helper
The bpf_redirect_map() helper used by XDP programs doesn't return any
indication of whether it can successfully redirect to the map index it was
given. Instead, BPF programs have to track this themselves, leading to
programs using duplicate maps to track which entries are populated in the
devmap.

This patch fixes this by moving the map lookup into the bpf_redirect_map()
helper, which makes it possible to return failure to the eBPF program. The
lower bits of the flags argument is used as the return code, which means
that existing users who pass a '0' flag argument will get XDP_ABORTED.

With this, a BPF program can check the return code from the helper call and
react by, for instance, substituting a different redirect. This works for
any type of map used for redirect.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29 01:31:09 +02:00
Toke Høiland-Jørgensen
4b55cf290d devmap: Rename ifindex member in bpf_redirect_info
The bpf_redirect_info struct has an 'ifindex' member which was named back
when the redirects could only target egress interfaces. Now that we can
also redirect to sockets and CPUs, this is a bit misleading, so rename the
member to tgt_index.

Reorder the struct members so we can have 'tgt_index' and 'tgt_value' next
to each other in a subsequent patch.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29 01:31:09 +02:00
Toke Høiland-Jørgensen
c8af5cd75e xskmap: Move non-standard list manipulation to helper
Add a helper in list.h for the non-standard way of clearing a list that is
used in xskmap. This makes it easier to reuse it in the other map types,
and also makes sure this usage is not forgotten in any list refactorings in
the future.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29 01:31:08 +02:00
Arnd Bergmann
5233794b17 net/mlx5e: reduce stack usage in mlx5_eswitch_termtbl_create
Putting an empty 'mlx5_flow_spec' structure on the stack is a bit
wasteful and causes a warning on 32-bit architectures when building
with clang -fsanitize-coverage:

drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create':
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Since the structure is never written to, we can statically allocate
it to avoid the stack usage. To be on the safe side, mark all
subsequent function arguments that we pass it into as 'const'
as well.

Fixes: 10caabdaad5a ("net/mlx5e: Use termination table for VLAN push actions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-28 16:03:59 -07:00
Saeed Mahameed
4f5d1beadc Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch:

1) E-Switch vport metadata support for source vport matching
2) Convert mkey_table to XArray
3) Shared IRQs and to use single IRQ for all async EQs

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-28 16:03:54 -07:00
David S. Miller
d96ff269a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The new route handling in ip_mc_finish_output() from 'net' overlapped
with the new support for returning congestion notifications from BPF
programs.

In order to handle this I had to take the dev_loopback_xmit() calls
out of the switch statement.

The aquantia driver conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 21:06:39 -07:00
Linus Torvalds
763cf1f2d9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:

 - fix for one corner case in HID++ protocol with respect to handling
   very long reports, from Hans de Goede

 - power management fix in Intel-ISH driver, from Hyungwoo Yang

 - use-after-free fix in Intel-ISH driver, from Dan Carpenter

 - a couple of new device IDs/quirks from Kai-Heng Feng, Kyle Godbey and
   Oleksandr Natalenko

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: intel-ish-hid: fix wrong driver_data usage
  HID: multitouch: Add pointstick support for ALPS Touchpad
  HID: logitech-dj: Fix forwarding of very long HID++ reports
  HID: uclogic: Add support for Huion HS64 tablet
  HID: chicony: add another quirk for PixArt mouse
  HID: intel-ish-hid: Fix a use after free in load_fw_from_host()
2019-06-28 08:39:18 +08:00
Stanislav Fomichev
0d01da6afc bpf: implement getsockopt and setsockopt hooks
Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.

BPF_CGROUP_SETSOCKOPT can modify user setsockopt arguments before
passing them down to the kernel or bypass kernel completely.
BPF_CGROUP_GETSOCKOPT can can inspect/modify getsockopt arguments that
kernel returns.
Both hooks reuse existing PTR_TO_PACKET{,_END} infrastructure.

The buffer memory is pre-allocated (because I don't think there is
a precedent for working with __user memory from bpf). This might be
slow to do for each {s,g}etsockopt call, that's why I've added
__cgroup_bpf_prog_array_is_empty that exits early if there is nothing
attached to a cgroup. Note, however, that there is a race between
__cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
program layout might have changed; this should not be a problem
because in general there is a race between multiple calls to
{s,g}etsocktop and user adding/removing bpf progs from a cgroup.

The return code of the BPF program is handled as follows:
* 0: EPERM
* 1: success, continue with next BPF program in the cgroup chain

v9:
* allow overwriting setsockopt arguments (Alexei Starovoitov):
  * use set_fs (same as kernel_setsockopt)
  * buffer is always kzalloc'd (no small on-stack buffer)

v8:
* use s32 for optlen (Andrii Nakryiko)

v7:
* return only 0 or 1 (Alexei Starovoitov)
* always run all progs (Alexei Starovoitov)
* use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
  (decided to use optval=-1 instead, optval=0 might be a valid input)
* call getsockopt hook after kernel handlers (Alexei Starovoitov)

v6:
* rework cgroup chaining; stop as soon as bpf program returns
  0 or 2; see patch with the documentation for the details
* drop Andrii's and Martin's Acked-by (not sure they are comfortable
  with the new state of things)

v5:
* skip copy_to_user() and put_user() when ret == 0 (Martin Lau)

v4:
* don't export bpf_sk_fullsock helper (Martin Lau)
* size != sizeof(__u64) for uapi pointers (Martin Lau)
* offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)

v3:
* typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
* reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
  Nakryiko)
* use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
* use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
* new CG_SOCKOPT_ACCESS macro to wrap repeated parts

v2:
* moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
* aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
* bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
* added [0,2] return code check to verifier (Martin Lau)
* dropped unused buf[64] from the stack (Martin Lau)
* use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
* dropped bpf_target_off from ctx rewrites (Martin Lau)
* use return code for kernel bypass (Martin Lau & Andrii Nakryiko)

Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-27 15:25:16 -07:00
David S. Miller
d7ee287827 Generic DIM
From: Tal Gilboa and Yamin Fridman
 
 Implement net DIM over a generic DIM library, add RDMA DIM
 
 dim.h lib exposes an implementation of the DIM algorithm for
 dynamically-tuned interrupt moderation for networking interfaces.
 
 We want a similar functionality for other protocols, which might need to
 optimize interrupts differently. Main motivation here is DIM for NVMf
 storage protocol.
 
 Current DIM implementation prioritizes reducing interrupt overhead over
 latency. Also, in order to reduce DIM's own overhead, the algorithm might
 take some time to identify it needs to change profiles. While this is
 acceptable for networking, it might not work well on other scenarios.
 
 Here we propose a new structure to DIM. The idea is to allow a slightly
 modified functionality without the risk of breaking Net DIM behavior for
 netdev. We verified there are no degradations in current DIM behavior with
 the modified solution.
 
 Suggested solution:
 - Common logic is implemented in lib/dim/dim.c
 - Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
   the common logic in dim.c
 - Any new DIM logic will be implemented in "lib/dim/new_dim.c".
   This new implementation will expose modified versions of profiles,
   dim_step() and dim_decision().
 - DIM API is declared in include/linux/dim.h for all implementations.
 
 Pros for this solution are:
 - Zero impact on existing net_dim implementation and usage
 - Relatively more code reuse (compared to two separate solutions)
 - Increased extensibility
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0SiDAACgkQSD+KveBX
 +j4mlQgAoabbtxwIapMg62tKhnlr/3hEEyuEjiQRhzMUb5y5zOIJNRL82Nerv4K3
 oD7v76emw/Xb1ErPe4yJJRdkpbpabEAFKbqkM0R7xFPNK33ZbphlU1d9YGgUKp1W
 Uieydn5kfubzi6p7R1TC7c2qW1Wr6rE1wnNJyC/sy8zFEP/OEH0NmgPDyOs7ckWI
 xaoJ1lhNWMoGN+Qk+fxazd1VtsKdcbc9JvLD2e8ON5qWdnuNiLhawcJVoWzfsCAa
 V/Tdff+Rj9bVFsi468u20og4EdV3ceAOTzMnhCYpAtYZHVbVk6MFgh87uQYkTrfY
 sW8BDmWKkRdHtzHd3a3yQiVDnz/iuw==
 =bAZn
 -----END PGP SIGNATURE-----

Merge tag 'blk-dim-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mamameed says:

====================
Generic DIM

From: Tal Gilboa and Yamin Fridman

Implement net DIM over a generic DIM library, add RDMA DIM

dim.h lib exposes an implementation of the DIM algorithm for
dynamically-tuned interrupt moderation for networking interfaces.

We want a similar functionality for other protocols, which might need to
optimize interrupts differently. Main motivation here is DIM for NVMf
storage protocol.

Current DIM implementation prioritizes reducing interrupt overhead over
latency. Also, in order to reduce DIM's own overhead, the algorithm might
take some time to identify it needs to change profiles. While this is
acceptable for networking, it might not work well on other scenarios.

Here we propose a new structure to DIM. The idea is to allow a slightly
modified functionality without the risk of breaking Net DIM behavior for
netdev. We verified there are no degradations in current DIM behavior with
the modified solution.

Suggested solution:
- Common logic is implemented in lib/dim/dim.c
- Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
  the common logic in dim.c
- Any new DIM logic will be implemented in "lib/dim/new_dim.c".
  This new implementation will expose modified versions of profiles,
  dim_step() and dim_decision().
- DIM API is declared in include/linux/dim.h for all implementations.

Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- Relatively more code reuse (compared to two separate solutions)
- Increased extensibility
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 12:42:51 -07:00
David S. Miller
0b58f64845 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2019-06-26

This series contains updates to ixgbe and i40e only.

Mauro S. M. Rodrigues update the ixgbe driver to handle transceivers who
comply with SFF-8472 but do not implement the Digital Diagnostic
Monitoring (DOM) interface.  Update the driver to check the necessary
bits to see if DOM is implemented before trying to read the additional
256 bytes in the EEPROM for DOM data.

Young Xiao fixes a potential divide by zero issue in ixgbe driver.

Aleksandr fixes i40e to recognize 2.5 and 5.0 GbE link speeds so that it
is not reported as "Unknown bps".  Fixes the driver to read the firmware
LLDP agent status during DCB initialization, and to properly log the
LLDP agent status to help with debugging when DCB fails to initialize.

Martyna fixes i40e for the missing supported and advertised link modes
information in ethtool.

Jake fixes a function header comment that was incorrect for a PTP
function in i40e.

Maciej fixes an issue for i40e when a XDP program is loaded the
descriptor count gets reset to the default value, resolve the issue by
making the current descriptor count persistent across resets.

Alice corrects a copyright date which she found to be incorrect.

Piotr adds a log entry when the traffic class 0 is added or deleted, which
was not being logged previously.

Gustavo A. R. Silva updates i40e to use struct_size() where possible.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:49:06 -07:00
Dave Taht
96125bf998 Allow 0.0.0.0/8 as a valid address range
The longstanding prohibition against using 0.0.0.0/8 dates back
to two issues with the early internet.

There was an interoperability problem with BSD 4.2 in 1984, fixed in
BSD 4.3 in 1986. BSD 4.2 has long since been retired.

Secondly, addresses of the form 0.x.y.z were initially defined only as
a source address in an ICMP datagram, indicating "node number x.y.z on
this IPv4 network", by nodes that know their address on their local
network, but do not yet know their network prefix, in RFC0792 (page
19).  This usage of 0.x.y.z was later repealed in RFC1122 (section
3.2.2.7), because the original ICMP-based mechanism for learning the
network prefix was unworkable on many networks such as Ethernet (which
have longer addresses that would not fit into the 24 "node number"
bits).  Modern networks use reverse ARP (RFC0903) or BOOTP (RFC0951)
or DHCP (RFC2131) to find their full 32-bit address and CIDR netmask
(and other parameters such as default gateways). 0.x.y.z has had
16,777,215 addresses in 0.0.0.0/8 space left unused and reserved for
future use, since 1989.

This patch allows for these 16m new IPv4 addresses to appear within
a box or on the wire. Layer 2 switches don't care.

0.0.0.0/32 is still prohibited, of course.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: John Gilmore <gnu@toad.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26 13:19:46 -07:00
Jianbo Liu
8d212ff057 net/mlx5e: Specifying known origin of packets matching the flow
In vport metadata matching, source port number is replaced by metadata.
While FW has no idea about what it is in the metadata, a syndrome will
happen. Specify a known origin to avoid the syndrome.
However, there is no functional change because ANY_VPORT (0) is filled
in flow_source, the same default value as before, as a pre-step towards
metadata matching for fast path.
There are two other values can be filled in flow_source. When setting
0x1, packet matching this rule is from uplink, while 0x2 is for packet
from other local vports.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
7445cfb116 net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs
When a dual-port VHCA sends a RoCE packet on its non-native port, and the
packet arrives to its affiliated vport FDB, a mismatch might occur on the
rules that match the packet source vport as it is not represented by single
VHCA only in this case. So we change to match on metadata instead of source
vport.
To do that, a rule is created in all vports and uplink ingress ACLs, to
save the source vport number and vhca id in the packet's metadata in order
to match on it later.
The metadata register used is the first of the 32-bit type C registers. It
can be used for matching and header modify operations. The higher 16 bits
of this register are for vhca id, and the lower 16 ones is for vport
number.
This change is not for dual-port RoCE only. If HW and FW allow, the vport
metadata matching is enabled by default.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
bb0ee7dcc4 net/mlx5: Add flow context for flow tag
Refactor the flow data structures, add new flow_context and move
flow_tag into it, as flow_tag doesn't belong to the rule action.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
65c0f2c166 net/mlx5: Introduce vport metadata matching bits and enum constants
When a dual-port VHCA sends a RoCE packet on its non-native port, and
the packet arrives to its affiliated vport FDB, a mismatch might occur
on the rules that match the packet source vport. So we replace the
match on source port with the match on metadata that was configured in
ingress ACL, and that metadata will be passed further also to the NIC
RX table of the eswitch manager.

Introduce vport metadata matching bits and enum constants as a pre-step
towards metadata matching.
    o metadata type C registers in the misc parameters 2 fields.
    o esw_uplink_ingress_acl bit in esw cap. If it set, the device supports
      ingress ACL for the uplink vport.
    o fdb_to_vport_reg_* bits in flow table cap and esw vport context, to
      support propagating the metadata to the nic rx through the loopback
      path.
    o flow_source in flow context, to indicate the known origin of packets.
    o enum constants, to support the above bits.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Aleksandr Loktionov
4ae4916b56 i40e: fix 'Unknown bps' in dmesg for 2.5Gb/5Gb speeds
This patch fixes 'NIC Link is Up, Unknown bps' message in dmesg
for 2.5Gb/5Gb speeds. This problem is fixed by adding constants
for VIRTCHNL_LINK_SPEED_2_5GB and VIRTCHNL_LINK_SPEED_5GB cases
in the i40e_virtchnl_link_speed() function.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-26 09:18:54 -07:00
Hyungwoo Yang
b12bbdc5dd HID: intel-ish-hid: fix wrong driver_data usage
Currently, in suspend() and resume(), ishtp client drivers are using
driver_data to get "struct ishtp_cl_device" object which is set by
bus driver. It's wrong since the driver_data should not be owned bus.
driver_data should be owned by the corresponding ishtp client driver.
Due to this, some ishtp client driver like cros_ec_ishtp which uses
its driver_data to transfer its data to its child doesn't work correctly.

So this patch removes setting driver_data in bus drier and instead of
using driver_data to get "struct ishtp_cl_device", since "struct device"
is embedded in "struct ishtp_cl_device", we introduce a helper function
that returns "struct ishtp_cl_device" from "struct device".

Signed-off-by: Hyungwoo Yang <hyungwoo.yang@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2019-06-26 14:08:11 +02:00
Yamin Friedman
398c2b05bb linux/dim: Add completions count to dim_sample
Added a measurement of completions per/msec to allow for completion based
dim algorithms.

In order to use dynamic interrupt moderation with RDMA we need to have a
different measurment than packets per second. This change is meant to
prepare for adding a new DIM method.

All drivers that use net_dim and thus do not need a completion count will
have the completions set to 0.

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25 13:46:40 -07:00
Tal Gilboa
4f75da3666 linux/dim: Move implementation to .c files
Moved all logic from dim.h and net_dim.h to dim.c and net_dim.c.
This is both more structurally appealing and would allow to only
expose externally used functions.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25 13:46:39 -07:00
Tal Gilboa
8960b38932 linux/dim: Rename externally used net_dim members
Removed 'net' prefix from functions and structs used by external drivers.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25 13:46:39 -07:00
Tal Gilboa
e5b6ab02d7 linux/dim: Rename net_dim_sample() to net_dim_update_sample()
In order to avoid confusion between the function and the similarly
named struct.
In preparation for removing the 'net' prefix from dim members.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25 13:46:39 -07:00
Tal Gilboa
c002bd529d linux/dim: Rename externally exposed macros
Renamed macros in use by external drivers.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25 13:46:39 -07:00
Tal Gilboa
449986ea92 linux/dim: Remove "net" prefix from internal DIM members
Only renaming functions and structs which aren't used by an external code.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25 13:46:38 -07:00
Tal Gilboa
0e58983de0 linux/dim: Move logic to dim.h
In preparation for supporting more implementations of the DIM
algorithm, I'm moving what would become common logic to a common
library. Downstream DIM implementations will use the common lib
for their implementation.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25 13:46:38 -07:00
Matthew Wilcox
792c4e9d0b net/mlx5: Convert mkey_table to XArray
The lock protecting the data structure does not need to be an rwlock.  The
only read access to the lock is in an error path, and if that's limiting
your scalability, you have bigger performance problems.

Eliminate mlx5_mkey_table in favour of using the xarray directly.
reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
be called in interrupt context.

This also fixes a minor bug where SRCU locking was being used on the radix
tree read side, when RCU was needed too.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-24 16:44:40 -07:00