linux-next/net
Ira Weiny 932f4a630a mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM
Pach series "Add FOLL_LONGTERM to GUP fast and use it".

HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
advantages.  These pages can be held for a significant time.  But
get_user_pages_fast() does not protect against mapping FS DAX pages.

Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
retains the performance while also adding the FS DAX checks.  XDP has also
shown interest in using this functionality.[1]

In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
and remove the specialized get_user_pages_longterm call.

[1] https://lkml.org/lkml/2019/3/19/939

"longterm" is a relative thing and at this point is probably a misnomer.
This is really flagging a pin which is going to be given to hardware and
can't move.  I've thought of a couple of alternative names but I think we
have to settle on if we are going to use FL_LAYOUT or something else to
solve the "longterm" problem.  Then I think we can change the flag to a
better name.

Secondly, it depends on how often you are registering memory.  I have
spoken with some RDMA users who consider MR in the performance path...
For the overall application performance.  I don't have the numbers as the
tests for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an aside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

This patch (of 7):

This patch starts a series which aims to support FOLL_LONGTERM in
get_user_pages_fast().  Some callers who would like to do a longterm (user
controlled pin) of pages with the fast variant of GUP for performance
purposes.

Rather than have a separate get_user_pages_longterm() call, introduce
FOLL_LONGTERM and change the longterm callers to use it.

This patch does not change any functionality.  In the short term
"longterm" or user controlled pins are unsafe for Filesystems and FS DAX
in particular has been blocked.  However, callers of get_user_pages_fast()
were not "protected".

FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
requires vmas to determine if DAX is in use.

NOTE: In merging with the CMA changes we opt to change the
get_user_pages() call in check_and_migrate_cma_pages() to a call of
__get_user_pages_locked() on the newly migrated pages.  This makes the
code read better in that we are calling __get_user_pages_locked() on the
pages before and after a potential migration.

As a side affect some of the interfaces are cleaned up but this is not the
primary purpose of the series.

In review[1] it was asked:

<quote>
> This I don't get - if you do lock down long term mappings performance
> of the actual get_user_pages call shouldn't matter to start with.
>
> What do I miss?

A couple of points.

First "longterm" is a relative thing and at this point is probably a
misnomer.  This is really flagging a pin which is going to be given to
hardware and can't move.  I've thought of a couple of alternative names
but I think we have to settle on if we are going to use FL_LAYOUT or
something else to solve the "longterm" problem.  Then I think we can
change the flag to a better name.

Second, It depends on how often you are registering memory.  I have spoken
with some RDMA users who consider MR in the performance path...  For the
overall application performance.  I don't have the numbers as the tests
for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an asside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

</quote>

[1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

[ira.weiny@intel.com: v3]
  Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
..
6lowpan 6lowpan: Off by one handling ->nexthdr 2019-04-23 19:09:58 +02:00
9p 9p/net: fix memory leak in p9_client_create 2019-03-13 11:50:04 +01:00
802
8021q vlan: disable SIOCSHWTSTAMP in container 2019-05-09 09:31:16 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-05-02 22:14:21 -04:00
atm net: atm: clean up a range check 2019-05-05 10:25:52 -07:00
ax25 net: ax25: fix misuse of %x 2019-04-21 10:37:26 -07:00
batman-adv This feature/cleanup patchset includes the following patches: 2019-05-09 09:44:17 -07:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
bpf bpf: Introduce bpf sk local storage 2019-04-27 09:07:04 -07:00
bpfilter bpfilter: re-add header search paths to tools include to fix build error 2019-02-23 13:34:40 -08:00
bridge netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
caif net: caif: avoid using qdisc_qlen() 2019-04-10 12:20:46 -07:00
can netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
ceph Printk changes for 5.2 2019-05-07 09:18:12 -07:00
core fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied 2019-05-08 09:32:10 -07:00
dcb netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
dccp net: rework SIOCGSTAMP ioctl handling 2019-04-19 14:07:40 -07:00
decnet netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
dns_resolver dns: remove redundant zero length namelen check 2019-04-11 14:01:08 -07:00
dsa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-05-07 17:22:09 -07:00
ethernet net: ethernet: support of_get_mac_address new ERR_PTR error 2019-05-07 12:22:47 -07:00
hsr genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
ieee802154 genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
ife net: sched: ife: check on metadata length 2018-04-22 21:12:00 -04:00
ipv4 net/tcp: use deferred jump label for TCP acked data hook 2019-05-09 11:13:57 -07:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-05-07 17:22:09 -07:00
iucv iucv: Remove SKB list assumptions. 2018-11-10 16:55:11 -08:00
kcm kcm: switch order of device registration to fix a crash 2019-04-01 14:59:20 -07:00
key xfrm: clean up xfrm protocol checks 2019-03-26 08:35:36 +01:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-05-07 17:22:09 -07:00
l3mdev l3mdev: add function to retreive upper master 2018-12-03 14:15:26 -08:00
lapb
llc llc: Check address length before reading address field 2019-04-12 10:25:03 -07:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-05-02 22:14:21 -04:00
mac802154 mac802154: Remove VLA usage of skcipher 2018-09-28 12:46:07 +08:00
mpls netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
ncsi genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
netfilter netfilter: xt_hashlimit: use struct_size() helper 2019-05-06 01:03:04 +02:00
netlabel genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
netlink genetlink: do not validate dump requests if there is no policy 2019-05-04 01:27:10 -04:00
netrom net: rework SIOCGSTAMP ioctl handling 2019-04-19 14:07:40 -07:00
nfc genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
nsh nsh: set mac len based on inner packet 2018-07-12 16:55:29 -07:00
openvswitch openvswitch: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT) 2019-05-08 09:43:15 -07:00
packet packet: Fix error path in packet_init 2019-05-09 13:45:46 -07:00
phonet netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
psample genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
qrtr netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
rds net: rds: fix spelling mistake "syctl" -> "sysctl" 2019-05-05 10:19:43 -07:00
rfkill *: convert stream-like files from nonseekable_open -> stream_open 2019-05-06 17:46:41 +03:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-04-25 23:52:29 -04:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-05-02 22:14:21 -04:00
sched net/sched: avoid double free on matchall reoffload 2019-05-08 16:34:58 -07:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
smc 5.2 Merge Window pull request 2019-05-09 09:02:46 -07:00
strparser net: strparser: make it explicitly non-modular 2019-04-22 21:50:54 -07:00
sunrpc NFS client updates for Linux 5.2 2019-05-09 14:33:15 -07:00
switchdev switchdev: Remove unused transaction item queue 2019-03-01 21:35:19 -08:00
tipc tipc: fix hanging clients using poll with EPOLLOUT flag 2019-05-09 09:26:09 -07:00
tls net/tls: handle errors from padding_length() 2019-05-09 16:37:39 -07:00
unix datagram: remove rendundant 'peeked' argument 2019-04-08 09:51:54 -07:00
vmw_vsock vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock 2019-03-08 15:15:44 -08:00
wimax genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
x25 net: rework SIOCGSTAMP ioctl handling 2019-04-19 14:07:40 -07:00
xdp mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
compat.c net: rework SIOCGSTAMP ioctl handling 2019-04-19 14:07:40 -07:00
Kconfig net: devlink: select NET_DEVLINK from drivers 2019-03-24 14:55:31 -04:00
Makefile net: split out functions related to registering inflight socket files 2019-02-28 08:24:23 -07:00
socket.c net: use indirect calls helpers at the socket layer 2019-05-05 10:38:04 -07:00
sysctl_net.c