93907 Commits

Author SHA1 Message Date
Linus Torvalds
130568d5ea Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
 "This is a followup for block changes, that didn't make the initial
  pull request. It's a bit of a mixed bag, this contains:

   - A followup pull request from Sagi for NVMe. Outside of fixups for
     NVMe, it also includes a series for ensuring that we properly
     quiesce hardware queues when browsing live tags.

   - Set of integrity fixes from Dmitry (mostly), fixing various issues
     for folks using DIF/DIX.

   - Fix for a bug introduced in cciss, with the req init changes. From
     Christoph.

   - Fix for a bug in BFQ, from Paolo.

   - Two followup fixes for lightnvm/pblk from Javier.

   - Depth fix from Ming for blk-mq-sched.

   - Also from Ming, performance fix for mtip32xx that was introduced
     with the dynamic initialization of commands"

* 'for-linus' of git://git.kernel.dk/linux-block: (44 commits)
  block: call bio_uninit in bio_endio
  nvmet: avoid unneeded assignment of submit_bio return value
  nvme-pci: add module parameter for io queue depth
  nvme-pci: compile warnings in nvme_alloc_host_mem()
  nvmet_fc: Accept variable pad lengths on Create Association LS
  nvme_fc/nvmet_fc: revise Create Association descriptor length
  lightnvm: pblk: remove unnecessary checks
  lightnvm: pblk: control I/O flow also on tear down
  cciss: initialize struct scsi_req
  null_blk: fix error flow for shared tags during module_init
  block: Fix __blkdev_issue_zeroout loop
  nvme-rdma: unconditionally recycle the request mr
  nvme: split nvme_uninit_ctrl into stop and uninit
  virtio_blk: quiesce/unquiesce live IO when entering PM states
  mtip32xx: quiesce request queues to make sure no submissions are inflight
  nbd: quiesce request queues to make sure no submissions are inflight
  nvme: kick requeue list when requeueing a request instead of when starting the queues
  nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
  nvme-loop: quiesce/unquiesce admin_q instead of start/stop its hw queues
  nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues
  ...
2017-07-11 15:36:52 -07:00
Linus Torvalds
3bf7878f0f The main item here is support for v12.y.z ("Luminous") clusters:
RESEND_ON_SPLIT, RADOS_BACKOFF, OSDMAP_PG_UPMAP and CRUSH_CHOOSE_ARGS
 feature bits, and various other changes in the RADOS client protocol.
 On top of that we have a new fsc mount option to allow supplying
 fscache uniquifier (similar to NFS) and the usual pile of filesystem
 fixes from Zheng.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJZZQT+AAoJEEp/3jgCEfOLSsMH/i8ZdSzp7ocX00oLMlIxzFEk
 5BUXZ086mEPAE4fjJFPO7+qYk6y26MzAhJL+bj8r5E0GvBEpQkoAoSQZ19Mj5ApC
 nZnllzQ2C8kYvM4hp4Z2pLrF/OYACj/WJJgbTxubBET1zRq1iPj4EgbzBEraPvma
 K76W9ILKNUjIoSDlNR5qvykXXfvi2dxRpi/8nvfMCOcjlw/7orjXVLa05fKmmOoX
 OvpOjicWOrc8NlacGK+j1j1aaKlmLvZb9Ff+45hfC/L5PPQblM0dypFCVfq3MFFq
 nUxKgTCAQDPrndzCdURCtdovjFKbskRGKmhnd0EZkdDCcnUmg6nLxqta6g2Dbs0=
 =ioKM
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.13-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The main item here is support for v12.y.z ("Luminous") clusters:
  RESEND_ON_SPLIT, RADOS_BACKOFF, OSDMAP_PG_UPMAP and CRUSH_CHOOSE_ARGS
  feature bits, and various other changes in the RADOS client protocol.

  On top of that we have a new fsc mount option to allow supplying
  fscache uniquifier (similar to NFS) and the usual pile of filesystem
  fixes from Zheng"

* tag 'ceph-for-4.13-rc1' of git://github.com/ceph/ceph-client: (44 commits)
  libceph: advertise support for NEW_OSDOP_ENCODING and SERVER_LUMINOUS
  libceph: osd_state is 32 bits wide in luminous
  crush: remove an obsolete comment
  crush: crush_init_workspace starts with struct crush_work
  libceph, crush: per-pool crush_choose_arg_map for crush_do_rule()
  crush: implement weight and id overrides for straw2
  libceph: apply_upmap()
  libceph: compute actual pgid in ceph_pg_to_up_acting_osds()
  libceph: pg_upmap[_items] infrastructure
  libceph: ceph_decode_skip_* helpers
  libceph: kill __{insert,lookup,remove}_pg_mapping()
  libceph: introduce and switch to decode_pg_mapping()
  libceph: don't pass pgid by value
  libceph: respect RADOS_BACKOFF backoffs
  libceph: make DEFINE_RB_* helpers more general
  libceph: avoid unnecessary pi lookups in calc_target()
  libceph: use target pi for calc_target() calculations
  libceph: always populate t->target_{oid,oloc} in calc_target()
  libceph: make sure need_resend targets reflect latest map
  libceph: delete from need_resend_linger before check_linger_pool_dne()
  ...
2017-07-11 12:12:28 -07:00
Linus Torvalds
a3ddacbae5 chrome-platform-for-linus-4.13
Changes in this pull request are around catching up
 cros_ec with the internal chromeos-kernel versions of
 cros_ec, cros_ec_lpc, and cros_ec_lightbar.
 
 Also, switching maintainership from olof to bleung.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJZZBB8AAoJEB8J9XsKL+ZYcf4P/iRXb23r6pJgaqE3jO1mLJjQ
 aJH8sMVk3q0tIA/Wo3blVZmUD87RkDPqQNRhUx4AKuTtkq+zi+YIdltBk9nyK2tZ
 oRKtAFe1RL1a7Bxvh2im51mFE91q05nItPee+zylAKHL2PudKsAtvsjqEP/qmIBm
 h3XkkOMzSB3cqAjzaLm6bE531pFoRx6yKWUMGr0aTbOjXewC2uhP/U9rJYqtiaYl
 1oRfg1759cUxH1QXmsKIA5Ua2gKDZ+32aszxxgxSWmZ5671SB0psuyLW4Aar7XS0
 MNKGIYgKWBAUHX8iBTLwz/Z4VBB8X9DS2BfDvCZwDJtjCjYcJPzLKjqyGeJ3wr0G
 jW/kfjJL0G1FPxmS7WnsiUcDJemn+p/ia2/9HipLMM61fy7clezmBaxV8I4aWMh0
 zxW8Bk7+qOOv9D72ErKKHJ1oaZ3EWXgWWfiUEmr+99n6GOfFu0vF5+gcdV4HVLKB
 g2Gmt89OE+oMBAlWtDhX/RdhY2Xxf4POsCriBrqrealYXe9NIxjrleKRr6ysEj37
 71/X6TFaqGTYoyyDAVjFmIu6upGVoCLLdx9b/BodV1hyq97AIKHOdzOXpCKk2nvx
 IuA+JOWeoSGBD28CBhuvitJFDwTJv973Z+N9VrvZj91MKI89zI3Y0+sPAm69fbQ4
 mqkTtiLPIfCsvZE/7lWN
 =QtSr
 -----END PGP SIGNATURE-----

Merge tag 'chrome-platform-for-linus-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform

Pull chrome platform updates from Benson Leung:
 "Changes in this pull request are around catching up cros_ec with the
  internal chromeos-kernel versions of cros_ec, cros_ec_lpc, and
  cros_ec_lightbar.

  Also, switching maintainership from olof to bleung"

* tag 'chrome-platform-for-linus-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform:
  platform/chrome : Add myself as Maintainer
  platform/chrome: cros_ec_lightbar - hide unused PM functions
  cros_ec: Don't signal wake event for non-wake host events
  cros_ec: Fix deadlock when EC is not responsive at probe
  cros_ec: Don't return error when checking command version
  platform/chrome: cros_ec_lightbar - Avoid I2C xfer to EC during suspend
  platform/chrome: cros_ec_lightbar - Add userspace lightbar control bit to EC
  platform/chrome: cros_ec_lightbar - Control of suspend/resume lightbar sequence
  platform/chrome: cros_ec_lightbar - Add lightbar program feature to sysfs
  platform/chrome: cros_ec_lpc: Add MKBP events support over ACPI
  platform/chrome: cros_ec_lpc: Add power management ops
  platform/chrome: cros_ec_lpc: Add support for GOOG004 ACPI device
  platform/chrome: cros_ec_lpc: Add support for mec1322 EC
  platform/chrome: cros_ec_lpc: Add R/W helpers to LPC protocol variants
  mfd: cros_ec: Add support for dumping panic information
  cros_ec_debugfs: Pass proper struct sizes to cros_ec_cmd_xfer()
  mfd: cros_ec: add debugfs, console log file
  mfd: cros_ec: Add EC console read structures definitions
  mfd: cros_ec: Add helper for event notifier.
2017-07-11 09:55:47 -07:00
Linus Torvalds
9967468c0a Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - most of the rest of MM

 - KASAN updates

 - lib/ updates

 - checkpatch updates

 - some binfmt_elf changes

 - various misc bits

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (115 commits)
  kernel/exit.c: avoid undefined behaviour when calling wait4()
  kernel/signal.c: avoid undefined behaviour in kill_something_info
  binfmt_elf: safely increment argv pointers
  s390: reduce ELF_ET_DYN_BASE
  powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
  arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
  arm: move ELF_ET_DYN_BASE to 4MB
  binfmt_elf: use ELF_ET_DYN_BASE only for PIE
  fs, epoll: short circuit fetching events if thread has been killed
  checkpatch: improve multi-line alignment test
  checkpatch: improve macro reuse test
  checkpatch: change format of --color argument to --color[=WHEN]
  checkpatch: silence perl 5.26.0 unescaped left brace warnings
  checkpatch: improve tests for multiple line function definitions
  checkpatch: remove false warning for commit reference
  checkpatch: fix stepping through statements with $stat and ctx_statement_block
  checkpatch: [HLP]LIST_HEAD is also declaration
  checkpatch: warn when a MAINTAINERS entry isn't [A-Z]:\t
  checkpatch: improve the unnecessary OOM message test
  lib/bsearch.c: micro-optimize pivot position calculation
  ...
2017-07-10 16:58:42 -07:00
Thomas Meyer
a94c33dd1f lib/extable.c: use bsearch() library function in search_extable()
[thomas@m3y3r.de: v3: fix arch specific implementations]
  Link: http://lkml.kernel.org/r/1497890858.12931.7.camel@m3y3r.de
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:35 -07:00
Matthew Wilcox
2c6deb0152 bitmap: use memcmp optimisation in more situations
Commit 7dd968163f7c ("bitmap: bitmap_equal memcmp optimization") was
rather more restrictive than necessary; we can use memcmp() to implement
bitmap_equal() as long as the number of bits can be proved to be a
multiple of 8.  And architectures other than s390 may be able to make
good use of this optimisation.

[arnd@arndb.de: fix build: add a memcmp() declaration]
  Link: http://lkml.kernel.org/r/20170630153908.3439707-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170628153221.11322-5-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Matthew Wilcox
2a98dc028f include/linux/bitmap.h: turn bitmap_set and bitmap_clear into memset when possible
Several callers have constant 'start' and an 'nbits' that is a multiple
of 8, so we can turn them into calls to memset.  We don't need the
entirety of 'start' and 'nbits' to be constant, we just need to know
whether they're divisible by 8.

Link: http://lkml.kernel.org/r/20170628153221.11322-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Matthew Wilcox
e5af323c9b bitmap: optimise bitmap_set and bitmap_clear of a single bit
We have eight users calling bitmap_clear for a single bit and seventeen
calling bitmap_set for a single bit.  Rather than fix all of them to
call __clear_bit or __set_bit, turn bitmap_clear and bitmap_set into
inline functions and make this special case efficient.

Link: http://lkml.kernel.org/r/20170628153221.11322-3-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Bart Van Assche
287f3ca563 ARM: fix rd_size declaration
The global variable 'rd_size' is declared as 'int' in source file
arch/arm/kernel/atags_parse.c and as 'unsigned long' in
drivers/block/brd.c.  Fix this inconsistency.

Additionally, remove the declarations of rd_image_start, rd_prompt and
rd_doload from parse_tag_ramdisk() since these duplicate existing
declarations in <linux/initrd.h>.

Link: http://lkml.kernel.org/r/20170627065024.12347-1-bart.vanassche@wdc.com
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Zhaohongjiang <zhaohongjiang@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Ian Abbott
bc6245e5ef bug: split BUILD_BUG stuff out into <linux/build_bug.h>
Including <linux/bug.h> pulls in a lot of bloat from <asm/bug.h> and
<asm-generic/bug.h> that is not needed to call the BUILD_BUG() family of
macros.  Split them out into their own header, <linux/build_bug.h>.

Also correct some checkpatch.pl errors for the BUILD_BUG_ON_ZERO() and
BUILD_BUG_ON_NULL() macros by adding parentheses around the bitfield
widths that begin with a minus sign.

Link: http://lkml.kernel.org/r/20170525120316.24473-6-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Ian Abbott
47e81e59d9 linux/bug.h: correct "space required before that '-'"
Correct these checkpatch.pl errors:

|ERROR: space required before that '-' (ctx:OxO)
|#37: FILE: include/linux/bug.h:37:
|+#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))

|ERROR: space required before that '-' (ctx:OxO)
|#38: FILE: include/linux/bug.h:38:
|+#define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:-!!(e); }))

I decided to wrap the bitfield expressions that begin with minus signs
in parentheses rather than insert spaces before the minus signs.

Link: http://lkml.kernel.org/r/20170525120316.24473-5-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Ian Abbott
8cdd7cca92 linux/bug.h: correct "(foo*)" should be "(foo *)"
Correct this checkpatch.pl error:

|ERROR: "(foo*)" should be "(foo *)"
|#19: FILE: include/linux/bug.h:19:
|+#define BUILD_BUG_ON_NULL(e) ((void*)0)

Link: http://lkml.kernel.org/r/20170525120316.24473-4-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Ian Abbott
e9d5a48499 linux/bug.h: correct formatting of block comment
Correct these checkpatch.pl warnings:

|WARNING: Block comments use * on subsequent lines
|#34: FILE: include/linux/bug.h:34:
|+/* Force a compilation error if condition is true, but also produce a
|+   result (of value 0 and type size_t), so the expression can be used

|WARNING: Block comments use a trailing */ on a separate line
|#36: FILE: include/linux/bug.h:36:
|+   aren't permitted). */

Link: http://lkml.kernel.org/r/20170525120316.24473-3-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Ian Abbott
0b396923ee asm-generic/bug.h: declare struct pt_regs; before function prototype
This series of patches splits BUILD_BUG related macros out of
"include/linux/bug.h" into new file "include/linux/build_bug.h" (patch
5), and changes the pointer type checking in the `container_of()` macro
to deal with pointers of array type better (patch 6).  Patches 1 to 4
are prerequisites.

Patches 2, 3, 4, and 5 have been inserted since the previous version of
this patch series.  Patch 6 here corresponds to v3 and v4's patch 2.

Patch 1 was a prerequisite in v3 of this series to avoid a lot of
warnings when <linux/bug.h> was included by <linux/kernel.h>.  That is
no longer relevant for v5 of the series, but I left it in because it was
acked by a Arnd Bergmann and Michal Nazarewicz.

Patches 2, 3, and 4 are some checkpatch clean-ups on
"include/linux/bug.h" before splitting out the BUILD_BUG stuff in patch
5.

Patch 5 splits the BUILD_BUG related macros out of "include/linux/bug.h"
into new file "include/linux/build_bug.h" because including
<linux/bug.h> in "include/linux/kernel.h" would result in build failures
due to circular dependencies.

Patch 6 changes the pointer type checking by `container_of()` to avoid
some incompatible pointer warnings when the dereferenced pointer has
array type.

1) asm-generic/bug.h: declare struct pt_regs; before function prototype
2) linux/bug.h: correct formatting of block comment
3) linux/bug.h: correct "(foo*)" should be "(foo *)"
4) linux/bug.h: correct "space required before that '-'"
5) bug: split BUILD_BUG stuff out into <linux/build_bug.h>
6) kernel.h: handle pointers to arrays better in container_of()

This patch (of 6):

The declaration of `__warn()` has `struct pt_regs *regs` as one of its
parameters.  This can result in compiler warnings if `struct regs` is not
already declared.  Add an empty declaration of `struct pt_regs` to avoid
the warnings.

Link: http://lkml.kernel.org/r/20170525120316.24473-2-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Michal Hocko
9d1f4b3f5b mm: disallow early_pfn_to_nid on configurations which do not implement it
early_pfn_to_nid will return node 0 if both HAVE_ARCH_EARLY_PFN_TO_NID
and HAVE_MEMBLOCK_NODE_MAP are disabled.  It seems we are safe now
because all architectures which support NUMA define one of them (with an
exception of alpha which however has CONFIG_NUMA marked as broken) so
this works as expected.  It can get silently and subtly broken too
easily, though.  Make sure we fail the compilation if NUMA is enabled
and there is no proper implementation for this function.  If that ever
happens we know that either the specific configuration is invalid and
the fix should either disable NUMA or enable one of the above configs.

Link: http://lkml.kernel.org/r/20170704075803.15979-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:33 -07:00
Thomas Gleixner
a47fed5b5b mm: swap: provide lru_add_drain_all_cpuslocked()
The rework of the cpu hotplug locking unearthed potential deadlocks with
the memory hotplug locking code.

The solution for these is to rework the memory hotplug locking code as
well and take the cpu hotplug lock before the memory hotplug lock in
mem_hotplug_begin(), but this will cause a recursive locking of the cpu
hotplug lock when the memory hotplug code calls lru_add_drain_all().

Split out the inner workings of lru_add_drain_all() into
lru_add_drain_all_cpuslocked() so this function can be invoked from the
memory hotplug code with the cpu hotplug lock held.

Link: http://lkml.kernel.org/r/20170704093421.419329357@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:33 -07:00
Sahitya Tummala
2c80cd57c7 mm/list_lru.c: fix list_lru_count_node() to be race free
list_lru_count_node() iterates over all memcgs to get the total number of
entries on the node but it can race with memcg_drain_all_list_lrus(),
which migrates the entries from a dead cgroup to another.  This can return
incorrect number of entries from list_lru_count_node().

Fix this by keeping track of entries per node and simply return it in
list_lru_count_node().

Link: http://lkml.kernel.org/r/1498707555-30525-1-git-send-email-stummala@codeaurora.org
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Polakov <apolyakov@beget.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:33 -07:00
Nikolay Borisov
e3d3910a57 include/linux/backing-dev.h: simplify wb_stat_sum
wb_stat_sum() disables interrupts and calls __wb_stat_sum() which
eventually calls __percpu_counter_sum().  However, the percpu routine is
already irq-safe.  Simplify the code a bit by making wb_stat_sum()
directly call percpu_counter_sum_positive() and not disable interrupts.

Also remove the now-uneeded __wb_stat_sum() which was just a wrapper
over percpu_counter_sum_positive().

Link: http://lkml.kernel.org/r/1498230681-29103-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:32 -07:00
Nikolay Borisov
618b8c20d0 include/linux/mmzone.h: remove ancient/ambiguous comment
Currently pg_data_t is just a struct which describes a NUMA node memory
layout.  Let's keep the comment simple and remove ambiguity.

Link: http://lkml.kernel.org/r/1498220534-22717-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:32 -07:00
Michal Hocko
3e59fcb0e8 hugetlb: add support for preferred node to alloc_huge_page_nodemask
alloc_huge_page_nodemask tries to allocate from any numa node in the
allowed node mask starting from lower numa nodes.  This might lead to
filling up those low NUMA nodes while others are not used.  We can
reduce this risk by introducing a concept of the preferred node similar
to what we have in the regular page allocator.  We will start allocating
from the preferred nid and then iterate over all allowed nodes in the
zonelist order until we try them all.

This is mimicing the page allocator logic except it operates on per-node
mempools.  dequeue_huge_page_vma already does this so distill the
zonelist logic into a more generic dequeue_huge_page_nodemask and use it
in alloc_huge_page_nodemask.

This will allow us to use proper per numa distance fallback also for
alloc_huge_page_node which can use alloc_huge_page_nodemask now and we
can get rid of alloc_huge_page_node helper which doesn't have any user
anymore.

Link: http://lkml.kernel.org/r/20170622193034.28972-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:32 -07:00
Michal Hocko
aaf14e40a3 mm, hugetlb: unclutter hugetlb allocation layers
Patch series "mm, hugetlb: allow proper node fallback dequeue".

While working on a hugetlb migration issue addressed in a separate
patchset[1] I have noticed that the hugetlb allocations from the
preallocated pool are quite subotimal.

 [1] //lkml.kernel.org/r/20170608074553.22152-1-mhocko@kernel.org

There is no fallback mechanism implemented and no notion of preferred
node.  I have tried to work around it but Vlastimil was right to push
back for a more robust solution.  It seems that such a solution is to
reuse zonelist approach we use for the page alloctor.

This series has 3 patches.  The first one tries to make hugetlb
allocation layers more clear.  The second one implements the zonelist
hugetlb pool allocation and introduces a preferred node semantic which
is used by the migration callbacks.  The last patch is a clean up.

This patch (of 3):

Hugetlb allocation path for fresh huge pages is unnecessarily complex
and it mixes different interfaces between layers.

__alloc_buddy_huge_page is the central place to perform a new
allocation.  It checks for the hugetlb overcommit and then relies on
__hugetlb_alloc_buddy_huge_page to invoke the page allocator.  This is
all good except that __alloc_buddy_huge_page pushes vma and address down
the callchain and so __hugetlb_alloc_buddy_huge_page has to deal with
two different allocation modes - one for memory policy and other node
specific (or to make it more obscure node non-specific) requests.

This just screams for a reorganization.

This patch pulls out all the vma specific handling up to
__alloc_buddy_huge_page_with_mpol where it belongs.
__alloc_buddy_huge_page will get nodemask argument and
__hugetlb_alloc_buddy_huge_page will become a trivial wrapper over the
page allocator.

In short:
__alloc_buddy_huge_page_with_mpol - memory policy handling
  __alloc_buddy_huge_page - overcommit handling and accounting
    __hugetlb_alloc_buddy_huge_page - page allocator layer

Also note that __hugetlb_alloc_buddy_huge_page and its cpuset retry loop
is not really needed because the page allocator already handles the
cpusets update.

Finally __hugetlb_alloc_buddy_huge_page had a special case for node
specific allocations (when no policy is applied and there is a node
given).  This has relied on __GFP_THISNODE to not fallback to a different
node.  alloc_huge_page_node is the only caller which relies on this
behavior so move the __GFP_THISNODE there.

Not only does this remove quite some code it also should make those
layers easier to follow and clear wrt responsibilities.

Link: http://lkml.kernel.org/r/20170622193034.28972-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:32 -07:00
Roman Gushchin
422580c3ce mm/oom_kill.c: add tracepoints for oom reaper-related events
During the debugging of the problem described in
https://lkml.org/lkml/2017/5/17/542 and fixed by Tetsuo Handa in
https://lkml.org/lkml/2017/5/19/383 , I've found that the existing debug
output is not really useful to understand issues related to the oom
reaper.

So, I assume, that adding some tracepoints might help with debugging of
similar issues.

Trace the following events:
 1) a process is marked as an oom victim,
 2) a process is added to the oom reaper list,
 3) the oom reaper starts reaping process's mm,
 4) the oom reaper finished reaping,
 5) the oom reaper skips reaping.

How it works in practice? Below is an example which show how the problem
mentioned above can be found: one process is added twice to the
oom_reaper list:

  $ cd /sys/kernel/debug/tracing
  $ echo "oom:mark_victim" > set_event
  $ echo "oom:wake_reaper" >> set_event
  $ echo "oom:skip_task_reaping" >> set_event
  $ echo "oom:start_task_reaping" >> set_event
  $ echo "oom:finish_task_reaping" >> set_event
  $ cat trace_pipe
          allocate-502   [001] ....    91.836405: mark_victim: pid=502
          allocate-502   [001] .N..    91.837356: wake_reaper: pid=502
          allocate-502   [000] .N..    91.871149: wake_reaper: pid=502
        oom_reaper-23    [000] ....    91.871177: start_task_reaping: pid=502
        oom_reaper-23    [000] .N..    91.879511: finish_task_reaping: pid=502
        oom_reaper-23    [000] ....    91.879580: skip_task_reaping: pid=502

Link: http://lkml.kernel.org/r/20170530185231.GA13412@castle
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:32 -07:00
Michal Hocko
8b91323889 mm: unify new_node_page and alloc_migrate_target
Commit 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest
neighbor node when mem-offline") has duplicated a large part of
alloc_migrate_target with some hotplug specific special casing.

To be more precise it tried to enfore the allocation from a different
node than the original page.  As a result the two function diverged in
their shared logic, e.g.  the hugetlb allocation strategy.

Let's unify the two and express different NUMA requirements by the given
nodemask.  new_node_page will simply exclude the node it doesn't care
about and alloc_migrate_target will use all the available nodes.
alloc_migrate_target will then learn to migrate hugetlb pages more
sanely and use preallocated pool when possible.

Please note that alloc_migrate_target used to call alloc_page resp.
alloc_pages_current so the memory policy of the current context which is
quite strange when we consider that it is used in the context of
alloc_contig_range which just tries to migrate pages which stand in the
way.

Link: http://lkml.kernel.org/r/20170608074553.22152-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
Michal Hocko
4db9b2efe9 hugetlb, memory_hotplug: prefer to use reserved pages for migration
new_node_page will try to use the origin's next NUMA node as the
migration destination for hugetlb pages.  If such a node doesn't have
any preallocated pool it falls back to __alloc_buddy_huge_page_no_mpol
to allocate a surplus page instead.  This is quite subotpimal for any
configuration when hugetlb pages are no distributed to all NUMA nodes
evenly.  Say we have a hotplugable node 4 and spare hugetlb pages are
node 0

  /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:10000
  /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node3/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node4/hugepages/hugepages-2048kB/nr_hugepages:10000
  /sys/devices/system/node/node5/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node6/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node7/hugepages/hugepages-2048kB/nr_hugepages:0

Now we consume the whole pool on node 4 and try to offline this node.
All the allocated pages should be moved to node0 which has enough
preallocated pages to hold them.  With the current implementation
offlining very likely fails because hugetlb allocations during runtime
are much less reliable.

Fix this by reusing the nodemask which excludes migration source and try
to find a first node which has a page in the preallocated pool first and
fall back to __alloc_buddy_huge_page_no_mpol only when the whole pool is
consumed.

[akpm@linux-foundation.org: remove bogus arg from alloc_huge_page_nodemask() stub]
Link: http://lkml.kernel.org/r/20170608074553.22152-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
Will Deacon
108a7ac448 include/linux/page_ref.h: ensure page_ref_unfreeze is ordered against prior accesses
page_ref_freeze and page_ref_unfreeze are designed to be used as a pair,
wrapping a critical section where struct pages can be modified without
having to worry about consistency for a concurrent fast-GUP.

Whilst page_ref_freeze has full barrier semantics due to its use of
atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which
doesn't provide any barrier semantics and allows the operation to be
reordered with respect to page modifications in the critical section.

This patch ensures that page_ref_unfreeze is ordered after any critical
section updates, by invoking smp_mb() prior to the atomic_set.

Link: http://lkml.kernel.org/r/1497349722-6731-3-git-send-email-will.deacon@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
Dan Williams
baabda2614 mm: always enable thp for dax mappings
The madvise policy for transparent huge pages is meant to avoid unwanted
allocations of transparent huge pages.  It allows a policy of disabling
the extra memory pressure and effort to arrange for a huge page when it
is not needed.

DAX by definition never incurs this overhead since it is statically
allocated.  The policy choice makes even less sense for device-dax which
tries to guarantee a given tlb-fault size.  Specifically, the following
setting:

	echo never > /sys/kernel/mm/transparent_hugepage/enabled

...violates that guarantee and silently disables all device-dax
instances with a 2M or 1G alignment.  So, let's avoid that non-obvious
side effect by force enabling thp for dax mappings in all cases.

It is worth noting that the reason this uses vma_is_dax(), and the
resulting header include changes, is that previous attempts to add a
VM_DAX flag were NAKd.

Link: http://lkml.kernel.org/r/149739531127.20686.15813586620597484283.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
Dan Williams
16981d7635 mm: improve readability of transparent_hugepage_enabled()
Turn the macro into a static inline and rewrite the condition checks for
better readability in preparation for adding another condition.

[ross.zwisler@linux.intel.com: fix logic to make conversion equivalent]
[akpm@linux-foundation.org: resolve vs mm-make-pr_set_thp_disable-immediately-active.patch]
[akpm@linux-foundation.org: include coredump.h for MMF_DISABLE_THP]
Link: http://lkml.kernel.org/r/149739530612.20686.14760671150202647861.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
Steven Rostedt (VMware)
7ab0e50ad0 oom, trace: remove ENUM evaluation of COMPACTION_FEEDBACK
After enabling CONFIG_TRACE_ENUM_MAP_FILE (which will soon be renamed to
CONFIG_TRACE_EVAL_MAP_FILE), I am able to examine the enums that have
been evaluated:

 # cat /sys/kernel/debug/tracing/enum_map

(which will soon be renamed to eval_map)

And it showed some interesting results:

  [..]
  ZONE_MOVABLE 3 (oom)
  ZONE_NORMAL 2 (oom)
  ZONE_DMA32 1 (oom)
  ZONE_DMA 0 (oom)
  3 3 (oom)
  2 2 (oom)
  1 1 (oom)
  COMPACT_PRIO_ASYNC 2 (oom)
  COMPACT_PRIO_SYNC_LIGHT 1 (oom)
  COMPACT_PRIO_SYNC_FULL 0 (oom)
  [..]
  ZONE_DMA 0 (vmscan)
  3 3 (vmscan)
  2 2 (vmscan)
  1 1 (vmscan)
  COMPACT_PRIO_ASYNC 2 (vmscan)
  [..]
  ZONE_DMA 0 (kmem)
  3 3 (kmem)
  2 2 (kmem)
  1 1 (kmem)
  COMPACT_PRIO_ASYNC 2 (kmem)
  [..]
  ZONE_DMA 0 (compaction)
  3 3 (compaction)
  2 2 (compaction)
  1 1 (compaction)
  COMPACT_PRIO_ASYNC 2 (compaction)
  [..]

The name within the parenthesis are the trace systems that the enum/eval
maps are associated with. When there's a number evaluated to another
number, that tells me that the TRACE_DEFINE_ENUM() was used on a #define
and not an enum. As #defines get converted normally, they are not needed
to be evaluated.

Each of the above trace systems with the number to number evaluation
included the file include/trace/events/mmflags.h which has:

 /* High-level compaction status feedback */
 #define COMPACTION_FAILED       1
 #define COMPACTION_WITHDRAWN    2
 #define COMPACTION_PROGRESS     3

[..]

 #define COMPACTION_FEEDBACK             \
        EM(COMPACTION_FAILED,           "failed")       \
        EM(COMPACTION_WITHDRAWN,        "withdrawn")    \
        EMe(COMPACTION_PROGRESS,        "progress")

Which is still needed for the __print_symbolic() usage in the
trace_event.  But it is not needed to be evaluated.

Removing the evaluation part removes the unnecessary evaluations of
numbers to numbers.

Link: http://lkml.kernel.org/r/20170615074944.7be9a647@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
Michal Hocko
1860033237 mm: make PR_SET_THP_DISABLE immediately active
PR_SET_THP_DISABLE has a rather subtle semantic.  It doesn't affect any
existing mapping because it only updated mm->def_flags which is a
template for new mappings.

The mappings created after prctl(PR_SET_THP_DISABLE) have VM_NOHUGEPAGE
flag set.  This can be quite surprising for all those applications which
do not do prctl(); fork() & exec() and want to control their own THP
behavior.

Another usecase when the immediate semantic of the prctl might be useful
is a combination of pre- and post-copy migration of containers with
CRIU.  In this case CRIU populates a part of a memory region with data
that was saved during the pre-copy stage.  Afterwards, the region is
registered with userfaultfd and CRIU expects to get page faults for the
parts of the region that were not yet populated.  However, khugepaged
collapses the pages and the expected page faults do not occur.

In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
temporary mechanism for enabling/disabling THP process wide.

Implementation wise, a new MMF_DISABLE_THP flag is added.  This flag is
tested when decision whether to use huge pages is taken either during
page fault of at the time of THP collapse.

It should be noted, that the new implementation makes PR_SET_THP_DISABLE
master override to any per-VMA setting, which was not the case
previously.

Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")
Link: http://lkml.kernel.org/r/1496415802-30944-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
Naoya Horiguchi
ddd40d8a2c mm: hugetlb: delete dequeue_hwpoisoned_huge_page()
dequeue_hwpoisoned_huge_page() is no longer used, so let's remove it.

Link: http://lkml.kernel.org/r/1496305019-5493-9-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:30 -07:00
Anshuman Khandual
c3114a84f7 mm: hugetlb: soft-offline: dissolve source hugepage after successful migration
Currently hugepage migrated by soft-offline (i.e.  due to correctable
memory errors) is contained as a hugepage, which means many non-error
pages in it are unreusable, i.e.  wasted.

This patch solves this issue by dissolving source hugepages into buddy.
As done in previous patch, PageHWPoison is set only on a head page of
the error hugepage.  Then in dissoliving we move the PageHWPoison flag
to the raw error page so that all healthy subpages return back to buddy.

[arnd@arndb.de: fix warnings: replace some macros with inline functions]
  Link: http://lkml.kernel.org/r/20170609102544.2947326-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1496305019-5493-5-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:30 -07:00
Naoya Horiguchi
b37ff71cc6 mm: hwpoison: change PageHWPoison behavior on hugetlb pages
We'd like to narrow down the error region in memory error on hugetlb
pages.  However, currently we set PageHWPoison flags on all subpages in
the error hugepage and add # of subpages to num_hwpoison_pages, which
doesn't fit our purpose.

So this patch changes the behavior and we only set PageHWPoison on the
head page then increase num_hwpoison_pages only by 1.  This is a
preparation for narrow-down part which comes in later patches.

Link: http://lkml.kernel.org/r/1496305019-5493-4-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:30 -07:00
Shaohua Li
23955622ff swap: add block io poll in swapin path
For fast flash disk, async IO could introduce overhead because of
context switch.  block-mq now supports IO poll, which improves
performance and latency a lot.  swapin is a good place to use this
technique, because the task is waiting for the swapin page to continue
execution.

In my virtual machine, directly read 4k data from a NVMe with iopoll is
about 60% better than that without poll.  With iopoll support in swapin
patch, my microbenchmark (a task does random memory write) is about
10%~25% faster.  CPU utilization increases a lot though, 2x and even 3x
CPU utilization.  This will depend on disk speed.

While iopoll in swapin isn't intended for all usage cases, it's a win
for latency sensistive workloads with high speed swap disk.  block layer
has knob to control poll in runtime.  If poll isn't enabled in block
layer, there should be no noticeable change in swapin.

I got a chance to run the same test in a NVMe with DRAM as the media.
In simple fio IO test, blkpoll boosts 50% performance in single thread
test and ~20% in 8 threads test.  So this is the base line.  In above
swap test, blkpoll boosts ~27% performance in single thread test.
blkpoll uses 2x CPU time though.

If we enable hybid polling, the performance gain has very slight drop
but CPU time is only 50% worse than that without blkpoll.  Also we can
adjust parameter of hybid poll, with it, the CPU time penality is
reduced further.  In 8 threads test, blkpoll doesn't help though.  The
performance is similar to that without blkpoll, but cpu utilization is
similar too.  There is lock contention in swap path.  The cpu time
spending on blkpoll isn't high.  So overall, blkpoll swapin isn't worse
than that without it.

The swapin readahead might read several pages in in the same time and
form a big IO request.  Since the IO will take longer time, it doesn't
make sense to do poll, so the patch only does iopoll for single page
swapin.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/070c3c3e40b711e7b1390002c991e86a-b5408f0@7511894063d3764ff01ea8111f5a004d7dd700ed078797c204a24e620ddb965c
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:30 -07:00
Linus Torvalds
548aa0e3c5 Device properties framework updates for v4.13-rc1
- Rearrange the core device properties code by moving the code
    specific to each supported platform configuration framework
    (ACPI, DT and build-in) into a separate file (Sakari Ailus).
 
  - Add helper functions for accessing device properties in a
    firmware-agnostic way (Sakari Ailus, Kieran Bingham).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZY/OvAAoJEILEb/54YlRx8NYP/3hL9jK8yBgDEVkoNnRDN7Xj
 gefqm++3YO6wy8pfuACfNal2lJRFat5NDWdkBUsFA7LAfaLYe6jkIZ8YW+XTcHq7
 H8dmFiDjfhVa4/urS6DTHmHqPyWLh4ayjTfwRP0yy2aW2XpXe1WNAgilLStu96iH
 OU44HY5FQCQipnIIcuJZpJwH51AMyJIuAV5rQJySoIuDyULRKmVQ8G/xCAfhzHCH
 tLcQHZnnYUdSCi29JwCu2RKKI6ta2o68EYmi1wxjvMEs2z27KVltCu+5Y9mpOIwl
 DU+X8hAmepzd5aJxpR69hzM+dr71+0feUOklEU+r0DJzlmdDzTK8NyEPW2Ysh0aa
 0SsdtgoRWv+kq0FuzL26fpa01WOcWB8N5uDp4XUZKFCriRmBk2HNc1KRoPtvILm2
 2ISOqdIPe2/fC/FOYT0mKcFXaezkONpdbZoUqWGKFw0OWkgc5Vesl7L0LlilLCS6
 XpHP/ISJBDfmIwjgaDjZIV1G6PkyaKu4tPkn9NsN+ztbH02dpTodIB0qDJSimyPX
 cYwuDBn6NpKhk/l8b7lNMLxX8xK5EpQKw90av3LFYwZcGpyBzsoqciLONPSqF+1p
 jHDK4yaV+Ddwo3q5368x+eXX/Y9hfDewikoAuAUG3y0cMzuwKwFZ7MRkfl1Azbcc
 CLNeBd8YXBUnbzDMYaNB
 =EcrF
 -----END PGP SIGNATURE-----

Merge tag 'devprop-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull device properties framework updates from Rafael Wysocki:
 "These mostly rearrange the device properties core code and add a few
  helper functions to it as a foundation for future work.

  Specifics:

   - Rearrange the core device properties code by moving the code
     specific to each supported platform configuration framework (ACPI,
     DT and build-in) into a separate file (Sakari Ailus).

   - Add helper functions for accessing device properties in a
     firmware-agnostic way (Sakari Ailus, Kieran Bingham)"

* tag 'devprop-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  device property: Add fwnode_graph_get_port_parent
  device property: Add FW type agnostic fwnode_graph_get_remote_node
  device property: Introduce fwnode_device_is_available()
  device property: Move fwnode graph ops to firmware specific locations
  device property: Move FW type specific functionality to FW specific files
  ACPI: Constify argument to acpi_device_is_present()
2017-07-10 15:23:45 -07:00
Linus Torvalds
3226186843 More ACPI updates for v4.13-rc1
- Fix the ACPI code handling the SPCR table to check access
    width of MMIO regions and add a workaround for APM X-Gene
    8250 UART to use 32-bit MMIO accesses with its register
    (Loc Ho).
 
  - Fix two ACPI-based hotplug issues related to the handling of
    hot-remove failures on the OS side (Chun-Yi Lee).
 
  - Constify attribute_group structures in a few places (Arvind Yadav).
 
  - Make one local function static (Colin Ian King).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZY/M4AAoJEILEb/54YlRxwf4QAJph6HYzSUlOWdiY6ufGZ9Ff
 XmR0xUt6VAol6Kj8otZzlf3txBejn6ABMtdeneEulHYQKT3SXlQFcRz2rm9Ggp22
 i5B/Tdg1dHCRCxha4YZouXX8NnioKn/VsgXKRHoPrHaQZPNU6RNW8LS4UvsQbuxQ
 qx+7Le1lZN9bgMmaw/Hl1HW6QIVe4VCnl0fwYo/BgXCWEfnL+kotmCCywPivpSK4
 9t8uyFLHGLZwXcZsyibawl32X5mON/QNbUC4fKp35Z0xJ7Gl2/9+KhsUot6wXnFs
 BTfC5rJcI8CpWzlftPe9p5uZ2W4NGsLqW1osu8p4HXliY8vAACa3GqNa0fe5cU/g
 SOR+hxEO9DbwG+SuINxbmIYZxYwAT5TrQo76+jkmmbpQymtD52N0L24CWqU55FPn
 WAIP39FQUQqMVtYqxh29c2LbDvvzXH2BXk0O4RZl6sATInMESrX4FFp0hOOfJrtl
 2i27He05EZDAtM55i2/eNjmsG5j8eMux+30c1JsybNidM9ukvbztfO+Pt7oR+02c
 r975bzX3Uu8NObIDswzfVSnMpcdisdJ2grqJu2H8MTDgCt4ttBeMMpp8Y+i/k7qf
 yGyzA0rd+wuQenwH9BqZiWVSaWflK3OrvrrOPlYJcvTsmIGzj3nCTsD94rqqkD2x
 2i8BCl3IiX3JSy+lu6Xn
 =cdK6
 -----END PGP SIGNATURE-----

Merge tag 'acpi-extra-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
 "These fix the ACPI SPCR table handling and add a workaround for APM
  X-Gene 8250 UART on top of that, fix two ACPI hotplug issues related
  to hot-remove failures, add a missing "static" to one function and
  constify some attribute_group structures.

  Specifics:

   - Fix the ACPI code handling the SPCR table to check access width of
     MMIO regions and add a workaround for APM X-Gene 8250 UART to use
     32-bit MMIO accesses with its register (Loc Ho).

   - Fix two ACPI-based hotplug issues related to the handling of
     hot-remove failures on the OS side (Chun-Yi Lee).

   - Constify attribute_group structures in a few places (Arvind Yadav).

   - Make one local function static (Colin Ian King)"

* tag 'acpi-extra-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / DPTF: constify attribute_group structures
  ACPI / LPSS: constify attribute_group structures
  ACPI: BGRT: constify attribute_group structures
  ACPI / power: constify attribute_group structures
  ACPI / scan: Indicate to platform when hot remove returns busy
  ACPI / bus: handle ACPI hotplug schedule errors completely
  ACPI / osi: Make local function acpi_osi_dmi_linux() static
  ACPI: SPCR: Workaround for APM X-Gene 8250 UART 32-alignment errata
  ACPI: SPCR: Use access width to determine mmio usage
2017-07-10 15:19:40 -07:00
Linus Torvalds
5cdd4c0468 for-f2fs-4.13
In this round, we've added new features such as disk quota and statx, and
 modified internal bio management flow to merge more IOs depending on block
 types. We've also made internal threads freezeable for Android battery life.
 In addition to them, there are some patches to avoid lock contention as well
 as a couple of deadlock conditions.
 
 = Enhancement
 - support usrquota, grpquota, and statx
 - manage DATA/NODE typed bios separately to serialize more IOs
 - modify f2fs_lock_op/wio_mutex to avoid lock contention
 - prevent lock contention in migratepage
 
 = Bug fix
 - miss to load written inode flag
 - fix worst case victim selection in GC
 - freezeable GC and discard threads for Android battery life
 - sanitize f2fs metadata to deal with security hole
 - clean up sysfs-related code and docs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAllj6fMACgkQQBSofoJI
 UNJ6Ng/+PqdGV/b6KroYIXI/scFx/1t87/0W+rY9tyLr1jX7nIHn9KLPjeDdvdlk
 5vEeZ/dGfW8wSI+ESzscvKberG2QlOPwJRyTB4jWR+bLatwzg7YjEblz+RX4/wfJ
 jKjnR7M//gRdhHdqA0xXrqguAjPbcEDK2RiVbhioMjWbZ/77j0IjcRokjMYdEf0m
 cJc2oMXFtlo+DJ1h9/8BmwQPTI9FfVdgbkPFTTJzV0ydQnBdxcAigrzwYZhPOVv0
 n2M1dKOiQewB4OADMuepZLFqJheItlgG9wlvEjGq7zTd5epHXRIqhM6h9GikQVb9
 YKAkajlKfWcwEXaEcVXtsMHC9x69Yf8xxOSQ1VrhypSUNbaynC9LDsErJx6yrF3P
 XC5baiqXsd/btg7tfrHJjk3gI+ck97d6TrTfUVR91X+1Tpkz7cyB226WxFKbyOG3
 EYCFVMbrIN2CaHHt1xWIT2zCfX5w9ycp8kFjY6jPi0OOZrKXpFw+1AwwTu9kn4xJ
 iuUc8pmc0/FyPqokmLef4Qp/RRM83+f+nzW/y//lkEf3nMn6qlHzNI1RAxXnBvGV
 DMXzuJDcJcHGcSDr7mWyKkm6gYcak/E4DdQLQqJ6VCt6KCdCEXP/XDlig5ey5ODY
 uGEr1QhXIpiYAON45HUi3gmytB3J3ZdzzpsG1PEco4+hjSuFhyE=
 =N4GZ
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've added new features such as disk quota and statx,
  and modified internal bio management flow to merge more IOs depending
  on block types. We've also made internal threads freezeable for
  Android battery life. In addition to them, there are some patches to
  avoid lock contention as well as a couple of deadlock conditions.

  Enhancements:
   - support usrquota, grpquota, and statx
   - manage DATA/NODE typed bios separately to serialize more IOs
   - modify f2fs_lock_op/wio_mutex to avoid lock contention
   - prevent lock contention in migratepage

  Bug fixes:
   - fix missing load of written inode flag
   - fix worst case victim selection in GC
   - freezeable GC and discard threads for Android battery life
   - sanitize f2fs metadata to deal with security hole
   - clean up sysfs-related code and docs"

* tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (59 commits)
  f2fs: support plain user/group quota
  f2fs: avoid deadlock caused by lock order of page and lock_op
  f2fs: use spin_{,un}lock_irq{save,restore}
  f2fs: relax migratepage for atomic written page
  f2fs: don't count inode block in in-memory inode.i_blocks
  Revert "f2fs: fix to clean previous mount option when remount_fs"
  f2fs: do not set LOST_PINO for renamed dir
  f2fs: do not set LOST_PINO for newly created dir
  f2fs: skip ->writepages for {mete,node}_inode during recovery
  f2fs: introduce __check_sit_bitmap
  f2fs: stop gc/discard thread in prior during umount
  f2fs: introduce reserved_blocks in sysfs
  f2fs: avoid redundant f2fs_flush after remount
  f2fs: report # of free inodes more precisely
  f2fs: add ioctl to do gc with target block address
  f2fs: don't need to check encrypted inode for partial truncation
  f2fs: measure inode.i_blocks as generic filesystem
  f2fs: set CP_TRIMMED_FLAG correctly
  f2fs: require key for truncate(2) of encrypted file
  f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c
  ...
2017-07-10 14:29:45 -07:00
Rafael J. Wysocki
f19e80b394 Merge branches 'acpi-spcr', 'acpi-osi', 'acpi-bus', 'acpi-scan' and 'acpi-misc'
* acpi-spcr:
  ACPI: SPCR: Workaround for APM X-Gene 8250 UART 32-alignment errata
  ACPI: SPCR: Use access width to determine mmio usage

* acpi-osi:
  ACPI / osi: Make local function acpi_osi_dmi_linux() static

* acpi-bus:
  ACPI / bus: handle ACPI hotplug schedule errors completely

* acpi-scan:
  ACPI / scan: Indicate to platform when hot remove returns busy

* acpi-misc:
  ACPI / DPTF: constify attribute_group structures
  ACPI / LPSS: constify attribute_group structures
  ACPI: BGRT: constify attribute_group structures
  ACPI / power: constify attribute_group structures
2017-07-10 22:46:21 +02:00
Linus Torvalds
7cee9384cb Fix up over-eager 'wait_queue_t' renaming
Commit ac6424b981bc ("sched/wait: Rename wait_queue_t =>
wait_queue_entry_t") had scripted the renaming incorrectly, and didn't
actually check that the 'wait_queue_t' was a full token.

As a result, it also triggered on 'wait_queue_token', and renamed that
to 'wait_queue_entry_token' entry in the autofs4 packet structure
definition too.  That was entirely incorrect, and not intended.

The end result built fine when building just the kernel - because
everything had been renamed consistently there - but caused problems in
user space because the "struct autofs_packet_missing" type is exported
as part of the uapi.

This scripts it all back again:

    git grep -lw wait_queue_entry_token |
        xargs sed -i 's/wait_queue_entry_token/wait_queue_token/g'

and checks the end result.

Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Fixes: ac6424b981bc ("sched/wait: Rename wait_queue_t => wait_queue_entry_t")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 11:40:19 -07:00
Linus Torvalds
642338ba33 Changes for 4.13:
- Avoid quotacheck deadlocks
 - Fix transaction overflows when bunmapping fragmented files
 - Refactor directory readahead
 - Allow admin to configure if ASSERT is fatal
 - Improve transaction usage detail logging during overflows
 - Minor cleanups
 - Don't leak log items when the log shuts down
 - Remove double-underscore typedefs
 - Various preparation for online scrubbing
 - Introduce new error injection configuration sysfs knobs
 - Refactor dq_get_next to use extent map directly
 - Fix problems with iterating the page cache for unwritten data
 - Implement SEEK_{HOLE,DATA} via iomap
 - Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA
 - Don't use MAXPATHLEN to check on-disk symlink target lengths
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJZYDw4AAoJEPh/dxk0SrTr2IMP/3JLeygIDtKBBVRPvlCmEXQC
 j8w1C/ntn46zZKQ8l14fAFV4HV2d+KJWf8+yDuPuGdMXJfPeKZf95otYhnSx/9Th
 MvCH7Nzg63yjEGqXpBkfIVr/GT0KTx28lxiqNViChr7XiXWookgf3SSLINO+vU4J
 L2jgLqieJfijiHTBs4qGCQPDwSXVoSOi5XCCQWDYQrXz6DI5UEJc70U53WkH4tRu
 RctOgp1lralwEO0PhfomD3m/Gk94taE/4ZpX/j/5Y4tvH/yh5aY3/KTCLm6+mYT3
 rgMpmg5hmm+UiCTNoTnQ5RxzGZWCfI1I9FZ3HqDsbhmFtaWh32ti0dEEDYsF8Opj
 ARnTty3cRx41LH9dULrVWdwW105AHgwEz8/OZlG0JOca9qzj9GKERMg/hpHINAKN
 TrBlkweg86LWZDy23udZJ/v35svNqSFsqL1yV8j5dXyBi+Yi2CGfU27zbBwnj4Jk
 047l+OuRbBnEOUULqJTEVBY3euoclwl/yQrW2m409s7vPGkGQBLuFCsDKQdnvJ/A
 D7frZqH8XypwnhFOkKybUnBkn4P7vZ2sEuCIZMsrH5k/ys8XyEkaBaOurjvMBOKA
 vLIMD6RXDWrFbOoovfK/stEM6/UFoQkgMhBe7vB9EXk1AjM8NYyWZgp5BkHtytC7
 qa6GRjtGefhc67hbwXJd
 =/GZI
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS updates from Darrick Wong:
 "Here are some changes for you for 4.13. For the most part it's fixes
  for bugs and deadlock problems, and preparation for online fsck in
  some future merge window.

   - Avoid quotacheck deadlocks

   - Fix transaction overflows when bunmapping fragmented files

   - Refactor directory readahead

   - Allow admin to configure if ASSERT is fatal

   - Improve transaction usage detail logging during overflows

   - Minor cleanups

   - Don't leak log items when the log shuts down

   - Remove double-underscore typedefs

   - Various preparation for online scrubbing

   - Introduce new error injection configuration sysfs knobs

   - Refactor dq_get_next to use extent map directly

   - Fix problems with iterating the page cache for unwritten data

   - Implement SEEK_{HOLE,DATA} via iomap

   - Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA

   - Don't use MAXPATHLEN to check on-disk symlink target lengths"

* tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
  xfs: don't crash on unexpected holes in dir/attr btrees
  xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
  xfs: fix contiguous dquot chunk iteration livelock
  xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
  vfs: Add iomap_seek_hole and iomap_seek_data helpers
  vfs: Add page_cache_seek_hole_data helper
  xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
  xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent
  xfs: Check for m_errortag initialization in xfs_errortag_test
  xfs: grab dquots without taking the ilock
  xfs: fix semicolon.cocci warnings
  xfs: Don't clear SGID when inheriting ACLs
  xfs: free cowblocks and retry on buffered write ENOSPC
  xfs: replace log_badcrc_factor knob with error injection tag
  xfs: convert drop_writes to use the errortag mechanism
  xfs: remove unneeded parameter from XFS_TEST_ERROR
  xfs: expose errortag knobs via sysfs
  xfs: make errortag a per-mountpoint structure
  xfs: free uncommitted transactions during log recovery
  xfs: don't allow bmap on rt files
  ...
2017-07-10 10:51:53 -07:00
Jens Axboe
459bd0dc39 Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linus
Pull followup NVMe (mostly) changes from Sagi:

I added the quiesce/unquiesce patches in here as it's
easy for me easily apply changes on top. It has accumulated
reviews and includes mostly nvme anyway, please tell me if
you don't want to take them with this.

This includes:
- quiesce/unquiesce fixes in nvme and others from me
- nvme-fc add create association padding spec updates from James
- some more quirking from MKP
- nvmet nit cleanup from Max
- Fix nvme-rdma racy RDMA completion signalling from Marta
- some centralization patches from me
- add tagset nr_hw_queues updates on controller resets in
  nvme drivers from me
- nvme-rdma fix resources recycling when doing error recovery from me
- minor cleanups in nvme-fc from me
2017-07-10 11:44:34 -06:00
Linus Torvalds
1d07b6cb96 Merge branch 'fix-uio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull copy*_iter fix from Al Viro.

[ Al used entirely the wrong return value. Oopsie. ]

* 'fix-uio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix brown paperbag bug in inlined copy_..._iter()
2017-07-10 10:13:45 -07:00
Linus Torvalds
a91ab911df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:

 - open/close tracking improvements from Dmitry Torokhov

 - battery support improvements in Wacom driver from Jason Gerecke

 - Win8 support fixes from Benjamin Tissories and Hans de Geode

 - misc fixes to Intel-ISH driver from Arnd Bergmann

 - support for quite a few new devices and small assorted fixes here and
   there

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (35 commits)
  HID: intel-ish-hid: Enable Gemini Lake ish driver
  HID: intel-ish-hid: Enable Cannon Lake ish driver
  HID: wacom: fix mistake in printk
  HID: multitouch: optimize the sticky fingers timer
  HID: multitouch: fix rare Win 8 cases when the touch up event gets missing
  HID: multitouch: use BIT macro
  HID: Add driver for Retrode2 joypad adapter
  HID: multitouch: Add support for Google Rose Touchpad
  HID: multitouch: Support PTP Stick and Touchpad device
  HID: core: don't use negative operands when shift
  HID: apple: Use country code to detect ISO keyboards
  HID: remove no longer used hid->open field
  greybus: hid: remove custom locking from gb_hid_open/close
  HID: usbhid: remove custom locking from usbhid_open/close
  HID: i2c-hid: remove custom locking from i2c_hid_open/close
  HID: serialize hid_hw_open and hid_hw_close
  HID: usbhid: do not rely on hid->open when deciding to do IO
  HID: hiddev: use hid_hw_power instead of usbhid_get/put_power
  HID: hiddev: use hid_hw_open/close instead of usbhid_open/close
  HID: asus: Add support for Zen AiO MD-5110 keyboard
  ...
2017-07-10 09:22:48 -07:00
Al Viro
c43aeb1980 fix brown paperbag bug in inlined copy_..._iter()
"copied nothing" == "return 0", not "return full size".

Fixes: aa28de275a24 "iov_iter/hardening: move object size checks to inlined part"
Spotted-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-10 07:40:49 -04:00
Jiri Kosina
837c194a4d Merge branches 'for-4.13/multitouch', 'for-4.13/retrode', 'for-4.13/transport-open-close-consolidation', 'for-4.13/upstream' and 'for-4.13/wacom' into for-linus 2017-07-10 11:11:25 +02:00
Jiri Kosina
604250ddcf Merge branches 'for-4.13/ish' and 'for-4.13/ite' into for-linus
Conflicts:
	drivers/hid/hid-core.c
2017-07-10 11:11:05 +02:00
James Smart
d1438ad8f3 nvme_fc/nvmet_fc: revise Create Association descriptor length
Revises the Create Association LS for the amount of pad expected in 1.16.

Add defines for the minimum lengths that a target can accept (e.g. variable
pad lengths)

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-07-10 09:09:57 +03:00
Linus Torvalds
af3c8d9850 main drm pull for v4.13
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZYseIAAoJEAx081l5xIa+85kP/0zKzKKVzZXSXG2TAGb5jNfk
 Ex+TELG8tWk9KBxA7lEE5c0WEsnP79cNoXZLQu8wlUzO8+kwQK5Bz0zgNUkpSuo1
 RthwdsxBQX1++UxB+HoSG+dOa7hkKVqlgQR3z9qyhsBXzetkJV0DoYcpMV0A1EWd
 6Jzt+AvCShVkcW+21LqHPlc5EIVewrDMoA3oU6aYCLhyAOUTVvvQB2ML8YApH7TM
 JrSrzCFHTrQEBbGUrZQhzR0sZzZzk9byntb/I/mdVbHeCyIHiL8sC4PfWSOyyazm
 GkPnA8G3aFAY9haBRz9jG/VBr1yVb0mCBjkWQ1lGfIAOCDDSc+d7PDXdG+i4AewK
 jZheXlrDIdGgmJLy4W3rdEqJvdf7UQHZOs8594OL19l4+FxCTrol1JSHSMeavCvr
 8bUNil9Jb/ONU/wmp+q55U0k4TCTyerUA7gKnuaJAwBvd4n78/PKmQnbrWinDyJc
 GQXp6zESk9bKt5DXSnVZuVf4POTzpuAsQkkfX1V2y145EHTQYfS3jLENWqEjyZUy
 QtKCHZvRkJfGaFU4Pr+vBo9Iu1GlA5OiOv08QadldTT4OxUI0T6yaLDobHCQfKPE
 sc3wCuCM+/dAnqoKDcGC4hAmF8zDdO0kw65P2m7uC6T9Jm1G35CioKbzo+fzUhuL
 fg5TBpbp2Wwe2oPA5iBm
 =2S5N
 -----END PGP SIGNATURE-----

Merge tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
 "This is the main pull request for the drm, I think I've got one later
  driver pull for mediatek SoC driver, I'm undecided on if it needs to
  go to you yet.

  Otherwise summary below:

  Core drm:
   - Atomic add driver private objects
   - Deprecate preclose hook in modern drivers
   - MST bandwidth tracking
   - Use kvmalloc in more places
   - Add mode_valid hook for crtc/encoder/bridge
   - Reduce sync_file construction time
   - Documentation updates
   - New DRM synchronisation object support

  New drivers:
   - pl111 - pl111 CLCD display controller

  Panel:
   - Innolux P079ZCA panel driver
   - Add NL12880B20-05, NL192108AC18-02D, P320HVN03 panels
   - panel-samsung-s6e3ha2: Add s6e3hf2 panel support

  i915:
   - SKL+ watermark fixes
   - G4x/G33 reset improvements
   - DP AUX backlight improvements
   - Buffer based GuC/host communication
   - New getparam for (sub)slice infomation
   - Cannonlake and Coffeelake initial patches
   - Execbuf optimisations

  radeon/amdgpu:
   - Lots of Vega10 bug fixes
   - Preliminary raven support
   - KIQ support for compute rings
   - MEC queue management rework
   - DCE6 Audio support
   - SR-IOV improvements
   - Better radeon/amdgpu selection support

  nouveau:
   - HDMI stereoscopic support
   - Display code rework for >= GM20x GPUs

  msm:
   - GEM rework for fine-grained locking
   - Per-process pagetable work
   - HDMI fixes for Snapdragon 820.

  vc4:
   - Remove 256MB CMA limit from vc4
   - Add out-fence support
   - Add support for cygnus
   - Get/set tiling ioctls support
   - Add T-format tiling support for scanout

  zte:
   - add VGA support.

  etnaviv:
   - Thermal throttle support for newer GPUs
   - Restore userspace buffer cache performance
   - dma-buf sync fix

  stm:
   - add stm32f429 display support

  exynos:
   - Rework vblank handling
   - Fixup sw-trigger code

  sun4i:
   - V3s display engine support
   - HDMI support for older SoCs
   - Preliminary work on dual-pipeline SoCs.

  rcar-du:
   - VSP work

  imx-drm:
   - Remove counter load enable from PRE
   - Double read/write reduction flag support

  tegra:
   - Documentation for the host1x and drm driver.
   - Lots of staging ioctl fixes due to grate project work.

  omapdrm:
   - dma-buf fence support
   - TILER rotation fixes"

* tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux: (1270 commits)
  drm: Remove unused drm_file parameter to drm_syncobj_replace_fence()
  drm/amd/powerplay: fix bug fail to remove sysfs when rmmod amdgpu.
  amdgpu: Set cik/si_support to 1 by default if radeon isn't built
  drm/amdgpu/gfx9: fix driver reload with KIQ
  drm/amdgpu/gfx8: fix driver reload with KIQ
  drm/amdgpu: Don't call amd_powerplay_destroy() if we don't have powerplay
  drm/ttm: Fix use-after-free in ttm_bo_clean_mm
  drm/amd/amdgpu: move get memory type function from early init to sw init
  drm/amdgpu/cgs: always set reference clock in mode_info
  drm/amdgpu: fix vblank_time when displays are off
  drm/amd/powerplay: power value format change for Vega10
  drm/amdgpu/gfx9: support the amdgpu.disable_cu option
  drm/amd/powerplay: change PPSMC_MSG_GetCurrPkgPwr for Vega10
  drm/amdgpu: Make amdgpu_cs_parser_init static (v2)
  drm/amdgpu/cs: fix a typo in a comment
  drm/amdgpu: Fix the exported always on CU bitmap
  drm/amdgpu/gfx9: gfx_v9_0_enable_gfx_static_mg_power_gating() can be static
  drm/amdgpu/psp: upper_32_bits/lower_32_bits for address setup
  drm/amd/powerplay/cz: print message if smc message fails
  drm/amdgpu: fix typo in amdgpu_debugfs_test_ib_init
  ...
2017-07-09 18:48:37 -07:00
Linus Torvalds
8d97a6c329 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers fixlet from Thomas Gleixner:
 "Add Frederic Weisbecker as NOHZ/dyntick maintainer"

[ And an unmentioned and unrelated typo fix in the same commit? Hmm.. ]

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Add Frederic Weisbecker as nohz/dyntics maintainer
2017-07-09 11:18:59 -07:00
Linus Torvalds
4fde846ac0 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "This scheduler update provides:

   - The (hopefully) final fix for the vtime accounting issues which
     were around for quite some time

   - Use types known to user space in UAPI headers to unbreak user space
     builds

   - Make load balancing respect the current scheduling domain again
     instead of evaluating unrelated CPUs"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers/uapi: Fix linux/sched/types.h userspace compilation errors
  sched/fair: Fix load_balance() affinity redo path
  sched/cputime: Accumulate vtime on top of nsec clocksource
  sched/cputime: Move the vtime task fields to their own struct
  sched/cputime: Rename vtime fields
  sched/cputime: Always set tsk->vtime_snap_whence after accounting vtime
  vtime, sched/cputime: Remove vtime_account_user()
  Revert "sched/cputime: Refactor the cputime_adjust() code"
2017-07-09 10:52:16 -07:00
Linus Torvalds
c3931a87db Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A couple of fixes for perf and kprobes:

   - Add he missing exclude_kernel attribute for the precise_ip level so
     !CAP_SYS_ADMIN users get the proper results.

   - Warn instead of failing completely when perf has no unwind support
     for a particular architectiure built in.

   - Ensure that jprobes are at function entry and not at some random
     place"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Ensure that jprobe probepoints are at function entry
  kprobes: Simplify register_jprobes()
  kprobes: Rename [arch_]function_offset_within_entry() to [arch_]kprobe_on_func_entry()
  perf unwind: Do not fail due to missing unwind support
  perf evsel: Set attr.exclude_kernel when probing max attr.precise_ip
2017-07-09 10:49:47 -07:00