50247 Commits

Author SHA1 Message Date
Linus Torvalds
ec3604c7a5 Writeback error handling fixes for v4.14
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZrTy3AAoJEAAOaEEZVoIVaucP/ApBAj2S5wzvlV1u6l8E6ae7
 ZeEEZfcWwzRYlKjZAkTWqj9XvGpDGO5gLq4wsZK2edFAq++/MJF8ZVtN4tdZ1kUZ
 DUvRodtVOrT08Kp9wZXGT7JOFrf6U/6gMcR6p0MuWnHndeKYvlpcFi9NPT4EC9/z
 Zm9V7gtlPdSOha7eaSjUS0+vLERkxqXLBW3Av9QUOBP/lbI3lqIroGKeHDYnVdya
 2P/k5EcRRJMyJP6TqyYxmmJl+UWjJFMLvnlUDBslHnD/u3mIUhw3JLHYBjn5dZRE
 Xjq56IDPoXDUvzlBhtn/Uqyx+/wtwsNsylpmKv6K5G1JfdeuSsPVsCey+A1cqV64
 LpE5896wf9TmnmI9LNyh6vDn925xPSGBiF45UEp5f9aO7jXeY0MaEZ8g+ENqFIDK
 v4gtZdS9FhYHV+/l4qEwYMKrqSbwKEs1r1FT+f4wnABby1ojfdA57ZPlp5PV2Vjp
 szTp88Zkb7cMvZwEnWwxWofcJNmgS7uNahvnQF3IJ4ITsioEkuyYR3K4ZQMaaaV9
 wCp6G0FhXZaK3OI7o9WiDwaO2elp9Hxc8bnqKpiBbHZkY0NLh7/++5VxpeNbTHFy
 AGijQiiKGNNyYqNj93wq9jpVdMNjB0pXrHRxfav8v7MtQ+WfbEoAENF4T7hN7iXn
 UuF6eSWEC5O1UCRUk1A+
 =LLY3
 -----END PGP SIGNATURE-----

Merge tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull writeback error handling updates from Jeff Layton:
 "This pile continues the work from last cycle on better tracking
  writeback errors. In v4.13 we added some basic errseq_t infrastructure
  and converted a few filesystems to use it.

  This set continues refining that infrastructure, adds documentation,
  and converts most of the other filesystems to use it. The main
  exception at this point is the NFS client"

* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  ecryptfs: convert to file_write_and_wait in ->fsync
  mm: remove optimizations based on i_size in mapping writeback waits
  fs: convert a pile of fsync routines to errseq_t based reporting
  gfs2: convert to errseq_t based writeback error reporting for fsync
  fs: convert sync_file_range to use errseq_t based error-tracking
  mm: add file_fdatawait_range and file_write_and_wait
  fuse: convert to errseq_t based error tracking for fsync
  mm: consolidate dax / non-dax checks for writeback
  Documentation: add some docs for errseq_t
  errseq: rename __errseq_set to errseq_set
2017-09-06 14:11:03 -07:00
Linus Torvalds
066dea8c30 File locking related changes for v4.14
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZrTzyAAoJEAAOaEEZVoIVj8wP/0sOxG+7vEEpe4uj2W52aq9T
 Y39/ZLfRTLm9SqgH61lkN+IyUsvDx+IP1ws2LBhp0IDRD9m40wdILhHZRWXJcRW2
 ApEfmXF+rxnZZ6725ixX9w4Ylab2ZeGmKbzaG4wIjxfddftewZkJvFQcb1LZDfWq
 1N0SF4KWoWN6t26Du5CHmYSj/Sz6YGrWGhF22u3mNfkGL+MmuKbz+kB3W+0q2NUF
 ZjkOIH9WcRiXgSlcHPBLre2EKHqHaNgb0s4Iofd3ZEe50v1NwY/vBMefxuwRdgKS
 kpLhIKIYMawrHn2rpV0jm12qdgCYj+t2kbVIUBDn3unBP2zYA0e/oo5HNIrroVlk
 Q6aGwmW0LN60rpd5qcRuNS1p1h2id2HpxEe98dsski6T8CVnj/nvu7EIxmWM02cf
 g2HeOd7bnl3+uu7SwSTkOVb6G7Kjn+Xufiz/n11mK6fl2jvOyWZZmDqhhjWAYJ8r
 t5mQVWJdEV12+6+A1WSv9DeS3TUgdYPCF8dzDtF+JVn3WEmxYHywH36Y3hKKz+BA
 gFEhnHvlyaVvpXCr8Y5BqNSfEfvZe/YUnmVReHpgBU/U4GJ17iQYk/g2vfmPLmsN
 IZ2OGCrDUc/LfdWc4llRyQBvlGT1KujaT0tbN7xnuWcS2qWdsfX4jDtDUH9E6pvK
 TB6Sw4Ike0ixamG8N8q/
 =VPMU
 -----END PGP SIGNATURE-----

Merge tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull file locking updates from Jeff Layton:
 "This pile just has a few file locking fixes from Ben Coddington. There
  are a couple of cleanup patches + an attempt to bring sanity to the
  l_pid value that is reported back to userland on an F_GETLK request.

  After a few gyrations, he came up with a way for filesystems to
  communicate to the VFS layer code whether the pid should be translated
  according to the namespace or presented as-is to userland"

* tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: restore a warn for leaked locks on close
  fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks
  fs/locks: Use allocation rather than the stack in fcntl_getlk()
2017-09-06 13:43:26 -07:00
Linus Torvalds
c7f396f12f dlm for 4.14
This set includes a bunch of minor code cleanups that
 have accumulated, probably from code analyzers people
 like to run.  There is one nice fix that avoids some
 socket leaks by switching to use sock_create_lite().
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZrtP1AAoJEDgbc8f8gGmqH3YQALZouj0tzatxHfWlGcMHoufL
 M2wmQragG4qOI1w9vNKmo6GctQ+Teqholnt2gRHputxNLzPUXPzNgAR1/O7O8741
 TCB/fhR16KqMMP4bTa2GQ73WVFhohkh8xSvtcWkhCqC+Ti/qx2FN7LZ7Mxn6Muje
 IC7E+Oy2Xr64lUb1CsfpXXel8vs+ujoMIAZiU4P/PgCzYX5FvaFWJ9VCwgYzfIuN
 zj2O1txau5xW2fZmD5GRmgWY/g5wCPcPxwCdZacqrL7yNiU1wsrhYFds0AiGSPJC
 D/wMX9a0GN28L+zW0eLEVI+lIk8f5Az+DOrw5UUFNwDd4ejDWaS0dtMNThYu6VvD
 x6+JZhgZHcj3Df/s4PMZvPkCx+8ZeRGK9RK+jlkEVfO8aIE39gi6mC+EuTJmZe/m
 PAB7O2OG0FTUPoY+t/5wKaz1g6qSHQ2fQZb8rAMoUFWwFJWXp3q7/tlZN4dlwIDI
 2yp9UN09ug3tICcne/gvmJ5x8lVN3Eh6XHkbO1qedsv45SYKdOwPmvyp2XTZooJK
 kg827Z+deRmvTfX3gzEsEO1caabtYDOrZ23RHJxqViNdZbMn3Tifc2pBZCIJHfmu
 4My+Midl38Ch9SUx30ePwjyJ9+Ptsm7KiSOIvrtRZV/1bPEqRP03suxEZvmCA/z4
 elMj1gKykj/GHXZ+cHlX
 =lGzS
 -----END PGP SIGNATURE-----

Merge tag 'dlm-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:
 "This set includes a bunch of minor code cleanups that have
  accumulated, probably from code analyzers people like to run. There is
  one nice fix that avoids some socket leaks by switching to use
  sock_create_lite()"

* tag 'dlm-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: use sock_create_lite inside tcp_accept_from_sock
  uapi linux/dlm_netlink.h: include linux/dlmconstants.h
  dlm: avoid double-free on error path in dlm_device_{register,unregister}
  dlm: constify kset_uevent_ops structure
  dlm: print log message when cluster name is not set
  dlm: Delete an unnecessary variable initialisation in dlm_ls_start()
  dlm: Improve a size determination in two functions
  dlm: Use kcalloc() in two functions
  dlm: Use kmalloc_array() in make_member_array()
  dlm: Delete an error message for a failed memory allocation in dlm_recover_waiters_pre()
  dlm: Improve a size determination in dlm_recover_waiters_pre()
  dlm: Use kcalloc() in dlm_scan_waiters()
  dlm: Improve a size determination in table_seq_start()
  dlm: Add spaces for better code readability
  dlm: Replace six seq_puts() calls by seq_putc()
  dlm: Make dismatch error message more clear
  dlm: Fix kernel memory disclosure
2017-09-06 13:39:23 -07:00
Linus Torvalds
be6297e9be Scalability improvements when allocating inodes, and some
miscellaneous bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlms4KwACgkQ8vlZVpUN
 gaMsNwf9EDIaB7uMAFIRw8TzszbYKE3K3T412s18zce+kYea6wZPkQavWH/qMYgU
 r8jmXfDi3KZJJBI8fFW4qh36qs/fJoOeQOD3BTHcczEJqaaOiahaxfSylTwezDfw
 fIkEMCfBj6Vyldo0aKrtM4iU07Njj7QmYBtsiJo1kpyAuZ7wuoaiyizCLRb0fhLB
 AnBOs2ur9fQvn954M03tJIKpxFgmbpofM7yMtJYpW9dHCCWe2G+sIdBy/W6vVTxt
 sJIzUGyyEFs9Hr8U8THbuo5XgScFh+NEOkj/cf60t5ZYxwZzJvY7+Oj7BQwxEM1p
 Efcjz2kRLU8qSYHOHxCar0D3MW+oNQ==
 =E7gL
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Scalability improvements when allocating inodes, and some
  miscellaneous bug fixes and cleanups"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: avoid Y2038 overflow in recently_deleted()
  ext4: fix fault handling when mounted with -o dax,ro
  ext4: fix quota inconsistency during orphan cleanup for read-only mounts
  ext4: fix incorrect quotaoff if the quota feature is enabled
  ext4: remove useless test and assignment in strtohash functions
  ext4: backward compatibility support for Lustre ea_inode implementation
  ext4: remove timebomb in ext4_decode_extra_time()
  ext4: use sizeof(*ptr)
  ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
  ext4: reduce lock contention in __ext4_new_inode
  ext4: cleanup goto next group
  ext4: do not unnecessarily allocate buffer in recently_deleted()
2017-09-06 12:59:41 -07:00
Linus Torvalds
5791577963 Updates for 4.14:
- Write unmount record for a ro mount to avoid unnecessary log replay
 - Clean up orphaned inodes when mounting fs readonly
 - Resubmit inode log items when buffer writeback fails to avoid umount hang
 - Fix log recovery corruption problems when log headers wrap around the end
 - Avoid infinite loop searching for free inodes when inode counters are wrong
 - Evict inodes involved with log redo so that we don't leak them later
 - Fix a potential race between reclaim and inode cluster freeing
 - Refactor the inode joining code w.r.t. transaction rolling & deferred ops
 - Fix a bug where the log doesn't properly deal with dirty buffers that
   are about to become ordered buffers
 - Fix the extent swap code to deal with making dirty buffers ordered properly
 - Consolidate page fault handlers
 - Refactor the incore extent manipulation functions to use the iext
   abstractions instead of directly modifying with extent data
 - Disable crashy chattr +/-x until we fix it
 - Don't allow us to set S_DAX for v2 inodes
 - Various cleanups
 - Clarify some documentation
 - Fix a problem where fsync and a log commit race to send the disk a
   flush command, resulting in a small window where power fail data loss
   could occur
 - Simplify some rmap operations in the fcollapse code
 - Fix some use-after-free problems in async writeback
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJZrEAQAAoJEPh/dxk0SrTrxKEP/3y8sLWdy4fUdPpVkwZteXwc
 zGyYaLrmKRc5i6abBNtLCZoRJGfRdvVyPrhQ1q3mt8H//xuURgqgFFyjj3wAdsLf
 sDejIHhdsc8/VcuLLtCW3rEYg58hJ89hW7d1InCP0tvqWmljh9svhzXebtwUvNNF
 /2fHIUXUiAxLbgjv/N2i/smlLl0zdx6C2x1TlJmfwer0UMTAnlmbFWxCqmtUZwSl
 QSuGgn1wo3dkId9aFoNwQmSCFeYcxQlpaInJEzUiVQOA4dbphXHO9Bsx0eOkpDuz
 39waaX0fld8LEfIQGmUQ995UkAwfk/asjgDSApyXdkMayNWhi0KpRl1zXgCb8BbL
 m7vYJhIfJ399+jbNPe1+htn3I16AmpvAai9MNJidFclWwqFEuQEnxZccdtTIAiRv
 XuYiq9hN2NOwlwPUYfrZxfx34fdocRyHmGVs3i7P3/qPWd5Hx6+FpQTOngciS7MN
 6xnM8PbnrLadw3ooMDEKgWsN805BQALiwzDRggoAXG1Pm2SqFnLD/dAR4c7R3nR8
 vvYlfGHnd38aMlW73IALkkGJqZy/bHPFhrbvpjXyIG6SYwCjrWrO0chM0O8MCRrF
 MIW3rM5hYIE8aCkpJ2mxvcQalmSAlSPVKlmgvSK4S1Sz4kcywxskNhch8uNkb5uy
 WUHhrJz+wBjdjrDOU3aL
 =jBdo
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS updates from Darrick Wong:
 "Here are the changes for xfs for 4.14. Most of these are cleanups and
  fixes for bad behavior, as we're mostly focusing on improving
  reliablity this cycle (read: there's potentially a lot of stuff on the
  horizon for 4.15 so better to spend a few weeks killing other bugs
  now).

  Summary:

   - Write unmount record for a ro mount to avoid unnecessary log replay

   - Clean up orphaned inodes when mounting fs readonly

   - Resubmit inode log items when buffer writeback fails to avoid
     umount hang

   - Fix log recovery corruption problems when log headers wrap around
     the end

   - Avoid infinite loop searching for free inodes when inode counters
     are wrong

   - Evict inodes involved with log redo so that we don't leak them
     later

   - Fix a potential race between reclaim and inode cluster freeing

   - Refactor the inode joining code w.r.t. transaction rolling &
     deferred ops

   - Fix a bug where the log doesn't properly deal with dirty buffers
     that are about to become ordered buffers

   - Fix the extent swap code to deal with making dirty buffers ordered
     properly

   - Consolidate page fault handlers

   - Refactor the incore extent manipulation functions to use the iext
     abstractions instead of directly modifying with extent data

   - Disable crashy chattr +/-x until we fix it

   - Don't allow us to set S_DAX for v2 inodes

   - Various cleanups

   - Clarify some documentation

   - Fix a problem where fsync and a log commit race to send the disk a
     flush command, resulting in a small window where power fail data
     loss could occur

   - Simplify some rmap operations in the fcollapse code

   - Fix some use-after-free problems in async writeback"

* tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits)
  xfs: use kmem_free to free return value of kmem_zalloc
  xfs: open code end_buffer_async_write in xfs_finish_page_writeback
  xfs: don't set v3 xflags for v2 inodes
  xfs: fix compiler warnings
  fsmap: fix documentation of FMR_OF_LAST
  xfs: simplify the rmap code in xfs_bmse_merge
  xfs: remove unused flags arg from xfs_file_iomap_begin_delay
  xfs: fix incorrect log_flushed on fsync
  xfs: disable per-inode DAX flag
  xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves
  xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extent
  xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at
  xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents
  xfs: move some code around inside xfs_bmap_shift_extents
  xfs: use xfs_iext_get_extent in xfs_bmap_first_unused
  xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert
  xfs: add a xfs_iext_update_extent helper
  xfs: consolidate the various page fault handlers
  iomap: return VM_FAULT_* codes from iomap_page_mkwrite
  xfs: relog dirty buffers during swapext bmbt owner change
  ...
2017-09-06 12:19:23 -07:00
Linus Torvalds
77d0ab600a We've got a whopping 29 GFS2 patches for this merge window, mainly
because we held some back from the previous merge window until we
 could get them perfected and well tested. We have a couple patch
 sets, including my patch set for protecting glock gl_object and
 Andreas Gruenbacher's patch set to fix the long-standing shrink-
 slab hang, plus a bunch of assorted bugs and cleanups:
 
 1. I fixed a bug whereby an IO error would lead to a double-brelse.
 2. Andreas Gruenbacher made a minor cleanup to call his relatively
    new function, gfs2_holder_initialized, rather than doing it
    manually. This was just missed by a previous patch set.
 3. Jan Kara fixed a bug whereby the SGID was being cleared when
    inheriting ACLs.
 4. Andreas found a bug and fixed it in his previous patch,
    "Get rid of flush_delayed_work in gfs2_evict_inode". A call to
    flush_delayed_work was deleted from *gfs2_inode_lookup and added
    to gfs2_create_inode.
 5. Wang Xibo found and fixed a list_add call in inode_go_lock
    that specified the parameters in the wrong order.
 6. Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's
    metadata reads that were accidentally missing them.
 7 - 10. I submitted a 4-patch set to protect the glock gl_object
    field. GFS2 was setting and checking gl_object with no locking
    mechanism, so the value was occasionally stomped on, which caused
    file system corruption.
 11. I submitted a small cleanup to function gfs2_clear_rgrpd.
    It was needlessly adding rgrp glocks to the lru list, then pulling
    them back off immediately. The rgrp glocks don't use the lru list
    anyway, so doing so was just a waste of time.
 12. I submitted a patch that checks the GLOF_LRU flag on a glock
    before trying to remove it from the lru_list. This avoids a lot
    of unnecessary spin_lock contention.
 13. I submitted a patch to delete GFS2's debugfs files only after
    we evict all the glocks. Before this patch, GFS2 would delete the
    debugfs files, and if unmount hung waiting for a glock, there was
    no way to debug the problem. Now, if a hang occurs during umount,
    we can examine the debugfs files to figure out why it's hung.
 14. Andreas Gruenbacher submitted a patch to fix some trivial typos.
 15 - 19. Andreas also submitted a five-part patch set to fix the
    longstanding hang involving the slab shrinker: dlm requires
    memory, calls the inode shrinker, which calls gfs2's evict, which
    calls back into DLM before it can evict an inode.
 20. Abhi Das submitted a patch to forcibly flush the active items
    list to relieve memory pressure. This fixes a long-standing bug
    whereby GFS2 was getting hung permanently in balance_dirty_pages.
 21. Thomas Tai submitted a patch to fix a slab corruption problem
    due to a residual pointer left in the lock_dlm lockstruct.
 22. I submitted a patch to withdraw the file system if IO errors
    are encountered while writing to the journals or statfs system
    file which were previously not being sent back up. Before, some
    IO errors were sometimes not be detected for several hours, and
    at recovery time, the journal errors made journal replay
    impossible.
 23. Andreas has a patch to fix an annoying format-truncation compiler
    warning so GFS2 compiles cleanly.
 24. I have a patch that fixes a handful of sparse compiler warnings.
 25. Andreas fixed up an useless gl_object warning caused by an
    earlier patch.
 26. Arvind Yadav added a patch to properly constify our rhashtable
    params declare.
 27. I added a patch to fix a regression caused by the non-recursive
    delete and truncate patch that caused file system blocks to not
    be properly freed.
 28. Ernesto A. Fernández added a patch to fix a place where GFS2
    would send back the wrong return code setting extended attributes.
 29. Ernesto also added a patch to fix a case in which GFS2 was
    improperly setting an inode's i_mode, potentially granting access
    to the wrong users.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZrMC2AAoJENeLYdPf93o7PIQIAKY4hdC2pMM5tiiIHx5fPAAr
 tjpVuFkDQzyEaTb9sArVLxEdva3ShKERQKoYq/VVxqbAEwPgXbzJFNNil1WTJi1t
 J2gE4wE4G5x1+A7XDzCdPI8KAcF+yX63AaFYlVKyuZSq5w7njIRc1Vk+TFiIexxC
 xb0nP0g9L6Zt114rE8kfi0/GLjTO9vOKM3XsJgG612I3/cs3RUx4gJ+nSUG0bYLA
 qoBIXEJ3SFHw2Zr/LgHZ9QDHnlPVl3bjg03sRQaWZms7XbLegDBYsDSvS1HLZ300
 gjTc0Dgz/6KwzDVJ7cZ/fPNYtIFY58tKs6aqqDTrCncsX9nPjcTAxYkBNWsFyZM=
 =tXJ8
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We've got a whopping 29 GFS2 patches for this merge window, mainly
  because we held some back from the previous merge window until we
  could get them perfected and well tested. We have a couple patch sets,
  including my patch set for protecting glock gl_object and Andreas
  Gruenbacher's patch set to fix the long-standing shrink- slab hang,
  plus a bunch of assorted bugs and cleanups.

  Summary:

   - I fixed a bug whereby an IO error would lead to a double-brelse.

   - Andreas Gruenbacher made a minor cleanup to call his relatively new
     function, gfs2_holder_initialized, rather than doing it manually.
     This was just missed by a previous patch set.

   - Jan Kara fixed a bug whereby the SGID was being cleared when
     inheriting ACLs.

   - Andreas found a bug and fixed it in his previous patch, "Get rid of
     flush_delayed_work in gfs2_evict_inode". A call to
     flush_delayed_work was deleted from *gfs2_inode_lookup and added to
     gfs2_create_inode.

   - Wang Xibo found and fixed a list_add call in inode_go_lock that
     specified the parameters in the wrong order.

   - Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's
     metadata reads that were accidentally missing them.

   - I submitted a 4-patch set to protect the glock gl_object field.
     GFS2 was setting and checking gl_object with no locking mechanism,
     so the value was occasionally stomped on, which caused file system
     corruption.

   - I submitted a small cleanup to function gfs2_clear_rgrpd. It was
     needlessly adding rgrp glocks to the lru list, then pulling them
     back off immediately. The rgrp glocks don't use the lru list
     anyway, so doing so was just a waste of time.

   - I submitted a patch that checks the GLOF_LRU flag on a glock before
     trying to remove it from the lru_list. This avoids a lot of
     unnecessary spin_lock contention.

   - I submitted a patch to delete GFS2's debugfs files only after we
     evict all the glocks. Before this patch, GFS2 would delete the
     debugfs files, and if unmount hung waiting for a glock, there was
     no way to debug the problem. Now, if a hang occurs during umount,
     we can examine the debugfs files to figure out why it's hung.

   - Andreas Gruenbacher submitted a patch to fix some trivial typos.

   - Andreas also submitted a five-part patch set to fix the
     longstanding hang involving the slab shrinker: dlm requires memory,
     calls the inode shrinker, which calls gfs2's evict, which calls
     back into DLM before it can evict an inode.

   - Abhi Das submitted a patch to forcibly flush the active items list
     to relieve memory pressure. This fixes a long-standing bug whereby
     GFS2 was getting hung permanently in balance_dirty_pages.

   - Thomas Tai submitted a patch to fix a slab corruption problem due
     to a residual pointer left in the lock_dlm lockstruct.

   - I submitted a patch to withdraw the file system if IO errors are
     encountered while writing to the journals or statfs system file
     which were previously not being sent back up. Before, some IO
     errors were sometimes not be detected for several hours, and at
     recovery time, the journal errors made journal replay impossible.

   - Andreas has a patch to fix an annoying format-truncation compiler
     warning so GFS2 compiles cleanly.

   - I have a patch that fixes a handful of sparse compiler warnings.

   - Andreas fixed up an useless gl_object warning caused by an earlier
     patch.

   - Arvind Yadav added a patch to properly constify our rhashtable
     params declare.

   - I added a patch to fix a regression caused by the non-recursive
     delete and truncate patch that caused file system blocks to not be
     properly freed.

   - Ernesto A. Fernández added a patch to fix a place where GFS2 would
     send back the wrong return code setting extended attributes.

   - Ernesto also added a patch to fix a case in which GFS2 was
     improperly setting an inode's i_mode, potentially granting access
     to the wrong users"

* tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits)
  gfs2: preserve i_mode if __gfs2_set_acl() fails
  gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing
  GFS2: Fix non-recursive truncate bug
  gfs2: constify rhashtable_params
  GFS2: Fix gl_object warnings
  GFS2: Fix up some sparse warnings
  gfs2: Silence gcc format-truncation warning
  GFS2: Withdraw for IO errors writing to the journal or statfs
  gfs2: fix slab corruption during mounting and umounting gfs file system
  gfs2: forcibly flush ail to relieve memory pressure
  gfs2: Clean up waiting on glocks
  gfs2: Defer deleting inodes under memory pressure
  gfs2: gfs2_evict_inode: Put glocks asynchronously
  gfs2: Get rid of gfs2_set_nlink
  gfs2: gfs2_glock_get: Wait on freeing glocks
  gfs2: Fix trivial typos
  GFS2: Delete debugfs files only after we evict the glocks
  GFS2: Don't waste time locking lru_lock for non-lru glocks
  GFS2: Don't bother trying to add rgrps to the lru list
  GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode
  ...
2017-09-06 11:42:31 -07:00
Linus Torvalds
bafb0762cb Char/Misc drivers for 4.14-rc1
Here is the big char/misc driver update for 4.14-rc1.
 
 Lots of different stuff in here, it's been an active development cycle
 for some reason.  Highlights are:
   - updated binder driver, this brings binder up to date with what
     shipped in the Android O release, plus some more changes that
     happened since then that are in the Android development trees.
   - coresight updates and fixes
   - mux driver file renames to be a bit "nicer"
   - intel_th driver updates
   - normal set of hyper-v updates and changes
   - small fpga subsystem and driver updates
   - lots of const code changes all over the driver trees
   - extcon driver updates
   - fmc driver subsystem upadates
   - w1 subsystem minor reworks and new features and drivers added
   - spmi driver updates
 
 Plus a smattering of other minor driver updates and fixes.
 
 All of these have been in linux-next with no reported issues for a
 while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWa1+Ew8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yl26wCgquufNylfhxr65NbJrovduJYzRnUAniCivXg8
 bePIh/JI5WxWoHK+wEbY
 =hYWx
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big char/misc driver update for 4.14-rc1.

  Lots of different stuff in here, it's been an active development cycle
  for some reason. Highlights are:

   - updated binder driver, this brings binder up to date with what
     shipped in the Android O release, plus some more changes that
     happened since then that are in the Android development trees.

   - coresight updates and fixes

   - mux driver file renames to be a bit "nicer"

   - intel_th driver updates

   - normal set of hyper-v updates and changes

   - small fpga subsystem and driver updates

   - lots of const code changes all over the driver trees

   - extcon driver updates

   - fmc driver subsystem upadates

   - w1 subsystem minor reworks and new features and drivers added

   - spmi driver updates

  Plus a smattering of other minor driver updates and fixes.

  All of these have been in linux-next with no reported issues for a
  while"

* tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits)
  ANDROID: binder: don't queue async transactions to thread.
  ANDROID: binder: don't enqueue death notifications to thread todo.
  ANDROID: binder: Don't BUG_ON(!spin_is_locked()).
  ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl
  ANDROID: binder: push new transactions to waiting threads.
  ANDROID: binder: remove proc waitqueue
  android: binder: Add page usage in binder stats
  android: binder: fixup crash introduced by moving buffer hdr
  drivers: w1: add hwmon temp support for w1_therm
  drivers: w1: refactor w1_slave_show to make the temp reading functionality separate
  drivers: w1: add hwmon support structures
  eeprom: idt_89hpesx: Support both ACPI and OF probing
  mcb: Fix an error handling path in 'chameleon_parse_cells()'
  MCB: add support for SC31 to mcb-lpc
  mux: make device_type const
  char: virtio: constify attribute_group structures.
  Documentation/ABI: document the nvmem sysfs files
  lkdtm: fix spelling mistake: "incremeted" -> "incremented"
  perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file
  nvmem: include linux/err.h from header
  ...
2017-09-05 11:08:17 -07:00
Linus Torvalds
44b1671fae Driver core update for 4.14-rc1
Here is the "big" driver core update for 4.14-rc1.
 
 It's really not all that big, the largest thing here being some firmware
 tests to help ensure that that crazy api is working properly.
 
 There's also a new uevent for when a driver is bound or unbound from a
 device, fixing a hole in the driver model that's been there since the
 very beginning.  Many thanks to Dmitry for being persistent and pointing
 out how wrong I was about this all along :)
 
 Patches for the new uevents are already in the systemd tree, if people
 want to play around with them.
 
 Otherwise just a number of other small api changes and updates here,
 nothing major.  All of these patches have been in linux-next for a
 while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWa1/IQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yn8jACfdQg+YXGxTExonxnyiWgoDMMSO2gAn1ETOaak
 itLO5ll4b6EQ0r3pU27d
 =pCYl
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core update from Greg KH:
 "Here is the "big" driver core update for 4.14-rc1.

  It's really not all that big, the largest thing here being some
  firmware tests to help ensure that that crazy api is working properly.

  There's also a new uevent for when a driver is bound or unbound from a
  device, fixing a hole in the driver model that's been there since the
  very beginning. Many thanks to Dmitry for being persistent and
  pointing out how wrong I was about this all along :)

  Patches for the new uevents are already in the systemd tree, if people
  want to play around with them.

  Otherwise just a number of other small api changes and updates here,
  nothing major. All of these patches have been in linux-next for a
  while with no reported issues"

* tag 'driver-core-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (28 commits)
  driver core: bus: Fix a potential double free
  Do not disable driver and bus shutdown hook when class shutdown hook is set.
  base: topology: constify attribute_group structures.
  base: Convert to using %pOF instead of full_name
  kernfs: Clarify lockdep name for kn->count
  fbdev: uvesafb: remove DRIVER_ATTR() usage
  xen: xen-pciback: remove DRIVER_ATTR() usage
  driver core: Document struct device:dma_ops
  mod_devicetable: Remove excess description from structured comment
  test_firmware: add batched firmware tests
  firmware: enable a debug print for batched requests
  firmware: define pr_fmt
  firmware: send -EINTR on signal abort on fallback mechanism
  test_firmware: add test case for SIGCHLD on sync fallback
  initcall_debug: add deferred probe times
  Input: axp20x-pek - switch to using devm_device_add_group()
  Input: synaptics_rmi4 - use devm_device_add_group() for attributes in F01
  Input: gpio_keys - use devm_device_add_group() for attributes
  driver core: add devm_device_add_group() and friends
  driver core: add device_{add|remove}_group() helpers
  ...
2017-09-05 10:41:21 -07:00
Linus Torvalds
5f82e71a00 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - Add 'cross-release' support to lockdep, which allows APIs like
   completions, where it's not the 'owner' who releases the lock, to be
   tracked. It's all activated automatically under
   CONFIG_PROVE_LOCKING=y.

 - Clean up (restructure) the x86 atomics op implementation to be more
   readable, in preparation of KASAN annotations. (Dmitry Vyukov)

 - Fix static keys (Paolo Bonzini)

 - Add killable versions of down_read() et al (Kirill Tkhai)

 - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

 - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

 - Remove smp_mb__before_spinlock() and convert its usages, introduce
   smp_mb__after_spinlock() (Peter Zijlstra)

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  locking/lockdep/selftests: Fix mixed read-write ABBA tests
  sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
  acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
  locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
  smp: Avoid using two cache lines for struct call_single_data
  locking/lockdep: Untangle xhlock history save/restore from task independence
  locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
  futex: Remove duplicated code and fix undefined behaviour
  Documentation/locking/atomic: Finish the document...
  locking/lockdep: Fix workqueue crossrelease annotation
  workqueue/lockdep: 'Fix' flush_work() annotation
  locking/lockdep/selftests: Add mixed read-write ABBA tests
  mm, locking/barriers: Clarify tlb_flush_pending() barriers
  locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
  locking/lockdep: Explicitly initialize wq_barrier::done::map
  locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
  locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
  locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
  locking/refcounts, x86/asm: Implement fast refcount overflow protection
  locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
  ...
2017-09-04 11:52:29 -07:00
Linus Torvalds
f213a6c84c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - fix affine wakeups (Peter Zijlstra)

   - improve CPU onlining (and general bootup) scalability on systems
     with ridiculous number (thousands) of CPUs (Peter Zijlstra)

   - sched/numa updates (Rik van Riel)

   - sched/deadline updates (Byungchul Park)

   - sched/cpufreq enhancements and related cleanups (Viresh Kumar)

   - sched/debug enhancements (Xie XiuQi)

   - various fixes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  sched/debug: Optimize sched_domain sysctl generation
  sched/topology: Avoid pointless rebuild
  sched/topology, cpuset: Avoid spurious/wrong domain rebuilds
  sched/topology: Improve comments
  sched/topology: Fix memory leak in __sdt_alloc()
  sched/completion: Document that reinit_completion() must be called after complete_all()
  sched/autogroup: Fix error reporting printk text in autogroup_create()
  sched/fair: Fix wake_affine() for !NUMA_BALANCING
  sched/debug: Intruduce task_state_to_char() helper function
  sched/debug: Show task state in /proc/sched_debug
  sched/debug: Use task_pid_nr_ns in /proc/$pid/sched
  sched/core: Remove unnecessary initialization init_idle_bootup_task()
  sched/deadline: Change return value of cpudl_find()
  sched/deadline: Make find_later_rq() choose a closer CPU in topology
  sched/numa: Scale scan period with tasks in group and shared/private
  sched/numa: Slow down scan rate if shared faults dominate
  sched/pelt: Fix false running accounting
  sched: Mark pick_next_task_dl() and build_sched_domain() as static
  sched/cpupri: Don't re-initialize 'struct cpupri'
  sched/deadline: Don't re-initialize 'struct cpudl'
  ...
2017-09-04 09:10:24 -07:00
Ingo Molnar
edc2988c54 Merge branch 'linus' into locking/core, to fix up conflicts
Conflicts:
	mm/page_alloc.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-04 11:01:18 +02:00
Linus Torvalds
69c0067aa3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc fixes from Al Viro:
 "Loose ends and regressions from the last merge window.

  Strictly speaking, only binfmt_flat thing is a build regression per
  se - the rest is 'only sparse cares about that' stuff"

[ This came in before the 4.13 release and could have gone there, but it
  was late in the release and nothing seemed critical enough to care, so
  I'm pulling it in the 4.14 merge window instead  - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  binfmt_flat: fix arch/m32r and arch/microblaze flat_put_addr_at_rp()
  compat_hdio_ioctl: Fix a declaration
  <linux/uaccess.h>: Fix copy_in_user() declaration
  annotate RWF_... flags
  teach SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE to handle __bitwise arguments
2017-09-03 16:09:03 -07:00
Pan Bian
6c370590cf xfs: use kmem_free to free return value of kmem_zalloc
In function xfs_test_remount_options(), kfree() is used to free memory
allocated by kmem_zalloc(). But it is better to use kmem_free().

Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-03 10:40:46 -07:00
Christoph Hellwig
8353a814f2 xfs: open code end_buffer_async_write in xfs_finish_page_writeback
Our loop in xfs_finish_page_writeback, which iterates over all buffer
heads in a page and then calls end_buffer_async_write, which also
iterates over all buffers in the page to check if any I/O is in flight
is not only inefficient, but also potentially dangerous as
end_buffer_async_write can cause the page and all buffers to be freed.

Replace it with a single loop that does the work of end_buffer_async_write
on a per-page basis.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-03 10:40:45 -07:00
Christoph Hellwig
dd60687ee5 xfs: don't set v3 xflags for v2 inodes
Reject attempts to set XFLAGS that correspond to di_flags2 inode flags
if the inode isn't a v3 inode, because di_flags2 only exists on v3.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-02 08:22:19 -07:00
Darrick J. Wong
7bf7a193a9 xfs: fix compiler warnings
Fix up all the compiler warnings that have crept in.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-09-02 08:22:19 -07:00
Linus Torvalds
d0d6ab53c9 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs version warning fix from Steve French:
 "As requested, additional kernel warning messages to clarify the
  default dialect changes"

[ There is still some discussion about exactly which version should be
  the new default.  Longer-term we have auto-negotiation coming, but
  that's not there yet..  - Linus ]

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  Fix warning messages when mounting to older servers
2017-09-01 20:57:27 -07:00
Darrick J. Wong
4cc1ee5e65 xfs: simplify the rmap code in xfs_bmse_merge
In Christoph's patch to refactor xfs_bmse_merge, the updated rmap code
does more work than it needs to (because map-extent auto-merges
records).  Remove the unnecessary unmap and save ourselves a deferred
op.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-09-01 13:08:26 -07:00
Eric Sandeen
f91fb956f2 xfs: remove unused flags arg from xfs_file_iomap_begin_delay
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Amir Goldstein
47c7d0b195 xfs: fix incorrect log_flushed on fsync
When calling into _xfs_log_force{,_lsn}() with a pointer
to log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffers

xfs_file_fsync() checks the value of log_flushed after
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.

This optimization is incorrect in the following sequence of events:

 Task A                    Task B
 -------------------------------------------------------
 xfs_file_fsync()
   _xfs_log_force_lsn()
     xlog_sync()
        [submit PREFLUSH]
                           xfs_file_fsync()
                             file_write_and_wait_range()
                               [submit WRITE X]
                               [endio  WRITE X]
                             _xfs_log_force_lsn()
                               xlog_wait()
        [endio  PREFLUSH]

The write X is not guarantied to be on persistent storage
when PREFLUSH request in completed, because write A was submitted
after the PREFLUSH request, but xfs_file_fsync() of task A will
be notified of log_flushed=1 and will skip explicit flush.

If the system crashes after fsync of task A, write X may not be
present on disk after reboot.

This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block io operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute ops of a file and record ops on log device
- Every now and then fsync the file, store md5 of file and mark
  the location in the log
- Then replay log onto device for each mark, mount fs and compare
  md5 of file to stored value

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Christoph Hellwig
742d842907 xfs: disable per-inode DAX flag
Currently flag switching can be used to easily crash the kernel.  Disable
the per-inode DAX flag until that is sorted out.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Christoph Hellwig
8bfadd8d03 xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves
Use the existing functionality instead of directly poking into the extent
list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Christoph Hellwig
e17a5c6f0e xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extent
This avoids poking into the internals of the extent list.  Also return
the number of extents as the return value instead of an additional
by reference argument, and make it available to callers outside of
xfs_bmap_util.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Christoph Hellwig
4c35445b59 xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at
This abstracts the function away from details of the low-level extent
list implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:25 -07:00
Christoph Hellwig
4da6b514ea xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents
This abstracts the function away from details of the low-level extent
list implementation.

Note that it seems like the previous implementation of rmap for
the merge case was completely broken, but it no seems appear to
trigger that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:25 -07:00
Christoph Hellwig
05b7c8ab2b xfs: move some code around inside xfs_bmap_shift_extents
For the first right move we need to look up next_fsb.  That means
our last fsb that contains next_fsb must also be the current extent,
so take advantage of that by moving the code around a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:25 -07:00
Oleg Nesterov
138e4ad67a epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
The race was introduced by me in commit 971316f0503a ("epoll:
ep_unregister_pollwait() can use the freed pwq->whead").  I did not
realize that nothing can protect eventpoll after ep_poll_callback() sets
->whead = NULL, only whead->lock can save us from the race with
ep_free() or ep_remove().

Move ->whead = NULL to the end of ep_poll_callback() and add the
necessary barriers.

TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even
before this patch.

Hopefully this explains use-after-free reported by syzcaller:

	BUG: KASAN: use-after-free in debug_spin_lock_before
	...
	 _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159
	 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148

this is spin_lock(eventpoll->lock),

	...
	Freed by task 17774:
	...
	 kfree+0xe8/0x2c0 mm/slub.c:3883
	 ep_free+0x22c/0x2a0 fs/eventpoll.c:865

Fixes: 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead")
Reported-by: 范龙飞 <long7573@126.com>
Cc: stable@vger.kernel.org
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-01 13:07:35 -07:00
Linus Torvalds
b8a78bb4d1 ceph fscache page locking fix from Zheng, marked for stable.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJZqYGlAAoJEEp/3jgCEfOLx58H/1jnP79H03/kchVBCGPLCKjs
 E+pgHpb2922EGeYmEUoxfq627SiCODap/jo6JFVpsd+JnmHLZiMzmEzGpDce6fn9
 /YY5u3WNtmnKtyPvl0kzspK0ujaeCuiRyarULXBiHveL2ZQINKus4F9MiZphNnt4
 X4hgo866+esEf6LocuEkMoEGvgN7vk/Q9nDPgD/YoFrhCuwdvLJpBnE65CGbQyk5
 n3g0qlBR+yorDr1stdlSyVUDPkF5FQjhQTqkpi1oPAhsNPKgVPyZzRIEQEA+nI+N
 wTsQ0SMKfST4PNaRNdUuO1xwszYziqqlLZ2KwaaLIDHlElcbQR1S3GUKz6hddJc=
 =oesm
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.13-rc8' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "ceph fscache page locking fix from Zheng, marked for stable"

* tag 'ceph-for-4.13-rc8' of git://github.com/ceph/ceph-client:
  ceph: fix readpage from fscache
2017-09-01 12:46:30 -07:00
Christoph Hellwig
f2285c148c xfs: use xfs_iext_get_extent in xfs_bmap_first_unused
Use the bmap abstraction instead of open-coding bmbt details here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig
50bb44c286 xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert
Use the helper instead of open coding it, to provide a better abstraction
for the scalable extent list work.  This also gets an additional assert
and trace point for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig
67e4e69cb2 xfs: add a xfs_iext_update_extent helper
This helper is used to update an extent record based on the extent index,
and can be used to provide a level of abstractions between callers that
want to modify in-core extent records and the details of the extent list
implementation.

Also switch all users of the xfs_bmbt_set_all(xfs_iext_get_ext(...))
pattern to this new helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig
d522d569d6 xfs: consolidate the various page fault handlers
Add a new __xfs_filemap_fault helper that implements all four page fault
callouts, and make these methods themselves small stubs that set the
correct write_fault flag, and exit early for the non-DAX case for the
hugepage related ones.

Also remove the extra size checking in the pfn_fault path, which is now
handled in the core DAX code.

Life would be so much simpler if we only had one method for all this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig
e7647fb491 iomap: return VM_FAULT_* codes from iomap_page_mkwrite
All callers will need the VM_FAULT_* flags, so convert in the helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster
2dd3d709fc xfs: relog dirty buffers during swapext bmbt owner change
The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix the xfs_trans_roll call]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster
a5814bceea xfs: disallow marking previously dirty buffers as ordered
Ordered buffers are used in situations where the buffer is not
physically logged but must pass through the transaction/logging
pipeline for a particular transaction. As a result, ordered buffers
are not unpinned and written back until the transaction commits to
the log. Ordered buffers have a strict requirement that the target
buffer must not be currently dirty and resident in the log pipeline
at the time it is marked ordered. If a dirty+ordered buffer is
committed, the buffer is reinserted to the AIL but not physically
relogged at the LSN of the associated checkpoint. The buffer log
item is assigned the LSN of the latest checkpoint and the AIL
effectively releases the previously logged buffer content from the
active log before the buffer has been written back. If the tail
pushes forward and a filesystem crash occurs while in this state, an
inconsistent filesystem could result.

It is currently the caller responsibility to ensure an ordered
buffer is not already dirty from a previous modification. This is
unclear and error prone when not used in situations where it is
guaranteed a buffer has not been previously modified (such as new
metadata allocations).

To facilitate general purpose use of ordered buffers, update
xfs_trans_ordered_buf() to conditionally order the buffer based on
state of the log item and return the status of the result. If the
bli is dirty, do not order the buffer and return false. The caller
must either physically log the buffer (having acquired the
appropriate log reservation) or push it from the AIL to clean it
before it can be marked ordered in the current transaction.

Note that ordered buffers are currently only used in two situations:
1.) inode chunk allocation where previously logged buffers are not
possible and 2.) extent swap which will be updated to handle ordered
buffer failures in a separate patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster
6fb10d6d22 xfs: move bmbt owner change to last step of extent swap
The extent swap operation currently resets bmbt block owners before
the inode forks are swapped. The bmbt buffers are marked as ordered
so they do not have to be physically logged in the transaction.

This use of ordered buffers is not safe as bmbt buffers may have
been previously physically logged. The bmbt owner change algorithm
needs to be updated to physically log buffers that are already dirty
when/if they are encountered. This means that an extent swap will
eventually require multiple rolling transactions to handle large
btrees. In addition, all inode related changes must be logged before
the bmbt owner change scan begins and can roll the transaction for
the first time to preserve fs consistency via log recovery.

In preparation for such fixes to the bmbt owner change algorithm,
refactor the bmbt scan out of the extent fork swap code to the last
operation before the transaction is committed. Update
xfs_swap_extent_forks() to only set the inode log flags when an
owner change scan is necessary. Update xfs_swap_extents() to trigger
the owner change based on the inode log flags. Note that since the
owner change now occurs after the extent fork swap, the inode btrees
must be fixed up with the inode number of the current inode (similar
to log recovery).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster
99c794c639 xfs: skip bmbt block ino validation during owner change
Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.

The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If the transaction
commits to the log and the system crashes such that recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, leads to log recovery failure and requires xfs_repair to
zap the log to recover.

Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster
8dc518dfa7 xfs: don't log dirty ranges for ordered buffers
Ordered buffers are attached to transactions and pushed through the
logging infrastructure just like normal buffers with the exception
that they are not actually written to the log. Therefore, we don't
need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
called on ordered buffers to set up all of the dirty state on the
transaction, buffer and log item and prepare the buffer for I/O.

Now that xfs_trans_dirty_buf() is available, call it from
xfs_trans_ordered_buf() so the latter is now mutually exclusive with
xfs_trans_log_buf(). This reflects the implementation of ordered
buffers and helps eliminate confusion over the need to log ranges of
ordered buffers just to set up internal log state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster
9684010d38 xfs: refactor buffer logging into buffer dirtying helper
xfs_trans_log_buf() is responsible for logging the dirty segments of
a buffer along with setting all of the necessary state on the
transaction, buffer, bli, etc., to ensure that the associated items
are marked as dirty and prepared for I/O. We have a couple use cases
that need to to dirty a buffer in a transaction without actually
logging dirty ranges of the buffer.  One existing use case is
ordered buffers, which are currently logged with arbitrary ranges to
accomplish this even though the content of ordered buffers is never
written to the log. Another pending use case is to relog an already
dirty buffer across rolled transactions within the deferred
operations infrastructure. This is required to prevent a held
(XFS_BLI_HOLD) buffer from pinning the tail of the log.

Refactor xfs_trans_log_buf() into a new function that contains all
of the logic responsible to dirty the transaction, lidp, buffer and
bli. This new function can be used in the future for the use cases
outlined above. This patch does not introduce functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster
e9385cc6fb xfs: ordered buffer log items are never formatted
Ordered buffers pass through the logging infrastructure without ever
being written to the log. The way this works is that the ordered
buffer status is transferred to the log vector at commit time via
the ->iop_size() callback. In xlog_cil_insert_format_items(),
ordered log vectors bypass ->iop_format() processing altogether.

Therefore it is unnecessary for xfs_buf_item_format() to handle
ordered buffers. Remove the unnecessary logic and assert that an
ordered buffer never reaches this point.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster
6453c65d35 xfs: remove unnecessary dirty bli format check for ordered bufs
xfs_buf_item_unlock() historically checked the dirty state of the
buffer by manually checking the buffer log formats for dirty
segments. The introduction of ordered buffers invalidated this check
because ordered buffers have dirty bli's but no dirty (logged)
segments. The check was updated to accommodate ordered buffers by
looking at the bli state first and considering the blf only if the
bli is clean.

This logic is safe but unnecessary. There is no valid case where the
bli is clean yet the blf has dirty segments. The bli is set dirty
whenever the blf is logged (via xfs_trans_log_buf()) and the blf is
cleared in the only place BLI_DIRTY is cleared (xfs_trans_binval()).

Remove the conditional blf dirty checks and replace with an assert
that should catch any discrepencies between bli and blf dirty
states. Refactor the old blf dirty check into a helper function to
be used by the assert.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster
a4f6cf6b2b xfs: open-code xfs_buf_item_dirty()
It checks a single flag and has one caller. It probably isn't worth
its own function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig
8ad7c629b1 xfs: remove the ip argument to xfs_defer_finish
And instead require callers to explicitly join the inode using
xfs_defer_ijoin.  Also consolidate the defer error handling in
a few places using a goto label.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig
882d8785fb xfs: rename xfs_defer_join to xfs_defer_ijoin
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig
411350df14 xfs: refactor xfs_trans_roll
Split xfs_trans_roll into a low-level helper that just rolls the
actual transaction and a new higher level xfs_trans_roll_inode
that takes care of logging and rejoining the inode.  This gets
rid of the NULL inode case, and allows to simplify the special
cases in the deferred operation code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Omar Sandoval
f2e9ad212d xfs: check for race with xfs_reclaim_inode() in xfs_ifree_cluster()
After xfs_ifree_cluster() finds an inode in the radix tree and verifies
that the inode number is what it expected, xfs_reclaim_inode() can swoop
in and free it. xfs_ifree_cluster() will then happily continue working
on the freed inode. Most importantly, it will mark the inode stale,
which will probably be overwritten when the inode slab object is
reallocated, but if it has already been reallocated then we can end up
with an inode spuriously marked stale.

In 8a17d7ddedb4 ("xfs: mark reclaimed inodes invalid earlier") we added
a second check to xfs_iflush_cluster() to detect this race, but the
similar RCU lookup in xfs_ifree_cluster() needs the same treatment.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Darrick J. Wong
799ea9e9c5 xfs: evict all inodes involved with log redo item
When we introduced the bmap redo log items, we set MS_ACTIVE on the
mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
from being truncated prematurely during log recovery.  This also had the
effect of putting linked inodes on the lru instead of evicting them.

Unfortunately, we neglected to find all those unreferenced lru inodes
and evict them after finishing log recovery, which means that we leak
them if anything goes wrong in the rest of xfs_mountfs, because the lru
is only cleaned out on unmount.

Therefore, evict unreferenced inodes in the lru list immediately
after clearing MS_ACTIVE.

Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: viro@ZenIV.linux.org.uk
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-09-01 10:55:30 -07:00
Steve French
7e682f766f Fix warning messages when mounting to older servers
When mounting to older servers, such as Windows XP (or even Windows 7),
the limited error messages that can be passed back to user space can
get confusing since the default dialect has changed from SMB1 (CIFS) to
more secure SMB3 dialect. Log additional information when the user chooses
to use the default dialects and when the server does not support the
dialect requested.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
2017-09-01 00:18:44 -05:00
Linus Torvalds
e89ce1f89f two cifs bug fixes for stable
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQGcBAABAgAGBQJZpxWIAAoJEIosvXAHck9Rl1AL/2wNdEvDvDYZ48IeJN/R0Agb
 nhsXycDshjjUSMghm5IMXvaOitwQL3bcbVeGBDOF4UYoDgJZSuOeOvzYhAJaykiy
 Cs2rwve9Vuw9pliY1D4705g2jduvca6KP4Rzcmf8zeRqG8siQLDlFI8MqYcWiz6d
 F8O9EH+n+Daqf4L5b9IB/3aXZiQLDXKC5HjwJJlPjF6fhkqMbwFLSgTYPmCoM2M2
 oC4jDIPIvM7+p5K8Hs3UvkUaeUoMV1Hx/bba3gGFLK0lnGiD2davZPNBLR46nt6j
 L6l0ujg1pv0kOnTsyfiZp/m43UF5IGeLacBj5DczDrio/8FYKcLCoTIXpWB/dMn/
 NR9kGw/Y9M7jv+qumwrFzMaqv4ztMc2QssFsm37gHEtkEIBPiPTYixRIG/OkVRo7
 u8sZic1DvFot0YalZqpIX6Ss6egiY5CI4q7AWU8XwpkiW+K/RBN8g31Cmcx6scU2
 6/ywpNm2tP5D7AfH13jQh8CQOstnrFH50fUCyXpASA==
 =VgIs
 -----END PGP SIGNATURE-----

Merge tag 'cifs-fixes-for-4.13-rc7-and-stable' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Two cifs bug fixes for stable"

* tag 'cifs-fixes-for-4.13-rc7-and-stable' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: remove endian related sparse warning
  CIFS: Fix maximum SMB2 header size
2017-08-31 18:45:04 -07:00
Linus Torvalds
ea25c43179 Merge branch 'mmu_notifier_fixes'
Merge mmu_notifier fixes from Jérôme Glisse:
 "The invalidate_page callback suffered from 2 pitfalls. First it used
  to happen after page table lock was release and thus a new page might
  have been setup for the virtual address before the call to
  invalidate_page().

  This is in a weird way fixed by commit c7ab0d2fdc84 ("mm: convert
  try_to_unmap_one() to use page_vma_mapped_walk()") which moved the
  callback under the page table lock. Which also broke several existing
  user of the mmu_notifier API that assumed they could sleep inside this
  callback.

  The second pitfall was invalidate_page being the only callback not
  taking a range of address in respect to invalidation but was giving an
  address and a page. Lot of the callback implementer assumed this could
  never be THP and thus failed to invalidate the appropriate range for
  THP pages.

  By killing this callback we unify the mmu_notifier callback API to
  always take a virtual address range as input.

  There is now two clear API (I am not mentioning the youngess API which
  is seldomly used):

   - invalidate_range_start()/end() callback (which allow you to sleep)

   - invalidate_range() where you can not sleep but happen right after
     page table update under page table lock

  Note that a lot of existing user feels broken in respect to
  range_start/ range_end. Many user only have range_start() callback but
  there is nothing preventing them to undo what was invalidated in their
  range_start() callback after it returns but before any CPU page table
  update take place.

  The code pattern use in kvm or umem odp is an example on how to
  properly avoid such race. In a nutshell use some kind of sequence
  number and active range invalidation counter to block anything that
  might undo what the range_start() callback did.

  If you do not care about keeping fully in sync with CPU page table (ie
  you can live with CPU page table pointing to new different page for a
  given virtual address) then you can take a reference on the pages
  inside the range_start callback and drop it in range_end or when your
  driver is done with those pages.

  Last alternative is to use invalidate_range() if you can do
  invalidation without sleeping as invalidate_range() callback happens
  under the CPU page table spinlock right after the page table is
  updated.

  The first two patches convert existing mmu_notifier_invalidate_page()
  calls to mmu_notifier_invalidate_range() and bracket those call with
  call to mmu_notifier_invalidate_range_start()/end().

  The next ten patches remove existing invalidate_page() callback as it
  can no longer happen.

  Finally the last page remove the invalidate_page() callback completely
  so it can RIP.

  Changes since v1:
   - remove more dead code in kvm (no testing impact)
   - more accurate end address computation (patch 2) in page_mkclean_one
     and try_to_unmap_one
   - added tested-by/reviewed-by gotten so far"

* emailed patches from Jérôme Glisse <jglisse@redhat.com>:
  mm/mmu_notifier: kill invalidate_page
  KVM: update to new mmu_notifier semantic v2
  xen/gntdev: update to new mmu_notifier semantic
  sgi-gru: update to new mmu_notifier semantic
  misc/mic/scif: update to new mmu_notifier semantic
  iommu/intel: update to new mmu_notifier semantic
  iommu/amd: update to new mmu_notifier semantic
  IB/hfi1: update to new mmu_notifier semantic
  IB/umem: update to new mmu_notifier semantic
  drm/amdgpu: update to new mmu_notifier semantic
  powerpc/powernv: update to new mmu_notifier semantic
  mm/rmap: update to new mmu_notifier semantic v2
  dax: update to new mmu_notifier semantic
2017-08-31 17:30:01 -07:00