Add a helper that allows a caller to initialize a new bio_set,
using the settings from an existing bio_set.
Reported-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull misc vfs updates from Al Viro:
"Misc bits and pieces not fitting into anything more specific"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: delete unnecessary assignment in vfs_listxattr
Documentation: filesystems: update filesystem locking documentation
vfs: namei: use path_equal() in follow_dotdot()
fs.h: fix outdated comment about file flags
__inode_security_revalidate() never gets NULL opt_dentry
make xattr_getsecurity() static
vfat: simplify checks in vfat_lookup()
get rid of dead code in d_find_alias()
it's SB_BORN, not MS_BORN...
msdos_rmdir(): kill BS comment
remove rpc_rmdir()
fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJbFIrHAAoJEPfTWPspceCm2+kQAKo7o7HL30aRxJYu+gYafkuW
PV47zr3e4vhMDEzDaMsh1+V7I7bm3uS+NZu6cFbcV+N9KXFpeb4V4Hvvm5cs+OC3
WCOBi4eC1h4qnDQ3ZyySrCMN+KHYJ16pZqddEjqw+fhVudx8i+F+jz3Y4ZMDDc3q
pArKZvjKh2wEuYXUMFTjaXY46IgPt+er94OwvrhyHk+4AcA+Q/oqSfSdDahUC8jb
BVR3FV4I3NOHUaru0RbrUko13sVZSboWPCIFrlTDz8xXcJOnVHzdVS1WLFDXLHnB
O8q9cADCfa4K08kz68RxykcJiNxNvz5ChDaG0KloCFO+q1tzYRoXLsfaxyuUDg57
Zd93OFZC6hAzXdhclDFIuPET9OQIjDzwphodfKKmDsm3wtyOtydpA0o7JUEongp0
O1gQsEfYOXmQsXlo8Ot+Z7Ne/HvtGZ91JahUa/59edxQbcKaMrktoyQsQ/d1nOEL
4kXID18wPcFHWRQHYXyVuw6kbpRtQnh/U2m1eenSZ7tVQHwoe6mF3cfSf5MMseak
k8nAnmsfEvOL4Ar9ftg61GOrImaQlidxOC2A8fmY5r0Sq/ZldvIFIZizsdTTCcni
8SOTxcQowyqPf5NvMNQ8cKqqCJap3ppj4m7anZNhbypDIF2TmOWsEcXcMDn4y9on
fax14DPLo59gBRiPCn5f
=nga/
-----END PGP SIGNATURE-----
Merge tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
- clean up how we pass around gfp_t and
blk_mq_req_flags_t (Christoph)
- prepare us to defer scheduler attach (Christoph)
- clean up drivers handling of bounce buffers (Christoph)
- fix timeout handling corner cases (Christoph/Bart/Keith)
- bcache fixes (Coly)
- prep work for bcachefs and some block layer optimizations (Kent).
- convert users of bio_sets to using embedded structs (Kent).
- fixes for the BFQ io scheduler (Paolo/Davide/Filippo)
- lightnvm fixes and improvements (Matias, with contributions from Hans
and Javier)
- adding discard throttling to blk-wbt (me)
- sbitmap blk-mq-tag handling (me/Omar/Ming).
- remove the sparc jsflash block driver, acked by DaveM.
- Kyber scheduler improvement from Jianchao, making it more friendly
wrt merging.
- conversion of symbolic proc permissions to octal, from Joe Perches.
Previously the block parts were a mix of both.
- nbd fixes (Josef and Kevin Vigor)
- unify how we handle the various kinds of timestamps that the block
core and utility code uses (Omar)
- three NVMe pull requests from Keith and Christoph, bringing AEN to
feature completeness, file backed namespaces, cq/sq lock split, and
various fixes
- various little fixes and improvements all over the map
* tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits)
blk-mq: update nr_requests when switching to 'none' scheduler
block: don't use blocking queue entered for recursive bio submits
dm-crypt: fix warning in shutdown path
lightnvm: pblk: take bitmap alloc. out of critical section
lightnvm: pblk: kick writer on new flush points
lightnvm: pblk: only try to recover lines with written smeta
lightnvm: pblk: remove unnecessary bio_get/put
lightnvm: pblk: add possibility to set write buffer size manually
lightnvm: fix partial read error path
lightnvm: proper error handling for pblk_bio_add_pages
lightnvm: pblk: fix smeta write error path
lightnvm: pblk: garbage collect lines with failed writes
lightnvm: pblk: rework write error recovery path
lightnvm: pblk: remove dead function
lightnvm: pass flag on graceful teardown to targets
lightnvm: pblk: check for chunk size before allocating it
lightnvm: pblk: remove unnecessary argument
lightnvm: pblk: remove unnecessary indirection
lightnvm: pblk: return NVM_ error on failed submission
lightnvm: pblk: warn in case of corrupted write buffer
...
If we end up splitting a bio and the queue goes away between
the initial submission and the later split submission, then we
can block forever in blk_queue_enter() waiting for the reference
to drop to zero. This will never happen, since we already hold
a reference.
Mark a split bio as already having entered the queue, so we can
just use the live non-blocking queue enter variant.
Thanks to Tetsuo Handa for the analysis.
Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull networking fixes from David Miller:
1) Infinite loop in _decode_session6(), from Eric Dumazet.
2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric
Dumazet.
3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux.
4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki
Makita.
5) Incorrect idr release in cls_flower, from Paul Blakey.
6) Probe error handling fix in davinci_emac, from Dan Carpenter.
7) Memory leak in XPS configuration, from Alexander Duyck.
8) Use after free with cloned sockets in kcm, from Kirill Tkhai.
9) MTU handling fixes fo ip_tunnel and ip6_tunnel, from Nicolas
Dichtel.
10) Fix UAPI hole in bpf data structure for 32-bit compat applications,
from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
bpf: fix uapi hole for 32 bit compat applications
net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
ip6_tunnel: remove magic mtu value 0xFFF8
ip_tunnel: restore binding to ifaces with a large mtu
net: dsa: b53: Add BCM5389 support
kcm: Fix use-after-free caused by clonned sockets
net-sysfs: Fix memory leak in XPS configuration
ixgbe: fix parsing of TC actions for HW offload
net: ethernet: davinci_emac: fix error handling in probe()
net/ncsi: Fix array size in dumpit handler
cls_flower: Fix incorrect idr release when failing to modify rule
net/sonic: Use dma_mapping_error()
xfrm Fix potential error pointer dereference in xfrm_bundle_create.
vhost_net: flush batched heads before trying to busy polling
tun: Fix NULL pointer dereference in XDP redirect
be2net: Fix error detection logic for BE3
net: qmi_wwan: Add Netgear Aircard 779S
mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
atm: zatm: fix memcmp casting
iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbEvsXAAoJEAx081l5xIa+LncP/355OijfiJxjbze+eJVlz+1r
CIlZsZBMc9aqsGU2HyVCTx3dqglrx0UbXZWRyI5WgPuqgi8x/rgbrTk+RyWr3eKw
QQD+oHMN6vXn07ss3VYyHgW+5tkEZ/bbkTaTVRagag/wCZ9eOFgtww1VllYEw1ZG
QI+tOWKhBpcU1dnoL+Bxu1W4JckPXK5FPlcFF/zkt/bIOSedoj+3MZl5Db7c1NSz
quw0mR8C/luf5H2Ump5WIgqKRD8j3xM/EDnY0gdIg9HMmm3k0xIhNLcU/rwNi5ns
qvurOPGK9Fteu94QnmmIbKj1E/ms/KRDA+71UoqmW2YYKJJAYHYr6t8Q4HXlKFcC
MGq1pDGzaZrTbOtqHPJv6iLcnA28GgjDGQ0nQuNWp7mhjmb+fbqLftFQjLeHNPUc
lu3pmmE8FZWI4lSTiqj5ojM1ceZFgGFN2l52PS+17wVHAHln+WpIMbFpaqkxlFRz
zwMr09d70w8qQiW/0b5Pf8n7hq7ud3SZEOhzaAjQ6ggPNXRQ1czj1s8QUsUNt+3b
o+uYC9kSpS67PxUs4QTPFyicMueZaGB+WT5Y+Cr5d4OJIwenzkmRhlAuR05W755T
P5vbe4SJxih+FqcxfAWFJBFoRIRC49YxB/UXYpCIK4iSXpWVVNYearwzjJxywvzA
QppQU+Y9IfvVPNHQtzxX
=MK3/
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A few final fixes:
i915:
- fix for potential Spectre vector in the new query uAPI
- fix NULL pointer deref (FDO #106559)
- DMI fix to hide LVDS for Radiant P845 (FDO #105468)
amdgpu:
- suspend/resume DC regression fix
- underscan flicker fix on fiji
- gamma setting fix after dpms
omap:
- fix oops regression
core:
- fix PSR timing
dw-hdmi:
- fix oops regression"
* tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux:
drm/amd/display: Update color props when modeset is required
drm/amd/display: Make atomic-check validate underscan changes
drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense
drm/amd/display: Fix BUG_ON during CRTC atomic check update
drm/i915/query: nospec expects no more than an unsigned long
drm/i915/query: Protect tainted function pointer lookup
drm/i915/lvds: Move acpi lid notification registration to registration phase
drm/i915: Disable LVDS on Radiant P845
drm/omap: fix NULL deref crash with SDI displays
drm/psr: Fix missed entry in PSR setup time table.
Here are some old IIO driver fixes that were sitting in my tree for a
few weeks. Sorry about not getting them to you sooner. They fix a
number of small IIO driver issues that have been reported.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWxKN4g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymkogCfbe2/iQJcT8kXD0s/73A/KqkKPksAn0NbVWyh
HEroVoD7yDcdm2A+t36A
=OWdB
-----END PGP SIGNATURE-----
Merge tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull IIO driver fixes from Greg KH:
"Here are some old IIO driver fixes that were sitting in my tree for a
few weeks. Sorry about not getting them to you sooner. They fix a
number of small IIO driver issues that have been reported.
All of these have been in linux-next for a while with no reported
problems"
* tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: adc: select buffer for at91-sama5d2_adc
iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume
iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels
iio:kfifo_buf: check for uint overflow
iio:buffer: make length types match kfifo types
iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock
iio: adc: stm32-dfsdm: fix successive oversampling settings
iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ
If the namespace is unregistered before the LightNVM target is removed
(e.g., on hot unplug) it is too late for the target to store any metadata
on the device - any attempt to write to the device will fail. In this
case, pass on a "gracefull teardown" flag to the target to let it know
when this happens.
In the case of pblk, we pad the open line (close all open chunks) to
improve data retention. In the event of an ungraceful shutdown, avoid
this part and just clean up.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe changes from Christoph:
"Below is another set of NVMe updates for 4.18. Besides the usual bug
fixes this includes more feature completness in terms of AEN and log
page handling on the target."
* 'nvme-4.18' of git://git.infradead.org/nvme:
nvme: use the changed namespaces list log to clear ns data changed AENs
nvme: mark nvme_queue_scan static
nvme: submit AEN event configuration on startup
nvmet: mask pending AENs
nvmet: add AEN configuration support
nvmet: implement the changed namespaces log
nvmet: split log page implementation
nvmet: add a new nvmet_zero_sgl helper
nvme.h: add AEN configuration symbols
nvme.h: add the changed namespace list log
nvme.h: untangle AEN notice definitions
nvmet: fix error return code in nvmet_file_ns_enable()
nvmet: fix a typo in nvmet_file_ns_enable()
nvme-fabrics: allow internal passthrough command on deleting controllers
nvme-loop: add support for multiple ports
nvme-pci: simplify __nvme_submit_cmd
nvme-pci: Rate limit the nvme timeout warnings
nvme: allow duplicate controller if prior controller being deleted
These are only used by the block core. Also move the declarations to
block/blk.h.
Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hannes Reinecke <hare@suse.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Stop including the event type in the definitions for the notice type.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
All users have been converted to bioset_init(), kill off the
old API.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert the core block functionality to embedded bio sets.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The dw_hdmi_setup_rx_sense exported function should not use struct device
to recover the dw-hdmi context using drvdata, but take struct dw_hdmi
directly like other exported functions.
This caused a regression using Meson DRM on S905X since v4.17-rc1 :
Internal error: Oops: 96000007 [#1] PREEMPT SMP
[...]
CPU: 0 PID: 124 Comm: irq/32-dw_hdmi_ Not tainted 4.17.0-rc7 #2
Hardware name: Libre Technology CC (DT)
[...]
pc : osq_lock+0x54/0x188
lr : __mutex_lock.isra.0+0x74/0x530
[...]
Process irq/32-dw_hdmi_ (pid: 124, stack limit = 0x00000000adf418cb)
Call trace:
osq_lock+0x54/0x188
__mutex_lock_slowpath+0x10/0x18
mutex_lock+0x30/0x38
__dw_hdmi_setup_rx_sense+0x28/0x98
dw_hdmi_setup_rx_sense+0x10/0x18
dw_hdmi_top_thread_irq+0x2c/0x50
irq_thread_fn+0x28/0x68
irq_thread+0x10c/0x1a0
kthread+0x128/0x130
ret_from_fork+0x10/0x18
Code: 34000964 d00050a2 51000484 9135c042 (f864d844)
---[ end trace 945641e1fbbc07da ]---
note: irq/32-dw_hdmi_[124] exited with preempt_count 1
genirq: exiting task "irq/32-dw_hdmi_" (124) is an active IRQ thread (irq 32)
Fixes: eea034af90c6 ("drm/bridge/synopsys: dw-hdmi: don't clobber drvdata")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Tested-by: Koen Kooi <koen@dominion.thruhere.net>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1527673438-20643-1-git-send-email-narmstrong@baylibre.com
No functional changes in this patch, just a prep patch for utilizing
this in an IO scheduler.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Bsg holding a reference to the parent device may result in a crash if a
bsg file handle is closed after the parent device driver has unloaded.
Holding a reference is not really needed: the parent device must exist
between bsg_register_queue and bsg_unregister_queue. Before the device
goes away the caller does blk_cleanup_queue so that all in-flight
requests to the device are gone and all new requests cannot pass beyond
the queue. The queue itself is a refcounted object and it will stay
alive with a bsg file.
Based on analysis, previous patch and changelog from Anatoliy Glagolev.
Reported-by: Anatoliy Glagolev <glagolig@gmail.com>
Reviewed-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The information about a size change in this case just creates confusion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After the recent timeout handling changes, we have two holes in
the struct. Move the timeout near the deadline, killing both,
and moving related members closer together. On my config on
x86-64, this shrinks struct request from 312 to 304 bytes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The BLK_EH_NOT_HANDLED implies nothing happen, but very often that
is not what is happening - instead the driver already completed the
command. Fix the symbolic name to reflect that a little better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch simplifies the timeout handling by relying on the request
reference counting to ensure the iterator is operating on an inflight
and truly timed out request. Since the reference counting prevents the
tag from being reallocated, the block layer no longer needs to prevent
drivers from completing their requests while the timeout handler is
operating on it: a driver completing a request is allowed to proceed to
the next state without additional syncronization with the block layer.
This also removes any need for generation sequence numbers since the
request lifetime is prevented from being reallocated as a new sequence
while timeout handling is operating on it.
To enables this a refcount is added to struct request so that request
users can be sure they're operating on the same request without it
changing while they're processing it. The request's tag won't be
released for reuse until both the timeout handler and the completion
are done with it.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: slight cleanups, added back submission side hctx lock, use cmpxchg
for completions]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As far as I can tell this function can't even be called any more, given
that ATA implements its own eh_strategy_handler with ata_scsi_error, which
never calls ->eh_timed_out.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull scheduler fixes from Thomas Gleixner:
"Three fixes for scheduler and kthread code:
- allow calling kthread_park() on an already parked thread
- restore the sched_pi_setprio() tracepoint behaviour
- clarify the unclear string for the scheduling domain debug output"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched, tracing: Fix trace_sched_pi_setprio() for deboosting
kthread: Allow kthread_park() on a parked kthread
sched/topology: Clarify root domain(s) debug string
Merge misc fixes from Andrew Morton:
"16 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: fix memory hotplug during boot
kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
checkpatch: fix macro argument precedence test
init/main.c: include <linux/mem_encrypt.h>
kernel/sys.c: fix potential Spectre v1 issue
mm/memory_hotplug: fix leftover use of struct page during hotplug
proc: fix smaps and meminfo alignment
mm: do not warn on offline nodes unless the specific node is explicitly requested
mm, memory_hotplug: make has_unmovable_pages more robust
mm/kasan: don't vfree() nonexistent vm_area
MAINTAINERS: change hugetlbfs maintainer and update files
ipc/shm: fix shmat() nil address after round-down when remapping
Revert "ipc/shm: Fix shmat mmap nil-page protection"
idr: fix invalid ptr dereference on item delete
ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"
mm: fix nr_rotate_swap leak in swapon() error case
Pull networking fixes from David Miller:
"Let's begin the holiday weekend with some networking fixes:
1) Whoops need to restrict cfg80211 wiphy names even more to 64
bytes. From Eric Biggers.
2) Fix flags being ignored when using kernel_connect() with SCTP,
from Xin Long.
3) Use after free in DCCP, from Alexey Kodanev.
4) Need to check rhltable_init() return value in ipmr code, from Eric
Dumazet.
5) XDP handling fixes in virtio_net from Jason Wang.
6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
Morgenstein.
8) Prevent out-of-bounds speculation using indexes in BPF, from
Daniel Borkmann.
9) Fix regression added by AF_PACKET link layer cure, from Willem de
Bruijn.
10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
11) Missing config options for PMTU tests, from Stefano Brivio"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
ibmvnic: Fix partial success login retries
selftests/net: Add missing config options for PMTU tests
mlx4_core: allocate ICM memory in page size chunks
enic: set DMA mask to 47 bit
ppp: remove the PPPIOCDETACH ioctl
ipv4: remove warning in ip_recv_error
net : sched: cls_api: deal with egdev path only if needed
vhost: synchronize IOTLB message with dev cleanup
packet: fix reserve calculation
net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
bpf: properly enforce index mask to prevent out-of-bounds speculation
net/mlx4: Fix irq-unsafe spinlock usage
net: phy: broadcom: Fix bcm_write_exp()
net: phy: broadcom: Fix auxiliary control register reads
net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
ibmvnic: Only do H_EOI for mobility events
tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
virtio-net: fix leaking page for gso packet during mergeable XDP
...
The case of a new numa node got missed in avoiding using the node info
from page_struct during hotplug. In this path we have a call to
register_mem_sect_under_node (which allows us to specify it is hotplug
so don't change the node), via link_mem_sections which unfortunately
does not.
Fix is to pass check_nid through link_mem_sections as well and disable
it in the new numa node path.
Note the bug only 'sometimes' manifests depending on what happens to be
in the struct page structures - there are lots of them and it only needs
to match one of them.
The result of the bug is that (with a new memory only node) we never
successfully call register_mem_sect_under_node so don't get the memory
associated with the node in sysfs and meminfo for the node doesn't
report it.
It came up whilst testing some arm64 hotplug patches, but appears to be
universal. Whilst I'm triggering it by removing then reinserting memory
to a node with no other elements (thus making the node disappear then
appear again), it appears it would happen on hotplugging memory where
there was none before and it doesn't seem to be related the arm64
patches.
These patches call __add_pages (where most of the issue was fixed by
Pavel's patch). If there is a node at the time of the __add_pages call
then all is well as it calls register_mem_sect_under_node from there
with check_nid set to false. Without a node that function returns
having not done the sysfs related stuff as there is no node to use.
This is expected but it is the resulting path that fails...
Exact path to the problem is as follows:
mm/memory_hotplug.c: add_memory_resource()
The node is not online so we enter the 'if (new_node)' twice, on the
second such block there is a call to link_mem_sections which calls
into
drivers/node.c: link_mem_sections() which calls
drivers/node.c: register_mem_sect_under_node() which calls
get_nid_for_pfn and keeps trying until the output of that matches
the expected node (passed all the way down from
add_memory_resource)
It is effectively the same fix as the one referred to in the fixes tag
just in the code path for a new node where the comments point out we
have to rerun the link creation because it will have failed in
register_new_memory (as there was no node at the time). (actually that
comment is wrong now as we don't have register_new_memory any more it
got renamed to hotplug_memory_register in Pavel's patch).
Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
Fixes: fc44f7f9231a ("mm/memory_hotplug: don't read nid from struct page during hotplug")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar has noticed that we splat
WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9
[...]
CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G W E 4.17.0-rc5-next-20180517-1-default+ #66
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Call Trace:
vmemmap_populate+0xf2/0x2ae
sparse_mem_map_populate+0x28/0x35
sparse_add_one_section+0x4c/0x187
__add_pages+0xe7/0x1a0
add_pages+0x16/0x70
add_memory_resource+0xa3/0x1d0
add_memory+0xe4/0x110
acpi_memory_device_add+0x134/0x2e0
acpi_bus_attach+0xd9/0x190
acpi_bus_scan+0x37/0x70
acpi_device_hotplug+0x389/0x4e0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x146/0x340
worker_thread+0x47/0x3e0
kthread+0xf5/0x130
ret_from_fork+0x35/0x40
when adding memory to a node that is currently offline.
The VM_WARN_ON is just too loud without a good reason. In this
particular case we are doing
alloc_pages_node(node, GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_NOWARN, order)
so we do not insist on allocating from the given node (it is more a
hint) so we can fall back to any other populated node and moreover we
explicitly ask to not warn for the allocation failure.
Soften the warning only to cases when somebody asks for the given node
explicitly by __GFP_THISNODE.
Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Oscar Salvador <osalvador@techadventures.net>
Tested-by: Oscar Salvador <osalvador@techadventures.net>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Borkmann says:
====================
pull-request: bpf 2018-05-24
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a bug in the original fix to prevent out of bounds speculation when
multiple tail call maps from different branches or calls end up at the
same tail call helper invocation, from Daniel.
2) Two selftest fixes, one in reuseport_bpf_numa where test is skipped in
case of missing numa support and another one to update kernel config to
properly support xdp_meta.sh test, from Anders.
...
Would be great if you have a chance to merge net into net-next after that.
The verifier fix would be needed later as a dependency in bpf-next for
upcomig work there. When you do the merge there's a trivial conflict on
BPF side with 849fa50662fb ("bpf/verifier: refine retval R0 state for
bpf_get_stack helper"): Resolution is to keep both functions, the
do_refine_retval_range() and record_func_map().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the following commit:
b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()")
the sched_pi_setprio trace point shows the "newprio" during a deboost:
|futex sched_pi_setprio: comm=futex_requeue_p pid"34 oldprio newprio=3D98
|futex sched_switch: prev_comm=futex_requeue_p prev_pid"34 prev_prio=120
This patch open codes __rt_effective_prio() in the tracepoint as the
'newprio' to get the old behaviour back / the correct priority:
|futex sched_pi_setprio: comm=futex_requeue_p pid"20 oldprio newprio=3D120
|futex sched_switch: prev_comm=futex_requeue_p prev_pid"20 prev_prio=120
Peter suggested to open code the new priority so people using tracehook
could get the deadline data out.
Reported-by: Mansky Christian <man@keba.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()")
Link: http://lkml.kernel.org/r/20180524132647.gg6ziuogczdmjjzu@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
before f_count has reached 0, which is fundamentally a bad idea. It
does check 'f_count < 2', which excludes concurrent operations on the
file since they would only be possible with a shared fd table, in which
case each fdget() would take a file reference. However, it fails to
account for the fact that even with 'f_count == 1' the file can still be
linked into epoll instances. As reported by syzbot, this can trivially
be used to cause a use-after-free.
Yet, the only known user of PPPIOCDETACH is pppd versions older than
ppp-2.4.2, which was released almost 15 years ago (November 2003).
Also, PPPIOCDETACH apparently stopped working reliably at around the
same time, when the f_count check was added to the kernel, e.g. see
https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2'
check makes PPPIOCDETACH only work in single-threaded applications; it
always fails if called from a multithreaded application.
All pppd versions released in the last 15 years just close() the file
descriptor instead.
Therefore, instead of hacking around this bug by exporting epoll
internals to modules, and probably missing other related bugs, just
remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave
a stub in place that prints a one-time warning and returns EINVAL.
Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Remove bouncing addresses from the MAINTAINERS file
- Kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and hns drivers
- Various small LOC behavioral/operational bugs in mlx5, hns, qedr and i40iw drivers
- Two fixes for patches already sent during the merge window
- A long standing bug related to not decreasing the pinned pages count in the right
MM was found and fixed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJbByPQAAoJEDht9xV+IJsa164P/AihB/vbn9MBdK3pe1OSUGTm
tKZJ/Y6nY/Q/XTJSeM2wNECk8fOrZbKuLBz2XlPRsB2djp4ugC5WWfK9YbwWMGXG
I5B/lB8VTorQr8E5i9lqqMDQc8aF8VcGJtdqVE3nD4JsVTrQSGiSnw45/BARDUm3
OycJJMDOWhDj2wnNSa+JfjPemIMDM1jse7DnsJfDsGfTMS/G+6nyzjKIlEnnFZ8/
PBxhq0q7C5viNDwwn2GsAVUrATTlW48SY0WYhkgMdSl20d2th9wMZqNMqtniz8NP
lg87SrhzsAPOTlbSWlYYkAnzE7nEhfJyIfYUp2piNJeYuOohYPtO6w99Tqjl/GmU
uLIYIXtZCxAK1Zb/znc49HkRVL5YFDsQGXdtYy7tvRZPwwR32kowUtpKIWaZFz8O
BA/x+Zgqu9AlwqSWwQwxmMbUX42RRwhNJDVyTYlXQSSzhfgFaLIZARqb4K6HxeNN
vZN0BK+x6pX6FI7hpdsqNRtH1oo4SNUBxiuUsrZ7cy7GqYNdUJ6piygDgmERaJxU
svIUJof/+OoU1QyErQ0JgUEK/3jOHbjxSPb/rjQeqxAnCqhaGOuNGMtdfsGqgvBU
x/u3eDcbfi/LBErXR46gYtxnOQ8I2BB+m8erUc/GVvCzWrX+R7ELZYpBrP5Pcu/6
mr2D7hDqgZHbeU8aB8+D
=uFZh
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"This is pretty much just the usual array of smallish driver bugs.
- remove bouncing addresses from the MAINTAINERS file
- kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and
hns drivers
- various small LOC behavioral/operational bugs in mlx5, hns, qedr
and i40iw drivers
- two fixes for patches already sent during the merge window
- a long-standing bug related to not decreasing the pinned pages
count in the right MM was found and fixed"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
RDMA/hns: Move the location for initializing tmp_len
RDMA/hns: Bugfix for cq record db for kernel
IB/uverbs: Fix uverbs_attr_get_obj
RDMA/qedr: Fix doorbell bar mapping for dpi > 1
IB/umem: Use the correct mm during ib_umem_release
iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint
RDMA/i40iw: Avoid reference leaks when processing the AEQ
RDMA/i40iw: Avoid panic when objects are being created and destroyed
RDMA/hns: Fix the bug with NULL pointer
RDMA/hns: Set NULL for __internal_mr
RDMA/hns: Enable inner_pa_vld filed of mpt
RDMA/hns: Set desc_dma_addr for zero when free cmq desc
RDMA/hns: Fix the bug with rq sge
RDMA/hns: Not support qp transition from reset to reset for hip06
RDMA/hns: Add return operation when configured global param fail
RDMA/hns: Update convert function of endian format
RDMA/hns: Load the RoCE dirver automatically
RDMA/hns: Bugfix for rq record db for kernel
RDMA/hns: Add rq inline flags judgement
...
This reverts the following commits that change CMA design in MM.
3d2054ad8c2d ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")
1d47a3ec09b5 ("mm/cma: remove ALLOC_CMA")
bad8c6c0b114 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
Ville reported a following error on i386.
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
microcode: microcode updated early to revision 0x4, date = 2013-06-28
Initializing CPU#0
Initializing HighMem for node 0 (000377fe:00118000)
Initializing Movable for node 0 (00000001:00118000)
BUG: Bad page state in process swapper pfn:377fe
page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
flags: 0x80000000()
raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
Call Trace:
dump_stack+0x60/0x96
bad_page+0x9a/0x100
free_pages_check_bad+0x3f/0x60
free_pcppages_bulk+0x29d/0x5b0
free_unref_page_commit+0x84/0xb0
free_unref_page+0x3e/0x70
__free_pages+0x1d/0x20
free_highmem_page+0x19/0x40
add_highpages_with_active_regions+0xab/0xeb
set_highmem_pages_init+0x66/0x73
mem_init+0x1b/0x1d7
start_kernel+0x17a/0x363
i386_start_kernel+0x95/0x99
startup_32_smp+0x164/0x168
The reason for this error is that the span of MOVABLE_ZONE is extended
to whole node span for future CMA initialization, and, normal memory is
wrongly freed here. I submitted the fix and it seems to work, but,
another problem happened.
It's so late time to fix the later problem so I decide to reverting the
series.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the allocation process is scheduled back and the mapped hw queue is
changed, fake one extra wake up on previous queue for compensating wake
up miss, so other allocations on the previous queue won't be starved.
This patch fixes one request allocation hang issue, which can be
triggered easily in case of very low nr_request.
The race is as follows:
1) 2 hw queues, nr_requests are 2, and wake_batch is one
2) there are 3 waiters on hw queue 0
3) two in-flight requests in hw queue 0 are completed, and only two
waiters of 3 are waken up because of wake_batch, but both the two
waiters can be scheduled to another CPU and cause to switch to hw
queue 1
4) then the 3rd waiter will wait for ever, since no in-flight request
is in hw queue 0 any more.
5) this patch fixes it by the fake wakeup when waiter is scheduled to
another hw queue
Cc: <stable@vger.kernel.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Modified commit message to make it clearer, and make it apply on
top of the 4.18 branch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
While reviewing the verifier code, I recently noticed that the
following two program variants in relation to tail calls can be
loaded.
Variant 1:
# bpftool p d x i 15
0: (15) if r1 == 0x0 goto pc+3
1: (18) r2 = map[id:5]
3: (05) goto pc+2
4: (18) r2 = map[id:6]
6: (b7) r3 = 7
7: (35) if r3 >= 0xa0 goto pc+2
8: (54) (u32) r3 &= (u32) 255
9: (85) call bpf_tail_call#12
10: (b7) r0 = 1
11: (95) exit
# bpftool m s i 5
5: prog_array flags 0x0
key 4B value 4B max_entries 4 memlock 4096B
# bpftool m s i 6
6: prog_array flags 0x0
key 4B value 4B max_entries 160 memlock 4096B
Variant 2:
# bpftool p d x i 20
0: (15) if r1 == 0x0 goto pc+3
1: (18) r2 = map[id:8]
3: (05) goto pc+2
4: (18) r2 = map[id:7]
6: (b7) r3 = 7
7: (35) if r3 >= 0x4 goto pc+2
8: (54) (u32) r3 &= (u32) 3
9: (85) call bpf_tail_call#12
10: (b7) r0 = 1
11: (95) exit
# bpftool m s i 8
8: prog_array flags 0x0
key 4B value 4B max_entries 160 memlock 4096B
# bpftool m s i 7
7: prog_array flags 0x0
key 4B value 4B max_entries 4 memlock 4096B
In both cases the index masking inserted by the verifier in order
to control out of bounds speculation from a CPU via b2157399cc98
("bpf: prevent out-of-bounds speculation") seems to be incorrect
in what it is enforcing. In the 1st variant, the mask is applied
from the map with the significantly larger number of entries where
we would allow to a certain degree out of bounds speculation for
the smaller map, and in the 2nd variant where the mask is applied
from the map with the smaller number of entries, we get buggy
behavior since we truncate the index of the larger map.
The original intent from commit b2157399cc98 is to reject such
occasions where two or more different tail call maps are used
in the same tail call helper invocation. However, the check on
the BPF_MAP_PTR_POISON is never hit since we never poisoned the
saved pointer in the first place! We do this explicitly for map
lookups but in case of tail calls we basically used the tail
call map in insn_aux_data that was processed in the most recent
path which the verifier walked. Thus any prior path that stored
a pointer in insn_aux_data at the helper location was always
overridden.
Fix it by moving the map pointer poison logic into a small helper
that covers both BPF helpers with the same logic. After that in
fixup_bpf_calls() the poison check is then hit for tail calls
and the program rejected. Latter only happens in unprivileged
case since this is the *only* occasion where a rewrite needs to
happen, and where such rewrite is specific to the map (max_entries,
index_mask). In the privileged case the rewrite is generic for
the insn->imm / insn->code update so multiple maps from different
paths can be handled just fine since all the remaining logic
happens in the instruction processing itself. This is similar
to the case of map lookups: in case there is a collision of
maps in fixup_bpf_calls() we must skip the inlined rewrite since
this will turn the generic instruction sequence into a non-
generic one. Thus the patch_call_imm will simply update the
insn->imm location where the bpf_map_lookup_elem() will later
take care of the dispatch. Given we need this 'poison' state
as a check, the information of whether a map is an unpriv_array
gets lost, so enforcing it prior to that needs an additional
state. In general this check is needed since there are some
complex and tail call intensive BPF programs out there where
LLVM tends to generate such code occasionally. We therefore
convert the map_ptr rather into map_state to store all this
w/o extra memory overhead, and the bit whether one of the maps
involved in the collision was from an unpriv_array thus needs
to be retained as well there.
Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The err pointer comes from uverbs_attr_get, not from the uobject member,
which does not store an ERR_PTR.
Fixes: be934cca9e98 ("IB/uverbs: Add device memory registration ioctl support")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
* hwsim radio dump wasn't working for the first radio
* mesh was updating statistics incorrectly
* a netlink message allocation was possibly too short
* wiphy name limit was still too long
* in certain cases regdb query could find a NULL pointer
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlsFNRQACgkQB8qZga/f
l8RTBw//Txtrv6BHZ5VHaibMwFfXB9UIuhzogcuUYzxF1qXzF4l4N2GehUGdlKPy
pjNGwYbqtD1b/mCa/BSGAHcuHXSQNmVRVOv3Vjvb6XtAPTVXiQcYWYqRA+F90IcL
gpfrl1RQKHGoZ8S/DST8YEtzgEog9hRr/WvnOCphVqnDohUM5UIv2iPF8Vp5ylQ8
cIwVa7/QgPJ5vG8EE7aPJTHnga9kNRqvlIAODq8H5QwwFTUPP431AjUHK7/nL/+l
GQF1CXA2mQldPowsNK82vS8guaIykD3wxLeuWBiHCa7EExmX5eA5NlySvvAwd1bE
2fM4/vTA5X0jNzqIYzVqZS4rbEHu7h/Kcm1QWCydl0xeKxONO0nfJj+AdWUBE2YH
g9CHEqnChIJgw43kXbN2E2WflDRL+k4yjQlvhbWIcr9yk/8pdO4mkD6qpwGbBQsn
kyeWbhB58M7IGAkTqrx9FeK7rCnPO5SZGFRD7Rpou0S5ioP7ce/xvkv5gvYN3Knu
OUsQv42mwY23cx2XEyeJ/4pXMaihw1lRyiTHSgpJjha2XdmlvvfPu5/pJcSft4jt
weQTot6ugCGimyBx/5bsJIczuHMVE1pD9ctjty5I0Xxfv/Qj78HSwFlB3RKWsFnG
fzksSC1ik5OSYBOi6vFiRO10EUBlVjNXPhP8x5LnAkBWwHp5juM=
=KWsp
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2018-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A handful of fixes:
* hwsim radio dump wasn't working for the first radio
* mesh was updating statistics incorrectly
* a netlink message allocation was possibly too short
* wiphy name limit was still too long
* in certain cases regdb query could find a NULL pointer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp uses inet_dgram_connect as its proto_ops .connect, and the flags
param can't be passed into its proto .connect where this flags is really
needed.
sctp works around it by getting flags from socket file in __sctp_connect.
It works for connecting from userspace, as inherently the user sock has
socket file and it passes f_flags as the flags param into the proto_ops
.connect.
However, the sock created by sock_create_kern doesn't have a socket file,
and it passes the flags (like O_NONBLOCK) by using the flags param in
kernel_connect, which calls proto_ops .connect later.
So to fix it, this patch defines a new proto_ops .connect for sctp,
sctp_inet_connect, which calls __sctp_connect() directly with this
flags param. After this, the sctp's proto .connect can be removed.
Note that sctp_inet_connect doesn't need to do some checks that are not
needed for sctp, which makes thing better than with inet_dgram_connect.
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull vfs fixes from Al Viro:
"Assorted fixes all over the place"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: fix io_destroy(2) vs. lookup_ioctx() race
ext2: fix a block leak
nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed
cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed
unfuck sysfs_mount()
kernfs: deal with kernfs_fill_super() failures
cramfs: Fix IS_ENABLED typo
befs_lookup(): use d_splice_alias()
affs_lookup: switch to d_splice_alias()
affs_lookup(): close a race with affs_remove_link()
fix breakage caused by d_find_alias() semantics change
fs: don't scan the inode cache before SB_BORN is set
do d_instantiate/unlock_new_inode combinations safely
iov_iter: fix memory leak in pipe_get_pages_alloc()
iov_iter: fix return type of __pipe_get_pages()
Merge speculative store buffer bypass fixes from Thomas Gleixner:
- rework of the SPEC_CTRL MSR management to accomodate the new fancy
SSBD (Speculative Store Bypass Disable) bit handling.
- the CPU bug and sysfs infrastructure for the exciting new Speculative
Store Bypass 'feature'.
- support for disabling SSB via LS_CFG MSR on AMD CPUs including
Hyperthread synchronization on ZEN.
- PRCTL support for dynamic runtime control of SSB
- SECCOMP integration to automatically disable SSB for sandboxed
processes with a filter flag for opt-out.
- KVM integration to allow guests fiddling with SSBD including the new
software MSR VIRT_SPEC_CTRL to handle the LS_CFG based oddities on
AMD.
- BPF protection against SSB
.. this is just the core and x86 side, other architecture support will
come separately.
* 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
bpf: Prevent memory disambiguation attack
x86/bugs: Rename SSBD_NO to SSB_NO
KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
x86/bugs: Rework spec_ctrl base and mask logic
x86/bugs: Remove x86_spec_ctrl_set()
x86/bugs: Expose x86_spec_ctrl_base directly
x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
x86/speculation: Rework speculative_store_bypass_update()
x86/speculation: Add virtualized speculative store bypass disable support
x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
x86/speculation: Handle HT correctly on AMD
x86/cpufeatures: Add FEATURE_ZEN
x86/cpufeatures: Disentangle SSBD enumeration
x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
KVM: SVM: Move spec control call after restore of GS
x86/cpu: Make alternative_msr_write work for 32-bit code
x86/bugs: Fix the parameters alignment and missing void
x86/bugs: Make cpu_show_common() static
...
Pull networking fixes from David Miller:
1) Fix refcounting bug for connections in on-packet scheduling mode of
IPVS, from Julian Anastasov.
2) Set network header properly in AF_PACKET's packet_snd, from Willem
de Bruijn.
3) Fix regressions in 3c59x by converting to generic DMA API. It was
relying upon the hack that the PCI DMA interfaces would accept NULL
for EISA devices. From Christoph Hellwig.
4) Remove RDMA devices before unregistering netdev in QEDE driver, from
Michal Kalderon.
5) Use after free in TUN driver ptr_ring usage, from Jason Wang.
6) Properly check for missing netlink attributes in SMC_PNETID
requests, from Eric Biggers.
7) Set DMA mask before performaing any DMA operations in vmxnet3
driver, from Regis Duchesne.
8) Fix mlx5 build with SMP=n, from Saeed Mahameed.
9) Classifier fixes in bcm_sf2 driver from Florian Fainelli.
10) Tuntap use after free during release, from Jason Wang.
11) Don't use stack memory in scatterlists in tls code, from Matt
Mullins.
12) Not fully initialized flow key object in ipv4 routing code, from
David Ahern.
13) Various packet headroom bug fixes in ip6_gre driver, from Petr
Machata.
14) Remove queues from XPS maps using correct index, from Amritha
Nambiar.
15) Fix use after free in sock_diag, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
net: ip6_gre: fix tunnel metadata device sharing.
cxgb4: fix offset in collecting TX rate limit info
net: sched: red: avoid hashing NULL child
sock_diag: fix use-after-free read in __sk_free
sh_eth: Change platform check to CONFIG_ARCH_RENESAS
net: dsa: Do not register devlink for unused ports
net: Fix a bug in removing queues from XPS map
bpf: fix truncated jump targets on heavy expansions
bpf: parse and verdict prog attach may race with bpf map update
bpf: sockmap update rollback on error can incorrectly dec prog refcnt
net: test tailroom before appending to linear skb
net: ip6_gre: Fix ip6erspan hlen calculation
net: ip6_gre: Split up ip6gre_changelink()
net: ip6_gre: Split up ip6gre_newlink()
net: ip6_gre: Split up ip6gre_tnl_change()
net: ip6_gre: Split up ip6gre_tnl_link_config()
net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()
net: ip6_gre: Request headroom in __gre6_xmit()
selftests/bpf: check return value of fopen in test_verifier.c
erspan: fix invalid erspan version.
...
Pull locking fixes from Thomas Gleixner:
"Two fixes to address shortcomings of the rwsem/percpu-rwsem lock
debugging code which emits false positive warnings when the rwsem is
anonymously locked and unlocked"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN
locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag
Pull EFI fixes from Thomas Gleixner:
- Use explicitely sized type for the romimage pointer in the 32bit EFI
protocol struct so a 64bit kernel does not expand it to 64bit. Ditto
for the 64bit struct to avoid the reverse issue on 32bit kernels.
- Handle randomized tex offset correctly in the ARM64 EFI stub to avoid
unaligned data resulting in stack corruption and other hard to
diagnose wreckage.
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub/arm64: Handle randomized TEXT_OFFSET
efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode