674155 Commits

Author SHA1 Message Date
David S. Miller
4e9c3a6671 selftests: bpf: Use bpf_endian.h in test_xdp.c
This fixes the testcase on big-endian.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-02 07:52:01 -07:00
Paolo Abeni
24b43c9964 infiniband: avoid dereferencing uninitialized dst on error path
With commit eea40b8f624f ("infiniband: call ipv6 route lookup
via the stub interface"), if the route lookup fails due to
ipv6 being disabled, the dst variable is left untouched, and
the following dst_release() may access uninitialized memory.

Since ipv6_dst_lookup() always sets dst to NULL in case of
lookup failure with ipv6 enabled, fix the above just
returning the error code if the lookup fails.

Fixes: eea40b8f624 ("infiniband: call ipv6 route lookup via the stub interface")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-02 10:45:45 -04:00
Paul Moore
48d0e023af audit: fix the RCU locking for the auditd_connection structure
Cong Wang correctly pointed out that the RCU read locking of the
auditd_connection struct was wrong, this patch correct this by
adopting a more traditional, and correct RCU locking model.

This patch is heavily based on an earlier prototype by Cong Wang.

Cc: <stable@vger.kernel.org> # 4.11.x-
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:05 -04:00
Paul Moore
8cc96382d9 audit: use kmem_cache to manage the audit_buffer cache
The audit subsystem implemented its own buffer cache mechanism which
is a bit silly these days when we could use the kmem_cache construct.

Some credit is due to Florian Westphal for originally proposing that
we remove the audit cache implementation in favor of simple
kmalloc()/kfree() calls, but I would rather have a dedicated slab
cache to ease debugging and future stats/performance work.

Cc: Florian Westphal <fw@strlen.de>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:05 -04:00
Deepa Dinamani
2115bb250f audit: Use timespec64 to represent audit timestamps
struct timespec is not y2038 safe.
Audit timestamps are recorded in string format into
an audit buffer for a given context.
These mark the entry timestamps for the syscalls.
Use y2038 safe struct timespec64 to represent the times.
The log strings can handle this transition as strings can
hold upto 1024 characters.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:05 -04:00
Paul Moore
b6c7c115c2 audit: store the auditd PID as a pid struct instead of pid_t
This is arguably the right thing to do, and will make it easier when
we start supporting multiple audit daemons in different namespaces.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:05 -04:00
Paul Moore
45a0642b4d audit: kernel generated netlink traffic should have a portid of 0
We were setting the portid incorrectly in the netlink message headers,
fix that to always be 0 (nlmsg_pid = 0).

Signed-off-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
2017-05-02 10:16:05 -04:00
Paul Moore
a9d1620877 audit: combine audit_receive() and audit_receive_skb()
There is no reason to have both of these functions, combine the two.

Signed-off-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
2017-05-02 10:16:05 -04:00
Elena Reshetova
bd120ded6a audit: convert audit_watch.count from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
[PM: fix subject line, add #include]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:05 -04:00
Elena Reshetova
9d2378f8c8 audit: convert audit_tree.count from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
[PM: fix subject line, add #include]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:04 -04:00
Richard Guy Briggs
2173c519d5 audit: normalize NETFILTER_PKT
Eliminate flipping in and out of message fields, dropping fields in the
process.

Sample raw message format IPv4 UDP:
type=NETFILTER_PKT msg=audit(1487874761.386:228):  mark=0xae8a2732 saddr=127.0.0.1 daddr=127.0.0.1 proto=17^]
Sample raw message format IPv6 ICMP6:
type=NETFILTER_PKT msg=audit(1487874761.381:227):  mark=0x223894b7 saddr=::1 daddr=::1 proto=58^]

Issue: https://github.com/linux-audit/audit-kernel/issues/11
Test case: https://github.com/linux-audit/audit-testsuite/issues/43

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:04 -04:00
Richard Guy Briggs
0cb88b6ff0 netfilter: use consistent ipv4 network offset in xt_AUDIT
Even though the skb->data pointer has been moved from the link layer
header to the network layer header, use the same method to calculate the
offset in ipv4 and ipv6 routines.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: munged subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:04 -04:00
Richard Guy Briggs
f6276ac95b audit: log module name on delete_module
When a sysadmin wishes to monitor module unloading with a syscall rule such as:
 -a always,exit -F arch=x86_64 -S delete_module -F key=mod-unload
the SYSCALL record doesn't tell us what module was requested for unloading.

Use the new KERN_MODULE auxiliary record to record it.
The SYSCALL record result code will list the return code.

See: https://github.com/linux-audit/audit-kernel/issues/37
    https://github.com/linux-audit/audit-kernel/issues/7
    https://github.com/linux-audit/audit-kernel/wiki/RFE-Module-Load-Record-Format

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:04 -04:00
Nicholas Mc Guire
9aab4f4ea7 audit: remove unnecessary semicolon in audit_watch_handle_event()
The excess ; after the closing parenthesis is just code-noise it has no
and can be removed.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:04 -04:00
Nicholas Mc Guire
b5239fba69 audit: remove unnecessary semicolon in audit_mark_handle_event()
The excess ; after the closing parenthesis is just code-noise it has no
and can be removed.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:04 -04:00
Nicholas Mc Guire
b7a84deaf8 audit: remove unnecessary semicolon in audit_field_valid()
The excess ; after the closing parenthesis is just code-noise it has no
and can be removed.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
[PM: tweak subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02 10:16:03 -04:00
Jakub Kicinski
b5d60989c6 xdp: fix parameter kdoc for extack
Fix kdoc parameter spelling from extact to extack.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-02 09:29:35 -04:00
Daniel Borkmann
eb6211d360 bpf, samples: fix build warning in cookie_uid_helper_example
Fix the following warnings triggered by 51570a5ab2b7 ("A Sample of
using socket cookie and uid for traffic monitoring"):

  In file included from /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:54:0:
  /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c: In function 'prog_load':
  /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:119:27: warning: overflow in implicit constant conversion [-Woverflow]
     -32 + offsetof(struct stats, uid)),
                           ^
  /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
   .off   = OFF,     \
            ^
  /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:121:27: warning: overflow in implicit constant conversion [-Woverflow]
     -32 + offsetof(struct stats, packets), 1),
                           ^
  /home/foo/net-next/samples/bpf/libbpf.h:155:12: note: in definition of macro 'BPF_ST_MEM'
   .off   = OFF,     \
            ^
  /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:129:27: warning: overflow in implicit constant conversion [-Woverflow]
     -32 + offsetof(struct stats, bytes)),
                           ^
  /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
   .off   = OFF,     \
            ^
  HOSTLD  /home/foo/net-next/samples/bpf/per_socket_stats_example

Fixes: 51570a5ab2b7 ("A Sample of using socket cookie and uid for traffic monitoring")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-02 09:29:35 -04:00
Russell Currey
c0b64978f0 powerpc/eeh: Clean up and document event handling functions
Remove unnecessary tags in eeh_handle_normal_event(), and add function
comments for eeh_handle_normal_event() and eeh_handle_special_event().

The only functional difference is that in the case of a PE reaching the
maximum number of failures, rather than one message telling you of this
and suggesting you reseat the device, there are two separate messages.

Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-02 22:41:43 +10:00
Russell Currey
daeba2956f powerpc/eeh: Avoid use after free in eeh_handle_special_event()
eeh_handle_special_event() is called when an EEH event is detected but
can't be narrowed down to a specific PE.  This function looks through
every PE to find one in an erroneous state, then calls the regular event
handler eeh_handle_normal_event() once it knows which PE has an error.

However, if eeh_handle_normal_event() found that the PE cannot possibly
be recovered, it will free it, rendering the passed PE stale.
This leads to a use after free in eeh_handle_special_event() as it attempts to
clear the "recovering" state on the PE after eeh_handle_normal_event() returns.

Thus, make sure the PE is valid when attempting to clear state in
eeh_handle_special_event().

Fixes: 8a6b1bc70dbb ("powerpc/eeh: EEH core to handle special event")
Cc: stable@vger.kernel.org # v3.11+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-02 22:41:43 +10:00
Alastair D'Silva
a715626a8e cxl: Mask slice error interrupts after first occurrence
In some situations, a faulty AFU slice may create an interrupt storm of
slice errors, rendering the machine unusable. Since these interrupts are
informational only, present the interrupt once, then mask it off to
prevent it from being retriggered until the AFU is reset.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-02 22:41:42 +10:00
Vaibhav Jain
4f58f0bf15 cxl: Route eeh events to all drivers in cxl_pci_error_detected()
Fix a boundary condition where in some cases an eeh event that results
in card reset isn't passed on to a driver attached to the virtual PCI
device associated with a slice. This will happen in case when a slice
attached device driver returns a value other than
PCI_ERS_RESULT_NEED_RESET from the eeh error_detected() callback. This
would result in an early return from cxl_pci_error_detected() and
other drivers attached to other AFUs on the card wont be notified.

The patch fixes this by making sure that all slice attached
device-drivers are notified and the return values from
error_detected() callback are aggregated in a scheme where request for
'disconnect' trumps all and 'none' trumps 'need_reset'.

Fixes: 9e8df8a21963 ("cxl: EEH support")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-02 22:41:42 +10:00
Vaibhav Jain
ea9a26d117 cxl: Force context lock during EEH flow
During an eeh event when the cxl card is fenced and card sysfs attr
perst_reloads_same_image is set following warning message is seen in the
kernel logs:

  Adapter context unlocked with 0 active contexts
  ------------[ cut here ]------------
  WARNING: CPU: 12 PID: 627 at
  ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]

Even though this warning is harmless, it clutters the kernel log
during an eeh event. This warning is triggered as the EEH callback
cxl_pci_error_detected doesn't obtain a context-lock before forcibly
detaching all active context and when context-lock is released during
call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
cxl_adapter_context_unlock is triggered.

To fix this warning, we acquire the adapter context-lock via
cxl_adapter_context_lock() in the eeh callback
cxl_pci_error_detected() once all the virtual AFU PHBs are notified
and their contexts detached. The context-lock is released in
cxl_pci_slot_reset() after the adapter is successfully reconfigured
and before the we call the slot_reset callback on slice attached
device-drivers.

Fixes: 70b565bbdb91 ("cxl: Prevent adapter reset if an active context exists")
Cc: stable@vger.kernel.org # v4.9+
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Tested-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-02 22:41:41 +10:00
Julien Grall
e371fd7607 xen: Implement EFI reset_system callback
When rebooting DOM0 with ACPI on ARM64, the kernel is crashing with the stack
trace [1].

This is happening because when EFI runtimes are enabled, the reset code
(see machine_restart) will first try to use EFI restart method.

However, the EFI restart code is expecting the reset_system callback to
be always set. This is not the case for Xen and will lead to crash.

The EFI restart helper is used in multiple places and some of them don't
not have fallback (see machine_power_off). So implement reset_system
callback as a call to xen_reboot when using EFI Xen.

[   36.999270] reboot: Restarting system
[   37.002921] Internal error: Attempting to execute userspace memory: 86000004 [#1] PREEMPT SMP
[   37.011460] Modules linked in:
[   37.014598] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.11.0-rc1-00003-g1e248b60a39b-dirty #506
[   37.023903] Hardware name: (null) (DT)
[   37.027734] task: ffff800902068000 task.stack: ffff800902064000
[   37.033739] PC is at 0x0
[   37.036359] LR is at efi_reboot+0x94/0xd0
[   37.040438] pc : [<0000000000000000>] lr : [<ffff00000880f2c4>] pstate: 404001c5
[   37.047920] sp : ffff800902067cf0
[   37.051314] x29: ffff800902067cf0 x28: ffff800902068000
[   37.056709] x27: ffff000008992000 x26: 000000000000008e
[   37.062104] x25: 0000000000000123 x24: 0000000000000015
[   37.067499] x23: 0000000000000000 x22: ffff000008e6e250
[   37.072894] x21: ffff000008e6e000 x20: 0000000000000000
[   37.078289] x19: ffff000008e5d4c8 x18: 0000000000000010
[   37.083684] x17: 0000ffffa7c27470 x16: 00000000deadbeef
[   37.089079] x15: 0000000000000006 x14: ffff000088f42bef
[   37.094474] x13: ffff000008f42bfd x12: ffff000008e706c0
[   37.099870] x11: ffff000008e70000 x10: 0000000005f5e0ff
[   37.105265] x9 : ffff800902067a50 x8 : 6974726174736552
[   37.110660] x7 : ffff000008cc6fb8 x6 : ffff000008cc6fb0
[   37.116055] x5 : ffff000008c97dd8 x4 : 0000000000000000
[   37.121453] x3 : 0000000000000000 x2 : 0000000000000000
[   37.126845] x1 : 0000000000000000 x0 : 0000000000000000
[   37.132239]
[   37.133808] Process systemd-shutdow (pid: 1, stack limit = 0xffff800902064000)
[   37.141118] Stack: (0xffff800902067cf0 to 0xffff800902068000)
[   37.146949] 7ce0:                                   ffff800902067d40 ffff000008085334
[   37.154869] 7d00: 0000000000000000 ffff000008f3b000 ffff800902067d40 ffff0000080852e0
[   37.162787] 7d20: ffff000008cc6fb0 ffff000008cc6fb8 ffff000008c7f580 ffff000008c97dd8
[   37.170706] 7d40: ffff800902067d60 ffff0000080e2c2c 0000000000000000 0000000001234567
[   37.178624] 7d60: ffff800902067d80 ffff0000080e2ee8 0000000000000000 ffff0000080e2df4
[   37.186544] 7d80: 0000000000000000 ffff0000080830f0 0000000000000000 00008008ff1c1000
[   37.194462] 7da0: ffffffffffffffff 0000ffffa7c4b1cc 0000000000000000 0000000000000024
[   37.202380] 7dc0: ffff800902067dd0 0000000000000005 0000fffff24743c8 0000000000000004
[   37.210299] 7de0: 0000fffff2475f03 0000000000000010 0000fffff2474418 0000000000000005
[   37.218218] 7e00: 0000fffff2474578 000000000000000a 0000aaaad6b722c0 0000000000000001
[   37.226136] 7e20: 0000000000000123 0000000000000038 ffff800902067e50 ffff0000081e7294
[   37.234055] 7e40: ffff800902067e60 ffff0000081e935c ffff800902067e60 ffff0000081e9388
[   37.241973] 7e60: ffff800902067eb0 ffff0000081ea388 0000000000000000 00008008ff1c1000
[   37.249892] 7e80: ffffffffffffffff 0000ffffa7c4a79c 0000000000000000 ffff000000020000
[   37.257810] 7ea0: 0000010000000004 0000000000000000 0000000000000000 ffff0000080830f0
[   37.265729] 7ec0: fffffffffee1dead 0000000028121969 0000000001234567 0000000000000000
[   37.273651] 7ee0: ffffffffffffffff 8080000000800000 0000800000008080 feffa9a9d4ff2d66
[   37.281567] 7f00: 000000000000008e feffa9a9d5b60e0f 7f7fffffffff7f7f 0101010101010101
[   37.289485] 7f20: 0000000000000010 0000000000000008 000000000000003a 0000ffffa7ccf588
[   37.297404] 7f40: 0000aaaad6b87d00 0000ffffa7c4b1b0 0000fffff2474be0 0000aaaad6b88000
[   37.305326] 7f60: 0000fffff2474fb0 0000000001234567 0000000000000000 0000000000000000
[   37.313240] 7f80: 0000000000000000 0000000000000001 0000aaaad6b70d4d 0000000000000000
[   37.321159] 7fa0: 0000000000000001 0000fffff2474ea0 0000aaaad6b5e2e0 0000fffff2474e80
[   37.329078] 7fc0: 0000ffffa7c4b1cc 0000000000000000 fffffffffee1dead 000000000000008e
[   37.336997] 7fe0: 0000000000000000 0000000000000000 9ce839cffee77eab fafdbf9f7ed57f2f
[   37.344911] Call trace:
[   37.347437] Exception stack(0xffff800902067b20 to 0xffff800902067c50)
[   37.353970] 7b20: ffff000008e5d4c8 0001000000000000 0000000080f82000 0000000000000000
[   37.361883] 7b40: ffff800902067b60 ffff000008e17000 ffff000008f44c68 00000001081081b4
[   37.369802] 7b60: ffff800902067bf0 ffff000008108478 0000000000000000 ffff000008c235b0
[   37.377721] 7b80: ffff800902067ce0 0000000000000000 0000000000000000 0000000000000015
[   37.385643] 7ba0: 0000000000000123 000000000000008e ffff000008992000 ffff800902068000
[   37.393557] 7bc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   37.401477] 7be0: 0000000000000000 ffff000008c97dd8 ffff000008cc6fb0 ffff000008cc6fb8
[   37.409396] 7c00: 6974726174736552 ffff800902067a50 0000000005f5e0ff ffff000008e70000
[   37.417318] 7c20: ffff000008e706c0 ffff000008f42bfd ffff000088f42bef 0000000000000006
[   37.425234] 7c40: 00000000deadbeef 0000ffffa7c27470
[   37.430190] [<          (null)>]           (null)
[   37.434982] [<ffff000008085334>] machine_restart+0x6c/0x70
[   37.440550] [<ffff0000080e2c2c>] kernel_restart+0x6c/0x78
[   37.446030] [<ffff0000080e2ee8>] SyS_reboot+0x130/0x228
[   37.451337] [<ffff0000080830f0>] el0_svc_naked+0x24/0x28
[   37.456737] Code: bad PC value
[   37.459891] ---[ end trace 76e2fc17e050aecd ]---

Signed-off-by: Julien Grall <julien.grall@arm.com>

--

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org

The x86 code has theoritically a similar issue, altought EFI does not
seem to be the preferred method. I have only built test it on x86.

This should also probably be fixed in stable tree.

    Changes in v2:
        - Implement xen_efi_reset_system using xen_reboot
        - Move xen_efi_reset_system in drivers/xen/efi.c
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 12:06:50 +02:00
Julien Grall
fa12a870a9 arm/xen: Consolidate calls to shutdown hypercall in a single helper
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 12:05:32 +02:00
Julien Grall
5d9404e118 xen: Export xen_reboot
The helper xen_reboot will be called by the EFI code in a later patch.

Note that the ARM version does not yet exist and will be added in a
later patch too.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:50:06 +02:00
Boris Ostrovsky
f31b969217 xen/x86: Call xen_smp_intr_init_pv() on BSP
Recent code rework that split handling ov PV, HVM and PVH guests into
separate files missed calling xen_smp_intr_init_pv() on CPU0.

Add this call.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:18:13 +02:00
Boris Ostrovsky
84d582d236 xen: Revert commits da72ff5bfcb0 and 72a9b186292d
Recent discussion (http://marc.info/?l=xen-devel&m=149192184523741)
established that commit 72a9b186292d ("xen: Remove event channel
notification through Xen PCI platform device") (and thus commit
da72ff5bfcb0 ("partially revert "xen: Remove event channel
notification through Xen PCI platform device"")) are unnecessary and,
in fact, prevent HVM guests from booting on Xen releases prior to 4.0

Therefore we revert both of those commits.

The summary of that discussion is below:

  Here is the brief summary of the current situation:

  Before the offending commit (72a9b186292):

  1) INTx does not work because of the reset_watches path.
  2) The reset_watches path is only taken if you have Xen > 4.0
  3) The Linux Kernel by default will use vector inject if the hypervisor
     support. So even INTx does not work no body running the kernel with
     Xen > 4.0 would notice. Unless he explicitly disabled this feature
     either in the kernel or in Xen (and this can only be disabled by
     modifying the code, not user-supported way to do it).

  After the offending commit (+ partial revert):

  1) INTx is no longer support for HVM (only for PV guests).
  2) Any HVM guest The kernel will not boot on Xen < 4.0 which does
     not have vector injection support. Since the only other mode
     supported is INTx which.

  So based on this summary, I think before commit (72a9b186292) we were
  in much better position from a user point of view.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:18:05 +02:00
Boris Ostrovsky
5f6a1614fa xen/pvh: Do not fill kernel's e820 map in init_pvh_bootparams()
e820 map is updated with information from the zeropage (i.e. pvh_bootparams)
by default_machine_specific_memory_setup(). With the way things are done
now,  we end up with a duplicated e820 map.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:17:39 +02:00
Geliang Tang
6483e3135a xen/scsifront: use offset_in_page() macro
Use offset_in_page() macro instead of open-coding.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:14:53 +02:00
Stefano Stabellini
d5ff5061c3 xen/arm,arm64: rename __generic_dma_ops to xen_get_dma_ops
Now that __generic_dma_ops is a xen specific function, rename it to
xen_get_dma_ops. Change all the call sites appropriately.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: linux@armlinux.org.uk
CC: catalin.marinas@arm.com
CC: will.deacon@arm.com
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
CC: Julien Grall <julien.grall@arm.com>
2017-05-02 11:14:47 +02:00
Stefano Stabellini
e058632670 xen/arm,arm64: fix xen_dma_ops after 815dd18 "Consolidate get_dma_ops..."
The following commit:

  commit 815dd18788fe0d41899f51b91d0560279cf16b0d
  Author: Bart Van Assche <bart.vanassche@sandisk.com>
  Date:   Fri Jan 20 13:04:04 2017 -0800

      treewide: Consolidate get_dma_ops() implementations

rearranges get_dma_ops in a way that xen_dma_ops are not returned when
running on Xen anymore, dev->dma_ops is returned instead (see
arch/arm/include/asm/dma-mapping.h:get_arch_dma_ops and
include/linux/dma-mapping.h:get_dma_ops).

Fix the problem by storing dev->dma_ops in dev_archdata, and setting
dev->dma_ops to xen_dma_ops. This way, xen_dma_ops is returned naturally
by get_dma_ops. The Xen code can retrieve the original dev->dma_ops from
dev_archdata when needed. It also allows us to remove __generic_dma_ops
from common headers.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Julien Grall <julien.grall@arm.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>        [4.11+]
CC: linux@armlinux.org.uk
CC: catalin.marinas@arm.com
CC: will.deacon@arm.com
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
CC: Julien Grall <julien.grall@arm.com>
2017-05-02 11:14:42 +02:00
Arnd Bergmann
4a80601622 xen/9pfs: select CONFIG_XEN_XENBUS_FRONTEND
All Xen frontends need to select this symbol to avoid a link error:

net/built-in.o: In function `p9_trans_xen_init':
:(.text+0x149e9c): undefined reference to `__xenbus_register_frontend'

Fixes: d4b40a02f837 ("xen/9pfs: build 9pfs Xen transport driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-05-02 11:14:36 +02:00
Juergen Gross
65f9d65443 x86/cpu: remove hypervisor specific set_cpu_features
There is no user of x86_hyper->set_cpu_features() any more. Remove it.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:14:30 +02:00
Juergen Gross
d40342a2ac vmware: set cpu capabilities during platform initialization
There is no need to set the same capabilities for each cpu
individually. This can be done for all cpus in platform initialization.

Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Alok Kataria <akataria@vmware.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:14:24 +02:00
Juergen Gross
6807cf65f5 x86/xen: use capabilities instead of fake cpuid values for xsave
When running as pv domain xen_cpuid() is being used instead of
native_cpuid(). In xen_cpuid() the xsave feature availability is
indicated by special casing the related cpuid leaf.

Instead of delivering fake cpuid values set or clear the cpu
capability bits for xsave instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:14:17 +02:00
Juergen Gross
e657fccb79 x86/xen: use capabilities instead of fake cpuid values for x2apic
When running as pv domain xen_cpuid() is being used instead of
native_cpuid(). In xen_cpuid() the x2apic feature is indicated as not
being present by special casing the related cpuid leaf.

Instead of delivering fake cpuid values clear the cpu capability bit
for x2apic instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:14:11 +02:00
Juergen Gross
ea01598b4b x86/xen: use capabilities instead of fake cpuid values for mwait
When running as pv domain xen_cpuid() is being used instead of
native_cpuid(). In xen_cpuid() the mwait feature is indicated to be
present or not by special casing the related cpuid leaf.

Instead of delivering fake cpuid values use the cpu capability bit
for mwait instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:14:05 +02:00
Juergen Gross
b778d6bf63 x86/xen: use capabilities instead of fake cpuid values for acpi
When running as pv domain xen_cpuid() is being used instead of
native_cpuid(). In xen_cpuid() the acpi feature is indicated as not
being present by special casing the related cpuid leaf in case we
are not the initial domain.

Instead of delivering fake cpuid values clear the cpu capability bit
for acpi instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:14:00 +02:00
Juergen Gross
aa10715629 x86/xen: use capabilities instead of fake cpuid values for acc
When running as pv domain xen_cpuid() is being used instead of
native_cpuid(). In xen_cpuid() the acc feature (thermal monitoring)
is indicated as not being present by special casing the related
cpuid leaf.

Instead of delivering fake cpuid values clear the cpu capability bit
for acc instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:13:53 +02:00
Juergen Gross
88f3256f21 x86/xen: use capabilities instead of fake cpuid values for mtrr
When running as pv domain xen_cpuid() is being used instead of
native_cpuid(). In xen_cpuid() the mtrr feature is indicated as not
being present by special casing the related cpuid leaf.

Instead of delivering fake cpuid values clear the cpu capability bit
for mtrr instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:13:48 +02:00
Juergen Gross
fd9145fd27 x86/xen: use capabilities instead of fake cpuid values for aperf
When running as pv domain xen_cpuid() is being used instead of
native_cpuid(). In xen_cpuid() the aperf/mperf feature is indicated
as not being present by special casing the related cpuid leaf.

Instead of delivering fake cpuid values clear the cpu capability bit
for aperf/mperf instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:13:42 +02:00
Juergen Gross
3ee99df333 x86/xen: don't indicate DCA support in pv domains
Xen doesn't support DCA (direct cache access) for pv domains. Clear
the corresponding capability indicator.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:13:36 +02:00
Juergen Gross
0808e80cb7 xen: set cpu capabilities from xen_start_kernel()
There is no need to set the same capabilities for each cpu
individually. This can easily be done for all cpus when starting the
kernel.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:13:29 +02:00
Stefano Stabellini
31d47266c6 xen/9pfs: initialize len to 0 to detect xenbus_read errors
In order to use "len" to check for xenbus_read errors properly, we need
to initialize len to 0 before passing it to xenbus_read.

CC: dan.carpenter@oracle.com
CC: jgross@suse.com
CC: boris.ostrovsky@oracle.com
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: v9fs-developer@lists.sourceforge.net
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:13:23 +02:00
Juergen Gross
29985b0961 xen,kdump: handle pv domain in paddr_vmcoreinfo_note()
For kdump to work correctly it needs the physical address of
vmcoreinfo_note. When running as dom0 this means the virtual address
has to be translated to the related machine address.

paddr_vmcoreinfo_note() is meant to do the translation via
__pa_symbol() only, but being attributed "weak" it can be replaced
easily in Xen case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Petr Tesarik <ptesarik@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:13:17 +02:00
Oleksandr Andrushchenko
8945c08e2b xen/displif: add ABI for para-virtual display
This is the ABI for the two halves of a para-virtualized
display driver.

This protocol aims to provide a unified protocol which fits more
sophisticated use-cases than a framebuffer device can handle. At the
moment basic functionality is supported with the intention to extend:
  o multiple dynamically allocated/destroyed framebuffers
  o buffers of arbitrary sizes
  o better configuration options including multiple display support

Note: existing fbif can be used together with displif running at the
same time, e.g. on Linux one provides framebuffer and another DRM/KMS

Future extensions to the existing protocol may include:
  o allow display/connector cloning
  o allow allocating objects other than display buffers
  o add planes/overlays support
  o support scaling
  o support rotation

Note, that this protocol doesn't use ring macros for
bi-directional exchange (PV calls/9pfs) bacause:
  o it statically defines the use of a single page
    for the ring buffer
  o it uses direct memory access to ring's contents
    w/o memory copying
  o re-uses the same idea that kbdif/fbif use
    which for this use-case seems to be appropriate

==================================================
Rationale for introducing this protocol instead of
using the existing fbif:
==================================================

1. In/out event sizes
  o fbif - 40 octets
  o displif - 40 octets
This is only the initial version of the displif protocol
which means that there could be requests which will not fit
(WRT introducing some GPU related functionality
later on). In that case we cannot alter fbif sizes as we need to
be backward compatible an will be forced to handle those
apart of fbif.

2. Shared page
Displif doesn't use anything like struct xenfb_page, but
DEFINE_RING_TYPES(xen_displif, struct xendispl_req, struct
xendispl_resp) which is a better and more common way.
Output events use a shared page which only has in_cons and in_prod
and all the rest is used for incoming events. Here struct xenfb_page
could probably be used as is despite the fact that it only has a half
of a page for incoming events which is only 50 events. (consider
something like 60Hz display)

3. Amount of changes.
fbif only provides XENFB_TYPE_UPDATE and XENFB_TYPE_RESIZE
events, so it looks like it is easier to get fb support into displif
than vice versa. displif at the moment has 6 requests and 1 event,
multiple connector support, etc.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:13:09 +02:00
Oleksandr Andrushchenko
2843531fb6 xen/sndif: add sound-device ABI
Add ABI for the two halves of a para-virtualized
sound driver to communicate with each other.

The ABI allows implementing audio playback and capture as
well as volume control and possibility to mute/unmute
audio sources.

Note: depending on the use-case backend can expose more sound
cards and PCM devices/streams than the underlying HW physically
has by employing SW mixers, configuring virtual sound streams,
channels etc. Thus, allowing fine tunned configurations per
frontend.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Signed-off-by: Oleksandr Dmytryshyn <oleksandr.dmytryshyn@globallogic.com>
Signed-off-by: Iurii Konovalenko <iurii.konovalenko@globallogic.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:13:03 +02:00
Oleksandr Andrushchenko
f9ebfc22cc xen/kbdif: add multi-touch support
Multi-touch fields re-use the page that is used by the other features
which means that you can interleave multi-touch, motion, and key
events.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:12:56 +02:00
Oleksandr Andrushchenko
8ec9dd0e69 xen/kbdif: update protocol description
The patch clarifies the protocol that is used by the PV keyboard
drivers.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:12:50 +02:00