Commit Graph

226 Commits

Author SHA1 Message Date
Heiko Carstens
58d4a785da s390/lib: use call_on_stack() macro
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08 22:12:18 +02:00
Heiko Carstens
a0ae5cd235 s390/lib,string: fix strcat() inline asm constraint modifier
"dummy" is not only used as output but also as input. Therefore use
the correct "+" constraint modifier.

Fixes: 8cf23c8e1f ("s390/lib,string: get rid of register asm")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Heiko Carstens
07f3a35df1 s390/lib,uaccess: fix copy_in_user_mvcos() inline asm clobber list
General register 0 is clobbered within the inline assembly and
therefore must be listed in the clobber list.

Fixes:  d1e18efa8f ("s390/lib,uaccess: get rid of register asm")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Heiko Carstens
8cf23c8e1f s390/lib,string: get rid of register asm
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-28 11:18:28 +02:00
Heiko Carstens
d1e18efa8f s390/lib,uaccess: get rid of register asm
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-28 11:18:28 +02:00
Heiko Carstens
7e86f967f4 s390/lib,xor: get rid of register asm
Looking at the generate code this was just a micro-optimization.
However given that as many register asm constructs as possible
will be removed from s390 code, remove this one as well.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-28 11:18:28 +02:00
Vasily Gorbik
5d8da6951e s390/test_unwind: print test suite start/end info
Add couple of additional info lines to make it easier to match
test suite output and results.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-15 17:47:42 +02:00
Vasily Gorbik
9d42a4d3e2 s390/test_unwind: add WARN if tests failed
Trigger a warning if any of unwinder tests fail. This should help to
prevent quiet ignoring of test results when panic_on_warn is enabled.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-12 12:46:44 +02:00
Vasily Gorbik
f169f42130 s390/test_unwind: unify error handling paths
Handle the case of "unwind state reliable but addr is 0" like other error
cases in this function and trigger output of failing stacktrace to aid
debugging.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-12 12:46:44 +02:00
Sven Schnelle
56e62a7370 s390: convert to generic entry
This patch converts s390 to use the generic entry infrastructure from
kernel/entry/*.

There are a few special things on s390:

- PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't
  know about our PIF flags in exit_to_user_mode_loop().

- The old code had several ways to restart syscalls:

  a) PIF_SYSCALL_RESTART, which was only set during execve to force a
     restart after upgrading a process (usually qemu-kvm) to pgste page
     table extensions.

  b) PIF_SYSCALL, which is set by do_signal() to indicate that the
     current syscall should be restarted. This is changed so that
     do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use
     PIF_SYSCALL doesn't work with the generic code, and changing it
     to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART
     more unique.

- On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by
executing a svc instruction on the process stack which causes a fault.
While handling that fault the fault code sets PIF_SYSCALL to hand over
processing to the syscall code on exit to usermode.

The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets
a return value for a syscall. The s390x ptrace ABI uses r2 both for the
syscall number and return value, so ptrace cannot set the syscall number +
return value at the same time. The flag makes handling that a bit easier.
do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET
is set.

CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY.
CR1/7/13 will be checked both on kernel entry and exit to contain the
correct asces.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-19 12:29:26 +01:00
Heiko Carstens
e0d62dcb20 s390/delay: remove udelay_simple()
udelay_simple() callers can make use of the now simplified udelay()
implementation. No need to keep it.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:49 +01:00
Heiko Carstens
dd6cfe5532 s390/delay: simplify udelay
udelay is implemented by using quite subtle details to make it
possible to load an idle psw and waiting for an interrupt even in irq
context or when interrupts are disabled. Also handling (or better: no
handling) of softirqs is taken into account.

All this is done to optimize for something which should in normal
circumstances never happen: calling udelay to busy wait. Therefore get
rid of the whole complexity and just busy loop like other
architectures are doing it also.

It could have been possible to use diag 0x44 instead of cpu_relax() in
the busy loop, however we have seen too many bad things happen with
diag 0x44 that it seems to be better to simply busy loop.

Also note that with this new implementation kernel preemption does
work when within the udelay loop. This did not work before.

To get a feeling what the former code optimizes for: IPL'ing a kernel
with 'defconfig' and afterwards compiling a kernel ends with a total
of zero udelay calls.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:49 +01:00
Heiko Carstens
91c2bad6ae s390/test_unwind: use timer instead of udelay
Instead of registering an external interrupt handler and relying on
the udelay implementation, simply use a timer to get into irq context.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:49 +01:00
Heiko Carstens
f22b9c219a s390/test_unwind: fix CALL_ON_STACK tests
The CALL_ON_STACK tests use the no_dat stack to switch to a different
stack for unwinding tests. If an interrupt or machine check happens
while using that stack, and previously being on the async stack, the
interrupt / machine check entry code (SWITCH_ASYNC) will assume that
the previous context did not use the async stack and happily use the
async stack again.

This will lead to stack corruption of the previous context.

To solve this disable both interrupts and machine checks before
switching to the no_dat stack.

Fixes: 7868249fbb ("s390/test_unwind: add CALL_ON_STACK tests")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16 14:55:49 +01:00
Linus Torvalds
586592478b - Add support for the hugetlb_cma command line option to allocate gigantic
hugepages using CMA:
 
 - Add arch_get_random_long() support.
 
 - Add ap bus userspace notifications.
 
 - Increase default size of vmalloc area to 512GB and otherwise let it increase
   dynamically by the size of physical memory. This should fix all occurrences
   where the vmalloc area was not large enough.
 
 - Completely get rid of set_fs() (aka select SET_FS) and rework address space
   handling while doing that; making address space handling much more simple.
 
 - Reimplement getcpu vdso syscall in C.
 
 - Add support for extended SCLP responses (> 4k). This allows e.g. to handle
   also potential large system configurations.
 
 - Simplify KASAN by removing 3-level page table support and only supporting
   4-levels from now on.
 
 - Improve debug-ability of the kernel decompressor code, which now prints also
   stack traces and symbols in case of problems to the console.
 
 - Remove more power management leftovers.
 
 - Other various fixes and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAl/XQAIACgkQIg7DeRsp
 bsIdYA//TCtSTrka/yW03b4b0FuLtKNpKB5zQgaqtEurbgbZhXdZ7/L3N+KavPQH
 njmKAARxebRIJB0DoZ9w9XpSb+mI3Q5y8GMi5xvUzjtJj/c6ahi3cEXIpuDR0PBv
 bf4UYSUpvndOwVFVOEZLeaJwKciCYvdoOwjBCmoKz9orthNVdVh5vztVRE2dMkNl
 y9C/Pb3w4ZMYxrbETuYnxqzueCxUhVOJmwodkGdP6bxBeemOwKn2TLVZQCbGGe7y
 BZpG+xsTaLZV1dZUZuDSOzVi1CTzJBGaJuYy5ewddWfxi7+mxqwEg/4s6nGKAciX
 Fa3T6aqLpUmDDN842Ql9TZHrwR+GYrlAp3XaQETOusUuEQLvP1dKRj/RXiDXN3MZ
 L+Mfa56dbs9GkVaNN/N+L7Y4z/6tZ2caX4X2S22Cp/QzvRTrG4jXVTn0r4WIcY/2
 vn7fEy71LJ97CLQTDryyfJx7YNMdyIlUZY5ICAk1bt8nz1lB/IoZy0YoCBvPxIzb
 cEKcFTOdOtZR4WY3F8+kU0Nv1HQ8yPBzMaAqSNERvNQhMvoCChxntmyYxuVgH5iB
 SACADqEJKQ3hb4nMnxkeTrmmrhH4e0kdF9lAEytX+VYbjAq/6MY+qYo+QHDYkFWh
 BndxI54d6IiktDcKuBcpKJM7S/7N2t+EsLTS6Dhux7dbDZ2+Upw=
 =UR7j
 -----END PGP SIGNATURE-----

Merge tag 's390-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Add support for the hugetlb_cma command line option to allocate
   gigantic hugepages using CMA

 - Add arch_get_random_long() support.

 - Add ap bus userspace notifications.

 - Increase default size of vmalloc area to 512GB and otherwise let it
   increase dynamically by the size of physical memory. This should fix
   all occurrences where the vmalloc area was not large enough.

 - Completely get rid of set_fs() (aka select SET_FS) and rework address
   space handling while doing that; making address space handling much
   more simple.

 - Reimplement getcpu vdso syscall in C.

 - Add support for extended SCLP responses (> 4k). This allows e.g. to
   handle also potential large system configurations.

 - Simplify KASAN by removing 3-level page table support and only
   supporting 4-levels from now on.

 - Improve debug-ability of the kernel decompressor code, which now
   prints also stack traces and symbols in case of problems to the
   console.

 - Remove more power management leftovers.

 - Other various fixes and improvements all over the place.

* tag 's390-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (62 commits)
  s390/mm: add support to allocate gigantic hugepages using CMA
  s390/crypto: add arch_get_random_long() support
  s390/smp: perform initial CPU reset also for SMT siblings
  s390/mm: use invalid asce for user space when switching to init_mm
  s390/idle: fix accounting with machine checks
  s390/idle: add missing mt_cycles calculation
  s390/boot: add build-id to decompressor
  s390/kexec_file: fix diag308 subcode when loading crash kernel
  s390/cio: fix use-after-free in ccw_device_destroy_console
  s390/cio: remove pm support from ccw bus driver
  s390/cio: remove pm support from css-bus driver
  s390/cio: remove pm support from IO subchannel drivers
  s390/cio: remove pm support from chsc subchannel driver
  s390/vmur: remove unused pm related functions
  s390/tape: remove unsupported PM functions
  s390/cio: remove pm support from eadm-sch drivers
  s390: remove pm support from console drivers
  s390/dasd: remove unused pm related functions
  s390/zfcp: remove pm support from zfcp driver
  s390/ap: let bus_register() add the AP bus sysfs attributes
  ...
2020-12-14 16:22:26 -08:00
Heiko Carstens
b1cae1f84a s390: fix irq state tracing
With commit 58c644ba51 ("sched/idle: Fix arch_cpu_idle() vs
tracing") common code calls arch_cpu_idle() with a lockdep state that
tells irqs are on.

This doesn't work very well for s390: psw_idle() will enable interrupts
to wait for an interrupt. As soon as an interrupt occurs the interrupt
handler will verify if the old context was psw_idle(). If that is the
case the interrupt enablement bits in the old program status word will
be cleared.

A subsequent test in both the external as well as the io interrupt
handler checks if in the old context interrupts were enabled. Due to
the above patching of the old program status word it is assumed the
old context had interrupts disabled, and therefore a call to
TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
turn makes lockdep incorrectly "think" that interrupts are enabled
within the interrupt handler.

Fix this by unconditionally calling TRACE_IRQS_OFF when entering
interrupt handlers. Also call unconditionally TRACE_IRQS_ON when
leaving interrupts handlers.

This leaves the special psw_idle() case, which now returns with
interrupts disabled, but has an "irqs on" lockdep state. So callers of
psw_idle() must adjust the state on their own, if required. This is
currently only __udelay_disabled().

Fixes: 58c644ba51 ("sched/idle: Fix arch_cpu_idle() vs tracing")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-02 18:17:50 +01:00
Heiko Carstens
062e527956 s390/mm: add debug user asce support
Verify on exit to user space that always
- the primary ASCE (cr1) is set to kernel ASCE
- the secondary ASCE (cr7) is set to user ASCE

If this is not the case: panic since something went terribly wrong.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23 12:01:12 +01:00
Heiko Carstens
87d5986345 s390/mm: remove set_fs / rework address space handling
Remove set_fs support from s390. With doing this rework address space
handling and simplify it. As a result address spaces are now setup
like this:

CPU running in              | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
----------------------------|-----------|-----------|-----------
user space                  |  user     |  user     |  kernel
kernel, normal execution    |  kernel   |  user     |  kernel
kernel, kvm guest execution |  gmap     |  user     |  kernel

To achieve this the getcpu vdso syscall is removed in order to avoid
secondary address mode and a separate vdso address space in for user
space. The getcpu vdso syscall will be implemented differently with a
subsequent patch.

The kernel accesses user space always via secondary address space.
This happens in different ways:
- with mvcos in home space mode and directly read/write to secondary
  address space
- with mvcs/mvcp in primary space mode and copy from primary space to
  secondary space or vice versa
- with e.g. cs in secondary space mode and access secondary space

Switching translation modes happens with sacf before and after
instructions which access user space, like before.

Lazy handling of control register reloading is removed in the hope to
make everything simpler, but at the cost of making kernel entry and
exit a bit slower. That is: on kernel entry the primary asce is always
changed to contain the kernel asce, and on kernel exit the primary
asce is changed again so it contains the user asce.

In kernel mode there is only one exception to the primary asce: when
kvm guests are executed the primary asce contains the gmap asce (which
describes the guest address space). The primary asce is reset to
kernel asce whenever kvm guest execution is interrupted, so that this
doesn't has to be taken into account for any user space accesses.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23 12:01:12 +01:00
Vasily Gorbik
85cde0192a s390/udelay: make it work for the early code
Currently udelay relies on working EXT interrupts handler, which is not
the case during early startup. In such cases udelay_simple() has to be
used instead.

To avoid mistakes of calling udelay too early, which could happen from
the common code as well - make udelay work for the early code by
introducing static branch and redirecting all udelay calls to
udelay_simple until EXT interrupts handler is fully initialized and
async stack is allocated.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09 11:20:58 +01:00
Julian Wiedmann
4aa32ee3c0 s390/lib: fix kernel doc for memcmp()
s/count/n

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-10-07 21:50:01 +02:00
Wang Hai
75d3e7f476 s390/test_unwind: fix possible memleak in test_unwind()
test_unwind() misses to call kfree(bt) in an error path.
Add the missed function call to fix it.

Fixes: 0610154650 ("s390/test_unwind: print verbose unwinding results")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-11 18:16:16 +02:00
Linus Torvalds
b79675e15a Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "No common topic whatsoever in those, sorry"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: define inode flags using bit numbers
  iov_iter: Move unnecessary inclusion of crypto/hash.h
  dlmfs: clean up dlmfs_file_{read,write}() a bit
2020-08-07 21:14:30 -07:00
Ilya Leoshkevich
73d6eb48d2 s390: enable HAVE_FUNCTION_ERROR_INJECTION
This kernel feature is required for enabling BPF_KPROBE_OVERRIDE.

Define override_function_with_return() and regs_set_return_value()
functions, and fix compile errors in syscall_wrapper.h.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-07-27 10:33:28 +02:00
Herbert Xu
7999096fa9 iov_iter: Move unnecessary inclusion of crypto/hash.h
The header file linux/uio.h includes crypto/hash.h which pulls in
most of the Crypto API.  Since linux/uio.h is used throughout the
kernel this means that every tiny bit of change to the Crypto API
causes the entire kernel to get rebuilt.

This patch fixes this by moving it into lib/iov_iter.c instead
where it is actually used.

This patch also fixes the ifdef to use CRYPTO_HASH instead of just
CRYPTO which does not guarantee the existence of ahash.

Unfortunately a number of drivers were relying on linux/uio.h to
provide access to linux/slab.h.  This patch adds inclusions of
linux/slab.h as detected by build failures.

Also skbuff.h was relying on this to provide a declaration for
ahash_request.  This patch adds a forward declaration instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-30 09:34:23 -04:00
Linus Torvalds
23fc02e36e s390 updates for the 5.8 merge window
- Add support for multi-function devices in pci code.
 
 - Enable PF-VF linking for architectures using the
   pdev->no_vf_scan flag (currently just s390).
 
 - Add reipl from NVMe support.
 
 - Get rid of critical section cleanup in entry.S.
 
 - Refactor PNSO CHSC (perform network subchannel operation) in cio
   and qeth.
 
 - QDIO interrupts and error handling fixes and improvements, more
   refactoring changes.
 
 - Align ioremap() with generic code.
 
 - Accept requests without the prefetch bit set in vfio-ccw.
 
 - Enable path handling via two new regions in vfio-ccw.
 
 - Other small fixes and improvements all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl7eVGcACgkQjYWKoQLX
 FBhweQgAkicvx31x230rdfG+jQkQkl0UqF99vvWrJHEll77SqadfjzKAGIjUB+K0
 EoeHVD5Wcj7BogDGcyHeQ0bZpu4WzE+y1nmnrsvu7TEEvcBmkJH0rF2jF+y0sb/O
 3qvwFkX/CB5OqaMzKC/AEeRpcCKR+ZUXkWu1irbYth7CBXaycD9EAPc4cj8CfYGZ
 r5njUdYOVk77TaO4aV+t5pCYc5TCRJaWXSsWaAv/nuLcIqsFBYOy2q+L47zITGXp
 utZVanIDjzx+ikpaKicOIfC3hJsRuNX9MnlZKsQFwpVEZAUZmIUm29XdhGJTWSxU
 RV7m1ORINbFP1nGAqWqkOvGo/LC0ZA==
 =VhXR
 -----END PGP SIGNATURE-----

Merge tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add support for multi-function devices in pci code.

 - Enable PF-VF linking for architectures using the pdev->no_vf_scan
   flag (currently just s390).

 - Add reipl from NVMe support.

 - Get rid of critical section cleanup in entry.S.

 - Refactor PNSO CHSC (perform network subchannel operation) in cio and
   qeth.

 - QDIO interrupts and error handling fixes and improvements, more
   refactoring changes.

 - Align ioremap() with generic code.

 - Accept requests without the prefetch bit set in vfio-ccw.

 - Enable path handling via two new regions in vfio-ccw.

 - Other small fixes and improvements all over the code.

* tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits)
  vfio-ccw: make vfio_ccw_regops variables declarations static
  vfio-ccw: Add trace for CRW event
  vfio-ccw: Wire up the CRW irq and CRW region
  vfio-ccw: Introduce a new CRW region
  vfio-ccw: Refactor IRQ handlers
  vfio-ccw: Introduce a new schib region
  vfio-ccw: Refactor the unregister of the async regions
  vfio-ccw: Register a chp_event callback for vfio-ccw
  vfio-ccw: Introduce new helper functions to free/destroy regions
  vfio-ccw: document possible errors
  vfio-ccw: Enable transparent CCW IPL from DASD
  s390/pci: Log new handle in clp_disable_fh()
  s390/cio, s390/qeth: cleanup PNSO CHSC
  s390/qdio: remove q->first_to_kick
  s390/qdio: fix up qdio_start_irq() kerneldoc
  s390: remove critical section cleanup from entry.S
  s390: add machine check SIGP
  s390/pci: ioremap() align with generic code
  s390/ap: introduce new ap function ap_get_qdev()
  Documentation/s390: Update / remove developerWorks web links
  ...
2020-06-08 12:05:31 -07:00
Sven Schnelle
0b0ed657fe s390: remove critical section cleanup from entry.S
The current code is rather complex and caused a lot of subtle
and hard to debug bugs in the past. Simplify the code by calling
the system_call handler with interrupts disabled, save
machine state, and re-enable them later.

This requires significant changes to the machine check handling code
as well. When the machine check interrupt arrived while being in kernel
mode the new code will signal pending machine checks with a SIGP external
call. When userspace was interrupted, the handler will switch to the
kernel stack and directly execute s390_handle_mcck().

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-05-28 12:21:54 +02:00
Christian Borntraeger
316ec15481 s390/mm: fix page table upgrade vs 2ndary address mode accesses
A page table upgrade in a kernel section that uses secondary address
mode will mess up the kernel instructions as follows:

Consider the following scenario: two threads are sharing memory.
On CPU1 thread 1 does e.g. strnlen_user().  That gets to
        old_fs = enable_sacf_uaccess();
        len = strnlen_user_srst(src, size);
and
                "   la    %2,0(%1)\n"
                "   la    %3,0(%0,%1)\n"
                "   slgr  %0,%0\n"
                "   sacf  256\n"
                "0: srst  %3,%2\n"
in strnlen_user_srst().  At that point we are in secondary space mode,
control register 1 points to kernel page table and instruction fetching
happens via c1, rather than usual c13.  Interrupts are not disabled, for
obvious reasons.

On CPU2 thread 2 does MAP_FIXED mmap(), forcing the upgrade of page table
from 3-level to e.g. 4-level one.  We'd allocated new top-level table,
set it up and now we hit this:
                notify = 1;
                spin_unlock_bh(&mm->page_table_lock);
        }
        if (notify)
                on_each_cpu(__crst_table_upgrade, mm, 0);
OK, we need to actually change over to use of new page table and we
need that to happen in all threads that are currently running.  Which
happens to include the thread 1.  IPI is delivered and we have
static void __crst_table_upgrade(void *arg)
{
        struct mm_struct *mm = arg;

        if (current->active_mm == mm)
                set_user_asce(mm);
        __tlb_flush_local();
}
run on CPU1.  That does
static inline void set_user_asce(struct mm_struct *mm)
{
        S390_lowcore.user_asce = mm->context.asce;
OK, user page table address updated...
        __ctl_load(S390_lowcore.user_asce, 1, 1);
... and control register 1 set to it.
        clear_cpu_flag(CIF_ASCE_PRIMARY);
}

IPI is run in home space mode, so it's fine - insns are fetched
using c13, which always points to kernel page table.  But as soon
as we return from the interrupt, previous PSW is restored, putting
CPU1 back into secondary space mode, at which point we no longer
get the kernel instructions from the kernel mapping.

The fix is to only fixup the control registers that are currently in use
for user processes during the page table update.  We must also disable
interrupts in enable_sacf_uaccess to synchronize the cr and
thread.mm_segment updates against the on_each-cpu.

Fixes: 0aaba41b58 ("s390: remove all code using the access register mode")
Cc: stable@vger.kernel.org # 4.15+
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
References: CVE-2020-11884
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-04-21 15:16:43 +02:00
Colin Ian King
7e914fd17e s390/test_unwind: fix spelling mistake "reqister" -> "register"
There is a spelling mistake in a pr_info message. Fix it.

Link: https://lkml.kernel.org/r/20191202090215.28766-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11 19:53:23 +01:00
Vasily Gorbik
b62b6cf170 s390/spinlock: remove confusing comment in arch_spin_lock_wait
arch_spin_lock_wait does not take steal time into consideration.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11 19:53:23 +01:00
Vasily Gorbik
de6921ccbd s390/test_unwind: add program check context tests
Add unwinding from program check handler tests. Unwinder should be able
to unwind through pt_regs stored by program check handler on task stack.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:48 +01:00
Vasily Gorbik
e7409367ab s390/test_unwind: add irq context tests
Add unwinding from irq context tests. Unwinder should be able to unwind
through irq stack to task stack up to task pt_regs.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:48 +01:00
Vasily Gorbik
0610154650 s390/test_unwind: print verbose unwinding results
Add stack name, sp and reliable information into test unwinding
results. Also consider ip outside of kernel text as failure if the
state is reported reliable.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:48 +01:00
Vasily Gorbik
7868249fbb s390/test_unwind: add CALL_ON_STACK tests
Add CALL_ON_STACK helper testing. Tests make sure that we can unwind from
switched stack to original one up to task pt_regs (nodat -> task stack).

UWM_SWITCH_STACK could not be used together with UWM_THREAD because
get_stack_info explicitly restricts unwinding to task stack if
task != current.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:47 +01:00
Vasily Gorbik
f44fa79b10 s390/test_unwind: require that unwinding ended successfully
Currently unwinder test passes if unwinding results contain unwindme_func2
and unwindme_func1 functions.
Now that unwinder reports success upon reaching task pt_regs, check
that unwinding ended successfully in every test.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:46 +01:00
Ilya Leoshkevich
badbf39790 s390/unwind: add a test for the internal API
unwind_for_each_frame can take at least 8 different sets of parameters.
Add a test to make sure they all are handled in a sane way.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Co-developed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:46 +01:00
Heiko Carstens
cceb018377 s390/alternatives: make use of asm_inline
This is the s390 version of commit 40576e5e63 ("x86: alternative.h:
use asm_inline for all alternative variants").

See commit eb11186930 ("compiler-types.h: add asm_inline
definition") for more details.

With this change the compiler will not generate many out-of-line
versions for the three instruction sized arch_spin_unlock() function
anymore. Due to this gcc seems to change a lot of other inline
decisions which results in a net 6k text size growth according to
bloat-o-meter (gcc 9.2 with defconfig).
But that's still better than having many out-of-line versions of
arch_spin_unlock().

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31 17:20:51 +01:00
Linus Torvalds
d590284419 s390 updates for the 5.4 merge window
- Add support for IBM z15 machines.
 
 - Add SHA3 and CCA AES cipher key support in zcrypt and pkey refactoring.
 
 - Move to arch_stack_walk infrastructure for the stack unwinder.
 
 - Various kasan fixes and improvements.
 
 - Various command line parsing fixes.
 
 - Improve decompressor phase debuggability.
 
 - Lift no bss usage restriction for the early code.
 
 - Use refcount_t for reference counters for couple of places in
   mm code.
 
 - Logging improvements and return code fix in vfio-ccw code.
 
 - Couple of zpci fixes and minor refactoring.
 
 - Remove some outdated documentation.
 
 - Fix secure boot detection.
 
 - Other various minor code clean ups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl1/pRoACgkQjYWKoQLX
 FBjxLQf/Y1nlmoc8URLqaqfNTczIvUzdfXuahI7L75RoIIiqHtcHBrVwauSr7Lma
 XVRzK/+6q0UPISrOIZEEtQKsMMM7rGuUv/+XTyrOB/Tsc31kN2EIRXltfXI/lkb8
 BZdgch4Xs2rOD7y6TvqpYJsXYXsnLMWwCk8V+48V/pok4sEgMDgh0bTQRHPHYmZ6
 1cv8ZQ0AeuVxC6ChM30LhajGRPkYd8RQ82K7fU7jxT0Tjzu66SyrW3pTwA5empBD
 RI2yBZJ8EXwJyTCpvN8NKiBgihDs9oUZl61Dyq3j64Mb1OuNUhxXA/8jmtnGn0ok
 O9vtImCWzExhjSMkvotuhHEC05nEEQ==
 =LCgE
 -----END PGP SIGNATURE-----

Merge tag 's390-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add support for IBM z15 machines.

 - Add SHA3 and CCA AES cipher key support in zcrypt and pkey
   refactoring.

 - Move to arch_stack_walk infrastructure for the stack unwinder.

 - Various kasan fixes and improvements.

 - Various command line parsing fixes.

 - Improve decompressor phase debuggability.

 - Lift no bss usage restriction for the early code.

 - Use refcount_t for reference counters for couple of places in mm
   code.

 - Logging improvements and return code fix in vfio-ccw code.

 - Couple of zpci fixes and minor refactoring.

 - Remove some outdated documentation.

 - Fix secure boot detection.

 - Other various minor code clean ups.

* tag 's390-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
  s390: remove pointless drivers-y in drivers/s390/Makefile
  s390/cpum_sf: Fix line length and format string
  s390/pci: fix MSI message data
  s390: add support for IBM z15 machines
  s390/crypto: Support for SHA3 via CPACF (MSA6)
  s390/startup: add pgm check info printing
  s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
  vfio-ccw: fix error return code in vfio_ccw_sch_init()
  s390: vfio-ap: fix warning reset not completed
  s390/base: remove unused s390_base_mcck_handler
  s390/sclp: Fix bit checked for has_sipl
  s390/zcrypt: fix wrong handling of cca cipher keygenflags
  s390/kasan: add kdump support
  s390/setup: avoid using strncmp with hardcoded length
  s390/sclp: avoid using strncmp with hardcoded length
  s390/module: avoid using strncmp with hardcoded length
  s390/pci: avoid using strncmp with hardcoded length
  s390/kaslr: reserve memory for kasan usage
  s390/mem_detect: provide single get_mem_detect_end
  s390/cmma: reuse kstrtobool for option value parsing
  ...
2019-09-17 14:04:43 -07:00
Vasily Gorbik
2e83e0eb85 s390: clean .bss before running uncompressed kernel
Clean uncompressed kernel .bss section in the startup code before
the uncompressed kernel is executed. At this point of time initrd and
certificates have been already rescued. Uncompressed kernel .bss size
is known from vmlinux_info. It is also taken into consideration during
uncompressed kernel positioning by kaslr (so it is safe to clean it).

With that uncompressed kernel is starting with .bss section zeroed and
no .bss section usage restrictions apply. Which makes chkbss checks for
uncompressed kernel objects obsolete and they can be removed.

early_nobss.c is also not needed anymore. Parts of it which are still
relevant are moved to early.c. Kasan initialization code is now called
directly from head64 (early.c is instrumented and should not be
executed before kasan shadow memory is set up).

Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-21 12:58:52 +02:00
Vasily Gorbik
d25220d2f2 s390/lib: add missing include
Include <asm/xor.h> into arch/s390/lib/xor.c to expose xor_block_xc
declaration and avoid the following sparse warning:
arch/s390/lib/xor.c:128:27: warning: symbol 'xor_block_xc' was not declared. Should it be static?

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-29 18:05:02 +02:00
Heiko Carstens
67626fadd2 s390: enforce CONFIG_SMP
There never have been distributions that shiped with CONFIG_SMP=n for
s390. In addition the kernel currently doesn't even compile with
CONFIG_SMP=n for s390. Most likely it wouldn't even work, even if we
fix the compile error, since nobody tests it, since there is no use
case that I can think of.
Therefore simply enforce CONFIG_SMP and get rid of some more or
less unused code.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-06-07 10:09:37 +02:00
Martin Schwidefsky
26a374ae7a s390: add missing ENDPROC statements to assembler functions
The assembler code in arch/s390 misses proper ENDPROC statements
to properly end functions in a few places. Add them.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02 13:54:11 +02:00
Vasily Gorbik
7e0d92f002 s390/kasan: improve string/memory functions checks
Avoid using arch specific implementations of string/memory functions
with KASAN since gcc cannot instrument asm code memory accesses and
many bugs could be missed.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-01-18 09:34:18 +01:00
Vasily Gorbik
b6cbe3e8bd s390/kasan: avoid user access code instrumentation
Kasan instrumentation adds "store" check for variables marked as
modified by inline assembly. With user pointers containing addresses
from another address space this produces false positives.

static inline unsigned long clear_user_xc(void __user *to, ...)
{
	asm volatile(
	...
	: "+a" (to) ...

User space access functions are wrapped by manually instrumented
functions in kasan common code, which should be sufficient to catch
errors. So, we just disable uaccess.o instrumentation altogether.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:21 +02:00
Vasily Gorbik
fb594ec13e s390/kasan: replace some memory functions
Follow the common kasan approach:

    "KASan replaces memory functions with manually instrumented
    variants.  Original functions declared as weak symbols so strong
    definitions in mm/kasan/kasan.c could replace them. Original
    functions have aliases with '__' prefix in name, so we could call
    non-instrumented variant if needed."

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:18 +02:00
Martin Schwidefsky
5eda25b102 s390/lib: use expoline for all bcr instructions
The memove, memset, memcpy, __memset16, __memset32 and __memset64
function have an additional indirect return branch in form of a
"bzr" instruction. These need to use expolines as well.

Cc: <stable@vger.kernel.org> # v4.17+
Fixes: 97489e0663 ("s390/lib: use expoline for indirect branches")
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-07 13:38:13 +02:00
Vasily Gorbik
0391fcb5e1 s390: introduce compile time check for empty .bss section
Introduce compile time check for files which should avoid using .bss
section, because of the following reasons:
- .bss section has not been zeroed yet,
- initrd has not been moved to a safe location and could be overlapping
with .bss section.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-05-09 10:55:01 +02:00
Martin Schwidefsky
97489e0663 s390/lib: use expoline for indirect branches
The return from the memmove, memset, memcpy, __memset16, __memset32 and
__memset64 functions are done with "br %r14". These are indirect branches
as well and need to use execute trampolines for CONFIG_EXPOLINE=y.

Cc: stable@vger.kernel.org # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-05-07 09:07:36 +02:00
Martin Schwidefsky
9f37e79754 s390: fix preemption race in disable_sacf_uaccess
With CONFIG_PREEMPT=y there is a possible race in disable_sacf_uaccess.

The new set_fs value needs to be stored the the task structure first,
the control register update needs to be second. Otherwise a preemptive
schedule may interrupt the code right after the control register update
has been done and the next time the task is scheduled we get an incorrect
value in the control register due to the old set_fs setting.

Fixes: 0aaba41b58 ("s390: remove all code using the access register mode")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-12-15 15:05:21 +01:00
Heiko Carstens
78ca4fe3bb s390/spinlock: fix indentation
checkpatch:
    WARNING: Statements should start on a tabstop
    #9499: FILE: arch/s390/lib/spinlock.c:231:
    +                          return;

sparse:
arch/s390/lib/spinlock.c:81 arch_load_niai4()
    warn: inconsistent indenting

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 22:07:58 +01:00
Martin Schwidefsky
0aaba41b58 s390: remove all code using the access register mode
The vdso code for the getcpu() and the clock_gettime() call use the access
register mode to access the per-CPU vdso data page with the current code.

An alternative to the complicated AR mode is to use the secondary space
mode. This makes the vdso faster and quite a bit simpler. The downside is
that the uaccess code has to be changed quite a bit.

Which instructions are used depends on the machine and what kind of uaccess
operation is requested. The instruction dictates which ASCE value needs
to be loaded into %cr1 and %cr7.

The different cases:

* User copy with MVCOS for z10 and newer machines
  The MVCOS instruction can copy between the primary space (aka user) and
  the home space (aka kernel) directly. For set_fs(KERNEL_DS) the kernel
  ASCE is loaded into %cr1. For set_fs(USER_DS) the user space is already
  loaded in %cr1.

* User copy with MVCP/MVCS for older machines
  To be able to execute the MVCP/MVCS instructions the kernel needs to
  switch to primary mode. The control register %cr1 has to be set to the
  kernel ASCE and %cr7 to either the kernel ASCE or the user ASCE dependent
  on set_fs(KERNEL_DS) vs set_fs(USER_DS).

* Data access in the user address space for strnlen / futex
  To use "normal" instruction with data from the user address space the
  secondary space mode is used. The kernel needs to switch to primary mode,
  %cr1 has to contain the kernel ASCE and %cr7 either the user ASCE or the
  kernel ASCE, dependent on set_fs.

To load a new value into %cr1 or %cr7 is an expensive operation, the kernel
tries to be lazy about it. E.g. for multiple user copies in a row with
MVCP/MVCS the replacement of the vdso ASCE in %cr7 with the user ASCE is
done only once. On return to user space a CPU bit is checked that loads the
vdso ASCE again.

To enable and disable the data access via the secondary space two new
functions are added, enable_sacf_uaccess and disable_sacf_uaccess. The fact
that a context is in secondary space uaccess mode is stored in the
mm_segment_t value for the task. The code of an interrupt may use set_fs
as long as it returns to the previous state it got with get_fs with another
call to set_fs. The code in finish_arch_post_lock_switch simply has to do a
set_fs with the current mm_segment_t value for the task.

For CPUs with MVCOS:

CPU running in                        | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space                            |  user     |  vdso     |
kernel, USER_DS, normal-mode          |  user     |  vdso     |
kernel, USER_DS, normal-mode, lazy    |  user     |  user     |
kernel, USER_DS, sacf-mode            |  kernel   |  user     |
kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |

For CPUs without MVCOS:

CPU running in                        | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space                            |  user     |  vdso     |
kernel, USER_DS, normal-mode          |  user     |  vdso     |
kernel, USER_DS, normal-mode lazy     |  kernel   |  user     |
kernel, USER_DS, sacf-mode            |  kernel   |  user     |
kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |

The lines with "lazy" refer to the state after a copy via the secondary
space with a delayed reload of %cr1 and %cr7.

There are three hardware address spaces that can cause a DAT exception,
primary, secondary and home space. The exception can be related to
four different fault types: user space fault, vdso fault, kernel fault,
and the gmap faults.

Dependent on the set_fs state and normal vs. sacf mode there are a number
of fault combinations:

1) user address space fault via the primary ASCE
2) gmap address space fault via the primary ASCE
3) kernel address space fault via the primary ASCE for machines with
   MVCOS and set_fs(KERNEL_DS)
4) vdso address space faults via the secondary ASCE with an invalid
   address while running in secondary space in problem state
5) user address space fault via the secondary ASCE for user-copy
   based on the secondary space mode, e.g. futex_ops or strnlen_user
6) kernel address space fault via the secondary ASCE for user-copy
   with secondary space mode with set_fs(KERNEL_DS)
7) kernel address space fault via the primary ASCE for user-copy
   with secondary space mode with set_fs(USER_DS) on machines without
   MVCOS.
8) kernel address space fault via the home space ASCE

Replace user_space_fault() with a new function get_fault_type() that
can distinguish all four different fault types.

With these changes the futex atomic ops from the kernel and the
strnlen_user will get a little bit slower, as well as the old style
uaccess with MVCP/MVCS. All user accesses based on MVCOS will be as
fast as before. On the positive side, the user space vdso code is a
lot faster and Linux ceases to use the complicated AR mode.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 11:01:47 +01:00