Commit Graph


Masahiro Yamada
7ce7e984ab kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}
GZIP-compressed files end with 4 bytes of data that represent the size
of the original input. The decompressors (the self-extracting kernel)
exploit this to know the vmlinux size beforehand. To mimic GZIP's
trailer, Kbuild provides cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}.
Unfortunately, these macros are used everywhere even though the appended
size data is only useful for the decompressors.

There is no guarantee that such hand-crafted trailers are safely ignored.
In fact, the kernel refuses a compressed initramfs with such garbage data
appended. That is why usr/Makefile overrides size_append to make it a no-op.
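
As an illustration (not part of the patch), a minimal user-space sketch
of how a decompressor can recover the uncompressed size from such a
4-byte little-endian trailer:

  #include <stdio.h>
  #include <stdint.h>

  /* Read the trailing 4 bytes (gzip ISIZE-style field) to learn the
   * size of the original, uncompressed payload. */
  static uint32_t read_size_trailer(FILE *f)
  {
          unsigned char b[4];

          if (fseek(f, -4, SEEK_END) || fread(b, 1, 4, f) != 4)
                  return 0;

          return b[0] | (b[1] << 8) | ((uint32_t)b[2] << 16) |
                 ((uint32_t)b[3] << 24);
  }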

To limit the use of such broken compressed files, this commit renames
the existing macros as follows:

  cmd_bzip2   --> cmd_bzip2_with_size
  cmd_lzma    --> cmd_lzma_with_size
  cmd_lzo     --> cmd_lzo_with_size
  cmd_lz4     --> cmd_lz4_with_size
  cmd_xzkern  --> cmd_xzkern_with_size
  cmd_zstd22  --> cmd_zstd22_with_size

To keep the decompressors working, I updated the following Makefiles
accordingly:

  arch/arm/boot/compressed/Makefile
  arch/h8300/boot/compressed/Makefile
  arch/mips/boot/compressed/Makefile
  arch/parisc/boot/compressed/Makefile
  arch/s390/boot/compressed/Makefile
  arch/sh/boot/compressed/Makefile
  arch/x86/boot/compressed/Makefile

I reused the current macro names for the normal use cases; they produce
the compressed data in the proper format.

I did not touch the following:

  arch/arc/boot/Makefile
  arch/arm64/boot/Makefile
  arch/csky/boot/Makefile
  arch/mips/boot/Makefile
  arch/riscv/boot/Makefile
  arch/sh/boot/Makefile
  kernel/Makefile

This means those Makefiles will stop appending the size data.

I dropped the 'override size_append' hack from usr/Makefile.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2022-01-14 02:54:05 +09:00
Linus Torvalds
9557e60b8c A single fix for a missing __init annotation of prepare_command_line().
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGjr4UTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXEXEAC79s58FkEV5LsKnmNe5pr/XG47Fgbz
 MMTX+P4IUwUrvRHPzEKPdO4lR8ZSFefhSHcPA06cWYyNyeh/UHq/sB4JGysuH7bw
 JAIJhJ3PsLv19TWvIKN3WHB7R3gwqKYzzzsKjKJHfepHv7kzidYz1fu380/bA2Zw
 LDjlMaQFSIA4rk7sBB/FFC3wsCvmU69iwG82x+QXX3ZWk0eB8bmw4Vsz6tsjo1D2
 EtEyxmjh2HjI+AbTORUsqucWvoKq3PvA9CX4SJdT/u+/ZX1zYR6iwsBkB+oiDNFx
 mxN15QKUA0t5NmF3dtoftj0lLM7iT7rd9PAdyPbWDm1o02z7Q780tsv7S8A2njMp
 ACwvCkupE9LSmEpFH9YyQeCBtyppXd0laN0mcL3h1MmSQbZs3KrtqyQDJ6McjPwL
 TiDFa8PKCBejGms7P6gPBv0DCsNNBQ0alAioi+zVsn8WbnMKGBpb0nplAdn1Cxt0
 GuJR63oEpX8Jtejb/018cEdDBiEmGh+7HXVXy4giAKnn92Pd7AhpWXwUYGuoD7j3
 1nRI9TOEoWS9ZcK7goaVp9D42YCEPsn66RPGyg25EaTlWJ0g9tmpbXhGzFPYg8FS
 tGIBw7CwlGMIKng+MWwuGu9hJYk0Tc1NcvrkxYTSVFNDYo4AQDpqwLlGuJsO893c
 8tFApEuQ+w5zkg==
 =QG5U
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 build fix from Thomas Gleixner:
 "A single fix for a missing __init annotation of prepare_command_line()"

* tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Mark prepare_command_line() __init
2021-11-28 09:24:50 -08:00
Linus Torvalds
0757ca01d9 IOMMU Fixes for Linux v5.16-rc2:
Including:
 
   - Intel VT-d fixes:
     - Remove unused PASID_DISABLED
     - Fix RCU locking
     - Fix for the unmap_pages call-back
 
   - Rockchip RK3568 address mask fix
 
   - AMD IOMMUv2 log message clarification
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmGjhDgACgkQK/BELZcB
 GuMOiQ/+MPfnGSpKtEdID/p9d7yo97M/WKRhx5aT27hfW4gyyLtAO22vUTfZ/obK
 Mmwl/mthMm55BuzOes8/ka7zcrkaluJitFGWVLN6dzXZRTZc2nMdYQb1Y25IZVEP
 AwTDdfi8btPRYRrZ1vpXZtDzF5purGe0a5P0psUox7roSq+uWZcXOy3WZDbPNA9b
 pAwPTacnxbeSrElOF6rUyv5eXilvMDMG/1lF4/gFR89xAYcDIPpWNLuRWNYWxu2M
 qTbGBGXnSs/iIzRmBs5rhqbR5VQAb8tXQjjMBb5wWx9i6gaw217wHZPQrpS6ej3Z
 E8vpD/z/kxsMLBpiNM34an/3krgzm6alhA6dalZsvGdAvQHWj4LGma72lCBYsuM1
 +Lre4MvMJ1kvONH9aWoMJWZEITnL2pS3Afu72vhg+7ank4xI/5Ej3bO7mYfN2MV+
 XeEgIUv6HJioJ8ITwYyApJskDk0CLGvfmyHADwe+rj4A+YYzOgEv6kGQlZWzaNWa
 ISDibSMa4od4rw63CLS+rq4vK13MhfyerLZl9IMpvqcUKuWrb3SCiGjTLg5iWkL0
 eqoIIvMuUavmTNG6HMOwfUMNN963osLJPXjwMYkzWbbpusLqb38KvLCed3xGS8tb
 KwhFggAF9Qt++bjo9V8zk5FdAovW0n4m5Nu6WDFcMwPCxccy3xc=
 =+B76
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Intel VT-d fixes:
     - Remove unused PASID_DISABLED
     - Fix RCU locking
     - Fix for the unmap_pages call-back

 - Rockchip RK3568 address mask fix

 - AMD IOMMUv2 log message clarification

* tag 'iommu-fixes-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix unmap_pages support
  iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock()
  iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568
  iommu/amd: Clarify AMD IOMMUv2 initialization messages
  iommu/vt-d: Remove unused PASID_DISABLED
2021-11-28 07:17:38 -08:00
Joerg Roedel
21e96a2035 iommu/vt-d: Remove unused PASID_DISABLED
The macro is unused after commit 00ecd54013, so it can be removed.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 00ecd54013 ("iommu/vt-d: Clean up unused PASID updating functions")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211123105507.7654-2-joro@8bytes.org
2021-11-26 22:54:20 +01:00
Linus Torvalds
6b54698aec xen: branch for v5.16-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYaD8mwAKCRCAXGG7T9hj
 vspAAPwLA5SUorji33PTetwmcpLcoRJ3Q4HAPz+bOPdm9iL/PgD/V8MtxFrFebBs
 AJoa+GmBarUNn7XCqKnCcA64iXhrpQw=
 =6GY3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - Kconfig fix to make it possible to control building of the privcmd
   driver

 - three fixes for issues identified by the kernel test robot

 - a five-patch series to simplify timeout handling for Xen PV driver
   initialization

 - two patches to fix error paths in xenstore/xenbus driver
   initialization

* tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: make HYPERVISOR_set_debugreg() always_inline
  xen: make HYPERVISOR_get_debugreg() always_inline
  xen: detect uninitialized xenbus in xenbus_init
  xen: flag xen_snd_front to be not essential for system boot
  xen: flag pvcalls-front to be not essential for system boot
  xen: flag hvc_xen to be not essential for system boot
  xen: flag xen_drm_front to be not essential for system boot
  xen: add "not_essential" flag to struct xenbus_driver
  xen/pvh: add missing prototype to header
  xen: don't continue xenstore initialization in case of errors
  xen/privcmd: make option visible in Kconfig
2021-11-26 09:54:13 -08:00
Juergen Gross
00db58cf21 xen: make HYPERVISOR_set_debugreg() always_inline
HYPERVISOR_set_debugreg() is being called from noinstr code, so it
should be attributed "always_inline".
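
A minimal sketch of the change (the _hypercall2() body is the existing
Xen hypercall plumbing and is only assumed here):

  /* Must never be emitted out of line, since noinstr code calls it. */
  static __always_inline void
  HYPERVISOR_set_debugreg(int reg, unsigned long value)
  {
          _hypercall2(void, set_debugreg, reg, value);
  }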

Fixes: 7361fac046 ("x86/xen: Make set_debugreg() noinstr")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20211125092056.24758-3-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-25 09:25:39 -06:00
Juergen Gross
b1c45ad53e xen: make HYPERVISOR_get_debugreg() always_inline
HYPERVISOR_get_debugreg() is being called from noinstr code, so it
should be attributed "always_inline".

Fixes: f4afb713e5 ("x86/xen: Make get_debugreg() noinstr")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20211125092056.24758-2-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-25 09:25:07 -06:00
Borislav Petkov
c0f2077baa x86/boot: Mark prepare_command_line() __init
Fix:

  WARNING: modpost: vmlinux.o(.text.unlikely+0x64d0): Section mismatch in reference \
   from the function prepare_command_line() to the variable .init.data:command_line
  The function prepare_command_line() references
  the variable __initdata command_line.
  This is often because prepare_command_line lacks a __initdata
  annotation or the annotation of command_line is wrong.

Apparently, some toolchains make different inlining decisions.
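
A reduced sketch of the pattern being fixed (names as in the x86 setup
code; treat the body as illustrative):

  static char command_line[COMMAND_LINE_SIZE] __initdata;

  /* The caller touches __initdata, so it must live in .init.text too,
   * otherwise modpost reports a section mismatch whenever it is not
   * inlined. */
  static char * __init prepare_command_line(void)
  {
          /* ... assemble the boot command line ... */
          return command_line;
  }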

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YZySgpmBcNNM2qca@zn.tnic
2021-11-24 12:20:24 +01:00
Linus Torvalds
40c93d7fff Two X86 fixes:
- Move the command line preparation and the early command line parsing
    earlier so that the command line parameters which affect
    early_reserve_memory(), e.g. efi=nosoftreserve, are taken into
    account. This was broken when the invocation of early_reserve_memory()
    was moved recently.
 
  - Use an atomic type for the SGX page accounting, which is read and
    written locklessly, to plug various race conditions related to it.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGaYPoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTM7D/9bivpPDiNzfjUV7kNKx6aTUwPjdFer
 G0RuuDZqkpJm9j7+51VnQNFssIfAFtzKMJn/DuGVoXF0ERxXMEhJVHiSTeOlCjJU
 u1760qFYlAQ1mwvKVNLk2SenWuNZwwgUneY3VvvS4qYsSq7PsbYlekuddPeX0Nws
 AJ1llOoCoBkm5vNZ5c3/CmhY6iPSRQQkDmbA11cZZUyWl2uouSk21+ax24IDCvW3
 E8Aq9QqB6ND2uukB32kQ7Wp7/UZ4inJHTUXF9UF/8P+N1ftDWeKDjQz6y9U19Tsd
 ivuMr6NqqAos/Fpo9PhlGns07C8HeKGf4ronnt9cUMqjzYWfdS+pRT+0pQR+vIPa
 M8+jyHQplzeOX9/nKOkpV+u0tYP2zgx8e7yeu5Sion8TqsKqiNOy9+D0D2utUDmw
 1x3DzuzGx/mK2OX5gjGSx4ZbS4u0DIAWnF8vB9YfgEfcnqpxr6KdbrY0bLatIbKv
 ip9mh0rRYeTkTZ4FGmvy3hFgAmadCODWxva/7AhzbWVZoM+AShwnTDsipkRaaj3V
 nMdgcVix8qVDg9YIAn9ziZbxkXKQUXFJn7lZj3KBeWjKcV2svA89S/9YL6JTaSeW
 TJ4X6wK8EoApKhEasZhufXBNAl9EmQlBS1k9pHiIjKVuRgBGlzMuEhzvrqZM2+rA
 KaUQSwBN6Ij6Dg==
 =CJUK
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

 - Move the command line preparation and the early command line parsing
   earlier so that the command line parameters which affect
   early_reserve_memory(), e.g. efi=nosoftreserve, are taken into
   account. This was broken when the invocation of
   early_reserve_memory() was moved recently.

 - Use an atomic type for the SGX page accounting, which is read and
   written locklessly, to plug various race conditions related to it.

* tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Fix free page accounting
  x86/boot: Pull up cmdline preparation and early param parsing
2021-11-21 11:25:19 -08:00
Linus Torvalds
af16bdeae8 A set of perf/X86 fixes:
- Remove unneeded PEBS disabling when taking LBR snapshots to prevent an
    unchecked MSR access error.
 
  - Fix IIO event constraints for Snowridge and Skylake server chips.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGaX44THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoarZEACxCahzh+3P3BD/yYs2R4qEWpiFf0ei
 KpoCSL/Xh2+5c+N0Z5qxMKs4//Vkab6GYdDlbQnfcfQPFgxFs9UI0kqBoXFWaqK6
 u12JJW8F3iBxV4UTwduCflve9x5iZ5B7OmCRrTmJm8GJwQqTIFmjx8kDsDbghYvu
 rURM3L7mln/Qx7xcjvhZehXrDupwPvak1tw1SbxPyNjz1dNAN8A7G9xVFDddlzDB
 AuYMEadipn5QsbQD224rwUPMj04jnby+421phLmaaPiduZx2Hi7QjJGEH718R7IQ
 IGA1OeSXJXIsPimUW3UJZEw5OGMG+UY7/raHgk8LnxUCqQoeIjU8vpt6HR7EAD0b
 0LuUvJ1ispVT8dY+7DdzcbuW+Zp4TPUQNlG/bBlsmuduuSkvUiDeMtknJbJ8xE0s
 xFbMAgcwlaylSmtwNGCgM/P1KWbLwZQXS6IP8Iy4bnfwEueTeeHwaEtssrfrhW/z
 9OCVgUIkO2LbAFMmlQK4tfFprmR2oJSDpohJ0e5QZMMyrEefSMbY2U47omnB4bln
 HDZ2Q+ZKI0G43ECyI2TZXJg77SS/cmJxCcgXx8iQGZTDn38iPDPQJWWWWWDYTz0C
 ERVEGvK7jc+9Pu54iWAaSQxGmZQUWHfETt0QuKqx+1Kgl2NmBfeJkLkss4dL/eW4
 zH8qAjblfH7a3g==
 =PeA1
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fixes from Thomas Gleixner:

 - Remove unneeded PEBS disabling when taking LBR snapshots to prevent an
   unchecked MSR access error.

 - Fix IIO event constraints for Snowridge and Skylake server chips.

* tag 'perf-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/perf: Fix snapshot_branch_stack warning in VM
  perf/x86/intel/uncore: Fix IIO event constraints for Snowridge
  perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
  perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server
2021-11-21 11:17:50 -08:00
Linus Torvalds
6b38e2fb70 s390 updates for 5.16-rc2
- Add missing Kconfig option for ftrace direct multi sample, so it can
   be compiled again, and also add s390 support for this sample.
 
 - Update Christian Borntraeger's email address.
 
 - Various fixes for memory layout setup. Among other things, this makes
   it possible to load shared DCSS segments again.
 
 - Fix copy to user space of swapped kdump oldmem.
 
 - Remove -mstack-guard and -mstack-size compile options when building
   vdso binaries. These options can end up being passed when
   CONFIG_VMAP_STACK is disabled and result in broken vdso code which
   causes more or less random exceptions. Also remove the unneeded
   -nostdlib option.
 
 - Fix memory leak on cpu hotplug and return code handling in kexec
   code.
 
 - Wire up futex_waitv system call.
 
 - Replace snprintf with sysfs_emit where appropriate.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmGZBmIACgkQIg7DeRsp
 bsLevQ//XfCEcvJ1sB4OEiN97xyy5me4FoOo5rWuzG/ZN/YmUH0CkzJHIhjDcCg3
 2FslxH5doOA3zLEBCQKXtcW4uaLSgJcqDgFgpE0TZk/6VKB9RD5q2eSjd+akFMGh
 HFge54pfgpR7pYYwWRvbqOJRyzkU5oHAjMmt2UweOoX3qwynhMhTrT/03Y9pGMgK
 VBHhp+ocfdLGQk3nbehAWsh7AWItWwOtKblsTFoyJ6BW0pxb7Yc6+wrpyxLYCaRK
 rCbyXDStvDqjeBSdx2GZDrA7HbVsrZTHA7sSStIW8yIss1/YJXTP0J2PMXmYNbeE
 ou2WCg/iti1DNwN7AOR0OdPu1NfPQkyW6NmV8814Haa8Ub3GUc6RCo+U4wlCXAbo
 ZcHWlb8sgWgfQMzho3WfgkeXuEohO+nOV/x/JFt+NFcwidNTQKO7FQ8GsyylUcYo
 fBhElbn7p44eS1ivMFEwzptBbpH1JVbb30iV7tMWxyjJQ9TkzpsC3Ph14JimSChk
 oZuUnmgMztss/ikEMFcDLhd3DNedXfz10Boq6FucD8x46cW5j7o0scwIomcNtxmx
 C3Y9JCsDdiXAfS6Et6KGbsuWbigT3NjNKETK0+Be65GYNP/NPD5pXLeKywU++cHe
 e+Lucqiej9polcGN3X97lORMDEx0dXpGkM6ZK2rtX66e7rBbB7M=
 =n7BA
 -----END PGP SIGNATURE-----

Merge tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Add missing Kconfig option for ftrace direct multi sample, so it can
   be compiled again, and also add s390 support for this sample.

 - Update Christian Borntraeger's email address.

 - Various fixes for memory layout setup. Among other things, this makes
   it possible to load shared DCSS segments again.

 - Fix copy to user space of swapped kdump oldmem.

 - Remove -mstack-guard and -mstack-size compile options when building
   vdso binaries. These options can end up being passed when
   CONFIG_VMAP_STACK is disabled and result in broken vdso code which
   causes more or less random exceptions. Also remove the unneeded
   -nostdlib option.

 - Fix memory leak on cpu hotplug and return code handling in kexec
   code.

 - Wire up futex_waitv system call.

 - Replace snprintf with sysfs_emit where appropriate.

* tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  ftrace/samples: add s390 support for ftrace direct multi sample
  ftrace/samples: add missing Kconfig option for ftrace direct multi sample
  MAINTAINERS: update email address of Christian Borntraeger
  s390/kexec: fix memory leak of ipl report buffer
  s390/kexec: fix return code handling
  s390/dump: fix copying to user-space of swapped kdump oldmem
  s390: wire up sys_futex_waitv system call
  s390/vdso: filter out -mstack-guard and -mstack-size
  s390/vdso: remove -nostdlib compiler flag
  s390: replace snprintf in show functions with sysfs_emit
  s390/boot: simplify and fix kernel memory layout setup
  s390/setup: re-arrange memblock setup
  s390/setup: avoid using memblock_enforce_memory_limit
  s390/setup: avoid reserving memory above identity mapping
2021-11-20 10:55:50 -08:00
Juergen Gross
2a0991929a xen/pvh: add missing prototype to header
The prototype of mem_map_via_hcall() is missing in its header, so add
it.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: a43fb7da53 ("xen/pvh: Move Xen code for getting mem map via hcall out of common file")
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20211119153913.21678-1-jgross@suse.com
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-19 16:36:56 -06:00
Linus Torvalds
7af959b5d5 Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit-vs-signal handling fixes from Eric Biederman:
 "This is a small set of changes where debuggers were no longer able to
  intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit
  cleanups.

  This is essentially the change you suggested, with all of the i's
  dotted and the t's crossed, so that ptrace can intercept all of the
  cases it has been able to intercept in the past, and all of the cases
  that made it to exit without giving ptrace a chance still don't give
  ptrace a chance"

* 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Replace force_fatal_sig with force_exit_sig when in doubt
  signal: Don't always set SA_IMMUTABLE for forced signals
2021-11-19 11:33:31 -08:00
Peter Zijlstra
0dc636b3b7 x86: Pin task-stack in __get_wchan()
When commit 5d1ceb3969 ("x86: Fix __get_wchan() for !STACKTRACE")
moved __get_wchan() from stacktrace to native unwind_*() usage, the
try_get_task_stack() call got lost, leading to use-after-free issues for
dying tasks.
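
A sketch of the restored pattern (close to the fixed mainline code, but
treat it as illustrative): pin the stack around the unwind so it cannot
be freed underneath us for a dying task.

  unsigned long __get_wchan(struct task_struct *p)
  {
          struct unwind_state state;
          unsigned long addr = 0;

          if (!try_get_task_stack(p))
                  return 0;

          for (unwind_start(&state, p, NULL, NULL); !unwind_done(&state);
               unwind_next_frame(&state)) {
                  addr = unwind_get_return_address(&state);
                  if (!addr)
                          break;
                  if (in_sched_functions(addr))
                          continue;
                  break;
          }

          put_task_stack(p);

          return addr;
  }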

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Fixes: 5d1ceb3969 ("x86: Fix __get_wchan() for !STACKTRACE")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215031
Link: https://lore.kernel.org/stable/YZV02RCRVHIa144u@fedora64.linuxtx.org/
Reported-by: Justin Forbes <jmforbes@linuxtx.org>
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-19 10:14:57 -08:00
Eric W. Biederman
fcb116bc43 signal: Replace force_fatal_sig with force_exit_sig when in doubt
Recently, to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered, SA_IMMUTABLE was added.

Unfortunately this broke debuggers[1][2] which reasonably expect
to be able to trap synchronous SIGTRAP and SIGSEGV even when
the target process is not configured to handle those signals.

Add force_exit_sig and use it instead of force_fatal_sig where
historically the code has directly called do_exit.  This has the
implementation benefits of going through the signal exit path
(including generating core dumps) without the danger of allowing
userspace to ignore or change these signals.

This avoids userspace regressions, as older kernels exited with do_exit,
which debuggers also cannot intercept.
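
An illustrative before/after of the conversion (call sites vary per
architecture; the function shown here is hypothetical):

  static void handle_unrecoverable_fault(void)
  {
          /*
           * Before: do_exit(SIGSEGV); -- terminated the task directly,
           * bypassing core dumps and ptrace entirely.
           */
          force_exit_sig(SIGSEGV);        /* after: normal signal exit path */
  }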

In the future it should be possible to improve the quality of
implementation of the kernel by changing some of these force_exit_sig
calls to force_fatal_sig.  That can be done where it matters on
a case-by-case basis with careful analysis.

Reported-by: Kyle Huey <me@kylehuey.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020
Fixes: 00b06da29c ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
Fixes: a3616a3c02 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die")
Fixes: 83a1f27ad7 ("signal/powerpc: On swapcontext failure force SIGSEGV")
Fixes: 9bc508cf07 ("signal/s390: Use force_sigsegv in default_trap_handler")
Fixes: 086ec444f8 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig")
Fixes: c317d306d5 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails")
Fixes: 695dd0d634 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit")
Fixes: 1fbd60df8a ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.")
Fixes: 941edc5bf1 ("exit/syscall_user_dispatch: Send ordinary signals on failure")
Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-19 09:15:58 -06:00
Linus Torvalds
c46e8ece96 Selftest changes:
* Cleanups for the perf test infrastructure and mapping hugepages
 
 * Avoid contention on mmap_sem when the guests start to run
 
 * Add event channel upcall support to xen_shinfo_test
 
 x86 changes:
 
 * Fixes for Xen emulation
 
 * Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
 
 * Fixes for migration of 32-bit nested guests on 64-bit hypervisor
 
 * Compilation fixes
 
 * More SEV cleanups
 
 Generic:
 
 * Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
 and num_online_cpus().  Most architectures were only using one of the two.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGV/PAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMrogf/eAyilGRQL7lLETn3DTVlgLVv82+z
 giX11HlUhUmATHIDluj/wVQUjVcY6AO4SnvFaudX7B+mibndkw4L19IubP/koQZu
 xnKSJTn+mVANdzz3UdsHl0ujbPdQJaFCIPW6iewbn2GRRZMwA5F3vMK/H09XRApL
 I7kq8CPA6sC0I3TPzPN3ROxigexzYunZmGQ4qQe0GUdtxHrJOYQN++ddmWbQoEIC
 gdFTyF7CUQ+lmJe0b/Y88yhISFAJCEBuKFlg9tOTuxSfwvPX6lUu+pi+utEx9M+O
 ckTSQli/apZ4RVcSzxMIwX/BciYqhqOz5uMG+w4DRlJixtGSHtjiEVxGxw==
 =Iij4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Selftest changes:

   - Cleanups for the perf test infrastructure and mapping hugepages

   - Avoid contention on mmap_sem when the guests start to run

   - Add event channel upcall support to xen_shinfo_test

  x86 changes:

   - Fixes for Xen emulation

   - Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache

   - Fixes for migration of 32-bit nested guests on 64-bit hypervisor

   - Compilation fixes

   - More SEV cleanups

  Generic:

   - Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
     and num_online_cpus(). Most architectures were only using one of
     the two"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
  KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
  KVM: x86: Assume a 64-bit hypercall for guests with protected state
  selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
  riscv: kvm: fix non-kernel-doc comment block
  KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
  KVM: SEV: Drop a redundant setting of sev->asid during initialization
  KVM: SEV: WARN if SEV-ES is marked active but SEV is not
  KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
  KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
  KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
  KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
  KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
  KVM: x86/xen: Use sizeof_field() instead of open-coding it
  KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
  KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
  ...
2021-11-18 12:05:22 -08:00
Heiko Carstens
503e451084 ftrace/samples: add missing Kconfig option for ftrace direct multi sample
It is currently not possible to build the ftrace direct multi example
due to broken config dependencies. Fix this by adding the
SAMPLE_FTRACE_DIRECT_MULTI config option.

This broke when merging s390-5.16-1 due to an incorrect merge conflict
resolution proposed by me.

Also rename SAMPLE_FTRACE_MULTI_DIRECT to SAMPLE_FTRACE_DIRECT_MULTI
so it matches the module name.

Fixes: 0b707e572a ("Merge tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux")
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211115195614.3173346-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-18 17:50:54 +01:00
Vitaly Kuznetsov
2845e7353b KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211116163443.88707-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:12:15 -05:00
Tom Lendacky
b5aead0064 KVM: x86: Assume a 64-bit hypercall for guests with protected state
When processing a hypercall for a guest with protected state, currently
SEV-ES guests, the guest CS segment register can't be checked to
determine if the guest is in 64-bit mode. For an SEV-ES guest, it is
expected that communication between the guest and the hypervisor is
performed through shared memory using the GHCB. In order to use the
GHCB, the guest must have been in long mode, otherwise writes by the
guest to the GHCB would be encrypted and could not be comprehended by
the hypervisor.

Create a new helper function, is_64_bit_hypercall(), that assumes the
guest is in 64-bit mode when the guest has protected state, and returns
true, otherwise invoking is_64_bit_mode() to determine the mode. Update
the hypercall related routines to use is_64_bit_hypercall() instead of
is_64_bit_mode().
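
A minimal sketch of the new helper (field and helper names as used in
KVM; treat the exact form as illustrative):

  static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
  {
          /*
           * With protected guest state CS can't be read, but GHCB-based
           * communication already implies long mode, so assume 64-bit.
           */
          return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
  }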

Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurrences of calls to
this helper function for a guest running with protected state.

Fixes: f1c6366e30 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e0b20c770c9d0d1403f23d83e785385104211f74.1621878537.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:12:13 -05:00
Paolo Bonzini
817506df9d Merge branch 'kvm-5.16-fixes' into kvm-master
* Fixes for Xen emulation

* Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache

* Fixes for migration of 32-bit nested guests on 64-bit hypervisor

* Compilation fixes

* More SEV cleanups
2021-11-18 02:11:57 -05:00
Sean Christopherson
8e38e96a4e KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
Rename cmd_allowed_from_miror() to is_cmd_allowed_from_mirror(), fixing
a typo and making it obvious that the result is a boolean where
false means "not allowed".

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:10:28 -05:00
Sean Christopherson
ea410ef4da KVM: SEV: Drop a redundant setting of sev->asid during initialization
Remove a fully redundant write to sev->asid during SEV/SEV-ES guest
initialization.  The ASID is set a few lines earlier prior to the call to
sev_platform_init(), which doesn't take "sev" as a param, i.e. can't
muck with the ASID barring some truly magical behind-the-scenes code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:10:27 -05:00
Sean Christopherson
1bd00a4257 KVM: SEV: WARN if SEV-ES is marked active but SEV is not
WARN if the VM is tagged as SEV-ES but not SEV.  KVM relies on SEV and
SEV-ES being set atomically, and guards common flows with "is SEV", i.e.
observing SEV-ES without SEV means KVM has a fatal bug.
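
A sketch of the guard this enables (close to the mainline helper, shown
for illustration):

  static inline bool sev_es_guest(struct kvm *kvm)
  {
          struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

          /* SEV-ES without SEV is a KVM bug: warn and treat as !SEV-ES. */
          return sev->es_active && !WARN_ON_ONCE(!sev->active);
  }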

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:10:27 -05:00
Sean Christopherson
a41fb26e61 KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
Set sev_info.active during SEV/SEV-ES activation before calling any code
that can potentially consume sev_info.es_active, e.g. set "active" and
"es_active" as a pair immediately after the initial sanity checks.  KVM
generally expects that es_active can be true if and only if active is
true, e.g. sev_asid_new() deliberately avoids sev_es_guest() so that it
doesn't get a false negative.  This will allow WARNing in sev_es_guest()
if the VM is tagged as SEV-ES but not SEV.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:10:27 -05:00
Sean Christopherson
79b1114276 KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
Reject COPY_ENC_CONTEXT_FROM if the destination VM has created vCPUs.
KVM relies on SEV activation to occur before vCPUs are created, e.g. to
set VMCB flags and intercepts correctly.

Fixes: 54526d1fd5 ("KVM: x86: Support KVM VMs sharing SEV context")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Nathan Tempelman <natet@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:10:27 -05:00
David Woodhouse
cee66664dc KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
And thus another call to kvm_vcpu_map() can die.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-7-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:43 -05:00
David Woodhouse
7d0172b3ca KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
Kill another mostly gratuitous kvm_vcpu_map() which could just use the
userspace HVA for it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-6-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:43 -05:00
David Woodhouse
6a834754a5 KVM: x86/xen: Use sizeof_field() instead of open-coding it
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:43 -05:00
David Woodhouse
297d597a6d KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
Using kvm_vcpu_map() for reading from the guest is entirely gratuitous,
when all we do is a single memcpy and unmap it again. Fix it up to use
kvm_read_guest()... but in fact I couldn't bring myself to do that
without also making it use a gfn_to_hva_cache for both that *and* the
copy in the other direction.
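
A generic sketch of the cached-access pattern used instead of
kvm_vcpu_map() (the wrapper function name here is hypothetical):

  static void sync_shadow_vmcs12_cached(struct kvm_vcpu *vcpu, gpa_t gpa,
                                        struct vmcs12 *shadow_vmcs12)
  {
          struct gfn_to_hva_cache ghc;

          if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &ghc, gpa, VMCS12_SIZE))
                  return;

          /* Copy in from the guest through the cache ... */
          kvm_read_guest_cached(vcpu->kvm, &ghc, shadow_vmcs12, VMCS12_SIZE);

          /* ... and back out the same way, with no map/unmap dance. */
          kvm_write_guest_cached(vcpu->kvm, &ghc, shadow_vmcs12, VMCS12_SIZE);
  }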

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:42 -05:00
David Woodhouse
4e8436479a KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
In commit 319afe6856 ("KVM: xen: do not use struct gfn_to_hva_cache") we
stopped storing this in-kernel as a GPA, and started storing it as a GFN.
Which means we probably should have stopped calling gpa_to_gfn() on it
when userspace asks for it back.

Cc: stable@vger.kernel.org
Fixes: 319afe6856 ("KVM: xen: do not use struct gfn_to_hva_cache")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:42 -05:00
Maxim Levitsky
b8453cdcf2 KVM: x86/mmu: include EFER.LMA in extended mmu role
Incorporate EFER.LMA into kvm_mmu_extended_role, as it is used to compute the
guest root level and is not reflected in kvm_mmu_page_role.level when TDP
is in use.  When simply running the guest, it is impossible for EFER.LMA
and kvm_mmu.root_level to get out of sync, as the guest cannot transition
from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
first bouncing through a different MMU context.  And stuffing guest state
via KVM_SET_SREGS{,2} also ensures a full MMU context reset.
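
A hypothetical, heavily reduced sketch of the idea (the real union has
many more bits): adding EFER.LMA to the compared role bits forces the
cached MMU to be rebuilt on a long-mode mismatch.

  union mmu_extended_role_sketch {
          u32 word;
          struct {
                  unsigned int valid:1;
                  unsigned int cr4_pae:1;
                  unsigned int efer_lma:1;        /* newly tracked bit */
          };
  };

  static bool ext_roles_match(union mmu_extended_role_sketch a,
                              union mmu_extended_role_sketch b)
  {
          /* An EFER.LMA mismatch now makes the roles differ. */
          return a.word == b.word;
  }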

However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
set guest state when migrating the VM while L2 is active, the vCPU state
will reflect L2, not L1.  If L1 is using TDP for L2, then root_mmu will
have been configured using L2's state, despite not being used for L2.  If
L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
be configured for guest PAE paging, but will match the mmu_role for 64-bit
paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.

Alternatively, the root_mmu's role could be invalidated after a successful
KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
tricky, and not even needed if L1 and L2 do have the same role (e.g., they
are both 64-bit guests and run with the same CR4).

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:42 -05:00
Maxim Levitsky
af957eebfc KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load
When loading nested state, don't check vcpu->arch.efer to get the
L1 host's 64-bit vs. 32-bit state and don't check it for consistency
with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in vCPU
may be stale when KVM_SET_NESTED_STATE is called---and architecturally
does not exist.  When restoring L2 state in KVM, the CPU is placed in
non-root where nested VMX code has no snapshot of L1 host state: VMX
(conditionally) loads host state fields loaded on VM-exit, but they need
not correspond to the state before entry.  A simple case occurs in KVM
itself, where the host RIP field points to vmx_vmexit rather than the
instruction following vmlaunch/vmresume.

However, for the particular case of L1 being in 32- or 64-bit mode
on entry, the exit controls can be treated instead as the source of
truth regarding the state of L1 on entry, and can be used to check
that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if
vmcs12.VM_EXIT_LOAD_IA32_EFER is set.  The consistency check on CPU
EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only
on VM-Enter.  That's because, again, there's conceptually no "current"
L1 EFER to check on KVM_SET_NESTED_STATE.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:42 -05:00
David Woodhouse
964b7aa0b0 KVM: Fix steal time asm constraints
In 64-bit mode, x86 instruction encoding allows us to use the low 8 bits
of any GPR as an 8-bit operand. In 32-bit mode, however, we can only use
the [abcd] registers. For those, GCC has the "q" constraint instead of
the less restrictive "r".

Also fix st->preempted, which is an input/output operand rather than an
input.
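
A reduced sketch of the constraint fix (without the user-access and
exception-table handling of the real code):

  static u8 xchg_st_preempted(struct kvm_steal_time *st)
  {
          u8 st_preempted = 0;

          /*
           * "+q": the 8-bit operand must sit in a/b/c/d on 32-bit and is
           * both read and written; "+m" marks the in-memory field as an
           * input/output operand as well.
           */
          asm volatile("xchgb %0, %1"
                       : "+q" (st_preempted),
                         "+m" (st->preempted));

          return st_preempted;
  }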

Fixes: 7e2175ebd6 ("KVM: x86: Fix recording of guest steal time / preempted status")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <89bf72db1b859990355f9c40713a34e0d2d86c98.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:41 -05:00
Paul Durrant
dc23a5110b cpuid: kvm_find_kvm_cpuid_features() should be declared 'static'
The lack of a static declaration currently results in:

arch/x86/kvm/cpuid.c:128:26: warning: no previous prototype for function 'kvm_find_kvm_cpuid_features'

when compiling with "W=1".

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 760849b147 ("KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES")
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20211115144131.5943-1-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:14 -05:00
Song Liu
f3fd84a3b7 x86/perf: Fix snapshot_branch_stack warning in VM
When running in a VM, intel_pmu_snapshot_branch_stack triggers a WRMSR
warning like:

 [ ] unchecked MSR access error: WRMSR to 0x3f1 (tried to write 0x0000000000000000) at rIP: 0xffffffff81011a5b (intel_pmu_snapshot_branch_stack+0x3b/0xd0)

This can be triggered with BPF selftests:

  tools/testing/selftests/bpf/test_progs -t get_branch_snapshot

This warning is caused by __intel_pmu_pebs_disable_all() in the VM.
Since it is not necessary to disable PEBS for LBR, remove it from
intel_pmu_snapshot_branch_stack and intel_pmu_snapshot_arch_branch_stack.

Fixes: c22ac2a3d4 ("perf: Enable branch record for software events")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20211112054510.2667030-1-songliubraving@fb.com
2021-11-17 14:48:43 +01:00
Alexander Antonov
bdc0feee05 perf/x86/intel/uncore: Fix IIO event constraints for Snowridge
According to the latest uncore document, DATA_REQ_OF_CPU (0x83),
DATA_REQ_BY_CPU (0xc0) and COMP_BUF_OCCUPANCY (0xd5) events have
constraints. Add uncore IIO constraints for Snowridge.
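
A sketch of the kind of constraint table such a fix adds (the counter
masks shown here are illustrative, not quoted from the patch):

  static struct event_constraint snr_uncore_iio_constraints[] = {
          UNCORE_EVENT_CONSTRAINT(0x83, 0x3),     /* DATA_REQ_OF_CPU */
          UNCORE_EVENT_CONSTRAINT(0xc0, 0xc),     /* DATA_REQ_BY_CPU */
          UNCORE_EVENT_CONSTRAINT(0xd5, 0xc),     /* COMP_BUF_OCCUPANCY */
          EVENT_CONSTRAINT_END
  };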

Fixes: 210cc5f9db ("perf/x86/intel/uncore: Add uncore support for Snow Ridge server")
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20211115090334.3789-4-alexander.antonov@linux.intel.com
2021-11-17 14:48:43 +01:00
Alexander Antonov
3866ae319c perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
According to the latest uncore document, COMP_BUF_OCCUPANCY (0xd5) event
can be collected on 2-3 counters. Update uncore IIO event constraints for
Skylake Server.

Fixes: cd34cd97b7 ("perf/x86/intel/uncore: Add Skylake server uncore support")
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20211115090334.3789-3-alexander.antonov@linux.intel.com
2021-11-17 14:48:43 +01:00
Alexander Antonov
e324234e0a perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server
According to the Uncore Reference Manual, any of the CHA events may be
filtered by Thread/Core-ID by using the tid modifier in the CHA Filter 0
Register. Update skx_cha_hw_config() to follow the Uncore Guide.

Fixes: cd34cd97b7 ("perf/x86/intel/uncore: Add Skylake server uncore support")
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20211115090334.3789-2-alexander.antonov@linux.intel.com
2021-11-17 14:48:43 +01:00
Reinette Chatre
ac5d272a0a x86/sgx: Fix free page accounting
The SGX driver maintains a single global free page counter,
sgx_nr_free_pages, that reflects the number of free pages available
across all NUMA nodes. Correspondingly, a list of free pages is
associated with each NUMA node and sgx_nr_free_pages is updated
every time a page is added or removed from any of the free page
lists. The main usage of sgx_nr_free_pages is by the reclaimer
that runs when it (sgx_nr_free_pages) goes below a watermark
to ensure that there are always some free pages available to, for
example, support efficient page faults.

With sgx_nr_free_pages accessed and modified from a few places
it is essential to ensure that these accesses are done safely but
this is not the case. sgx_nr_free_pages is read without any
protection and updated with inconsistent protection by any one
of the spin locks associated with the individual NUMA nodes.
For example:

      CPU_A                                 CPU_B
      -----                                 -----
 spin_lock(&nodeA->lock);              spin_lock(&nodeB->lock);
 ...                                   ...
 sgx_nr_free_pages--;  /* NOT SAFE */  sgx_nr_free_pages--;

 spin_unlock(&nodeA->lock);            spin_unlock(&nodeB->lock);

Since sgx_nr_free_pages may be protected by different spin locks
while being modified from different CPUs, the following scenario
is possible:

      CPU_A                                CPU_B
      -----                                -----
{sgx_nr_free_pages = 100}
 spin_lock(&nodeA->lock);              spin_lock(&nodeB->lock);
 sgx_nr_free_pages--;                  sgx_nr_free_pages--;
 /* LOAD sgx_nr_free_pages = 100 */    /* LOAD sgx_nr_free_pages = 100 */
 /* sgx_nr_free_pages--          */    /* sgx_nr_free_pages--          */
 /* STORE sgx_nr_free_pages = 99 */    /* STORE sgx_nr_free_pages = 99 */
 spin_unlock(&nodeA->lock);            spin_unlock(&nodeB->lock);

In the above scenario, sgx_nr_free_pages is decremented from two CPUs
but instead of sgx_nr_free_pages ending with a value that is two less
than it started with, it was only decremented by one while the number
of free pages was actually reduced by two. The consequence of
sgx_nr_free_pages not being protected is that its value may not
accurately reflect the actual number of free pages on the system,
impacting the availability of free pages in support of many flows.

The problematic scenario is when the reclaimer does not run because it
believes there to be sufficient free pages while any attempt to allocate
a page fails because there are no free pages available. In the SGX driver
the reclaimer's watermark is only 32 pages, so after encountering the
above scenario 32 times a user space hang is possible: there are no more
free pages, none are ever made available, and the resulting page faults
repeat indefinitely.

The following flow was encountered:
asm_exc_page_fault
 ...
   sgx_vma_fault()
     sgx_encl_load_page()
       sgx_encl_eldu() // Encrypted page needs to be loaded from backing
                       // storage into newly allocated SGX memory page
         sgx_alloc_epc_page() // Allocate a page of SGX memory
           __sgx_alloc_epc_page() // Fails, no free SGX memory
           ...
           if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) // Wake reclaimer
             wake_up(&ksgxd_waitq);
           return -EBUSY; // Return -EBUSY giving reclaimer time to run
       return -EBUSY;
     return -EBUSY;
   return VM_FAULT_NOPAGE;

The reclaimer is triggered in above flow with the following code:

static bool sgx_should_reclaim(unsigned long watermark)
{
        return sgx_nr_free_pages < watermark &&
               !list_empty(&sgx_active_page_list);
}

In the problematic scenario there were no free pages available yet the
value of sgx_nr_free_pages was above the watermark. The allocation of
SGX memory thus always failed for lack of free pages, while no free pages
were ever made available because the reclaimer was never started due to
sgx_nr_free_pages' incorrect value. The consequence was that user space
kept encountering VM_FAULT_NOPAGE, which caused the same address to be
accessed repeatedly with the same result.

Change the global free page counter to an atomic type that
ensures simultaneous updates are done safely. While doing so, move
the updating of the variable outside of the spin lock critical
section to which it does not belong.
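
A sketch of the resulting pattern (helper name and surrounding details
are illustrative):

  static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);

  static void sgx_free_epc_page_sketch(struct sgx_numa_node *node,
                                       struct sgx_epc_page *page)
  {
          spin_lock(&node->lock);
          list_add_tail(&page->list, &node->free_page_list);
          spin_unlock(&node->lock);

          /* Counter update is atomic and independent of any node lock. */
          atomic_long_inc(&sgx_nr_free_pages);
  }

  static bool sgx_should_reclaim(unsigned long watermark)
  {
          return atomic_long_read(&sgx_nr_free_pages) < watermark &&
                 !list_empty(&sgx_active_page_list);
  }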

Cc: stable@vger.kernel.org
Fixes: 901ddbb9ec ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/a95a40743bbd3f795b465f30922dde7f1ea9e0eb.1637004094.git.reinette.chatre@intel.com
2021-11-16 11:17:43 -08:00
黄乐
c5adbb3af0 KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap()
In vcpu_load_eoi_exitmap(), the eoi_exit_bitmap[4] array is currently
initialized only when a Hyper-V context is available; in the other path
it is passed to kvm_x86_ops.load_eoi_exitmap() directly from the stack
uninitialized, which can cause unexpected interrupt delivery/handling
issues, e.g. an *old* Linux kernel that relies on the PIT to do clock
calibration on KVM might randomly fail to boot.

Fix it by passing ioapic_handled_vectors to load_eoi_exitmap() when Hyper-V
context is not available.
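
A sketch of the fixed logic (close to the mainline function, shown for
illustration):

  static void vcpu_load_eoi_exitmap(struct kvm_vcpu *vcpu)
  {
          u64 eoi_exit_bitmap[4];

          if (!kvm_apic_hw_enabled(vcpu->arch.apic))
                  return;

          if (to_hv_vcpu(vcpu)) {
                  bitmap_or((ulong *)eoi_exit_bitmap,
                            vcpu->arch.ioapic_handled_vectors,
                            to_hv_synic(vcpu)->vec_bitmap, 256);
                  static_call(kvm_x86_load_eoi_exitmap)(vcpu, eoi_exit_bitmap);
                  return;
          }

          static_call(kvm_x86_load_eoi_exitmap)(
                  vcpu, (u64 *)vcpu->arch.ioapic_handled_vectors);
  }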

Fixes: f2bc14b69c ("KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context")
Cc: stable@vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Huang Le <huangle1@jd.com>
Message-Id: <62115b277dab49ea97da5633f8522daf@jd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-16 07:51:04 -05:00
Sean Christopherson
f3e613e72f x86/hyperv: Move required MSRs check to initial platform probing
Explicitly check for MSR_HYPERCALL and MSR_VP_INDEX support when probing
for running as a Hyper-V guest instead of waiting until hyperv_init() to
detect the bogus configuration.  Add messages to give the admin a heads
up that they are likely running on a broken virtual machine setup.

At best, silently disabling Hyper-V is confusing and difficult to debug,
e.g. the kernel _says_ it's using all these fancy Hyper-V features, but
always falls back to the native versions.  At worst, the half baked setup
will crash/hang the kernel.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20211104182239.1302956-3-seanjc@google.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-11-15 12:37:08 +00:00
Sean Christopherson
daf972118c x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails
Check for a valid hv_vp_index array prior to dereferencing hv_vp_index when
setting Hyper-V's TSC change callback.  If Hyper-V setup failed in
hyperv_init(), the kernel will still report that it's running under
Hyper-V, but will have silently disabled nearly all functionality.

  BUG: kernel NULL pointer dereference, address: 0000000000000010
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP
  CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc2+ #75
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:set_hv_tscchange_cb+0x15/0xa0
  Code: <8b> 04 82 8b 15 12 17 85 01 48 c1 e0 20 48 0d ee 00 01 00 f6 c6 08
  ...
  Call Trace:
   kvm_arch_init+0x17c/0x280
   kvm_init+0x31/0x330
   vmx_init+0xba/0x13a
   do_one_initcall+0x41/0x1c0
   kernel_init_freeable+0x1f2/0x23b
   kernel_init+0x16/0x120
   ret_from_fork+0x22/0x30

Fixes: 93286261de ("x86/hyperv: Reenlightenment notifications support")
Cc: stable@vger.kernel.org
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20211104182239.1302956-2-seanjc@google.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-11-15 12:37:08 +00:00
Borislav Petkov
8d48bf8206 x86/boot: Pull up cmdline preparation and early param parsing
Dan reports that Anjaneya Chagam can no longer use the efi=nosoftreserve
kernel command line parameter to suppress "soft reservation" behavior.

This is due to the fact that the following call-chain happens at boot:

early_reserve_memory
|-> efi_memblock_x86_reserve_range
    |-> efi_fake_memmap_early

which does

        if (!efi_soft_reserve_enabled())
                return;

and that would have set EFI_MEM_NO_SOFT_RESERVE after having parsed
"nosoftreserve".

However, parse_early_param() gets called *after* it, leading to the boot
cmdline not being taken into account.

Therefore, carve out the command line preparation into a separate
function which does the early param parsing too, so that it all goes
together.

And then call that function before early_reserve_memory() so that the
params would have been parsed by then.
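
A heavily simplified sketch of the resulting ordering in setup_arch()
(surrounding code omitted; treat it as illustrative):

  static void __init setup_arch_ordering_sketch(char **cmdline_p)
  {
          /* Builds the final command line and parses early params. */
          *cmdline_p = prepare_command_line();

          /* By now efi=nosoftreserve has been parsed and is honored. */
          early_reserve_memory();
  }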

Fixes: 8aa83e6395 ("x86/setup: Call early_reserve_memory() earlier")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Anjaneya Chagam <anjaneya.chagam@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/e8dd8993c38702ee6dd73b3c11f158617e665607.camel@intel.com
2021-11-15 12:27:40 +01:00
Linus Torvalds
218cc8b860 A single fix for static calls to make the trampoline patching more robust
by placing explicit signature bytes after the call trampoline to prevent
 patching random other jumps like the CFI jump table entries.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDKsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZeNEADEFTbUJKd8812O9vkY9we1GDAtH7bY
 z6sYkh0/rPvYjdPfHuwqW8tUAl+CO2ne2X8FRPKgEdRLg44BY4HaMHmujdbGh3fh
 zpqynUBPoOIgtWxAPGdF+JxjrKlzjFd+WwjG3qBXOF3pjKgCc5knyjTucsl6ced3
 wF293rSYrIJ6uRv2TTNbM5hWJdC0arWbdMFnwQTxeZR54WLpu7Wfm+CCK41w0fAU
 nrfSsv73WEwpmAZNh04wsZsf7h6yCO7dCrIJD/3mpJtrUVBZXuZAKDzUzJPvHJal
 T8LcKwxZQAgPv0ubmOCrolj98Qp6PAPSdDJbzNsCJUYEbBqaB2inJ0PeHcZPspy9
 YyW00EHXD2UKm/GNF/DIlhoiNxOSh8Wn4b6H5ZRML50bS7jsMp8YVbticWEjItL6
 N4/61c45/uPILBS+Lysj0aqyj4TvagiuffJFWjw3YAQ+Gp/pzlJwRNjrw7/4DxAx
 KdpM881IKCR8UowBz3gIiA9FrJv2dGMqq31Rs1fjuauxkIX0gV3c64tAIRWrVscT
 k6GKGvHSis5cT97K3yhmNH0BUND+Skeku8G/SnTkefvcB85aU/7HBkLLJpw0w84F
 F6PTCaCJOEHrl3ADkilsi3z0sKWrph6aAzDEgp6Q6cmo9ulFAGw0bjuJb59xsvVK
 flIvTLUY3n76FA==
 =dgiF
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 static call update from Thomas Gleixner:
 "A single fix for static calls to make the trampoline patching more
  robust by placing explicit signature bytes after the call trampoline
  to prevent patching random other jumps like the CFI jump table
  entries"

* tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  static_call,x86: Robustify trampoline patching
2021-11-14 10:30:17 -08:00
Linus Torvalds
fc661f2dcb - Avoid touching ~100 config files in order to be able to select
the preemption model
 
 - clear cluster CPU masks too, on the CPU unplug path
 
 - prevent use-after-free in cfs
 
 - Prevent a race condition when updating CPU cache domains
 
 - Factor out common shared part of smp_prepare_cpus() into a common
 helper which can be called by both baremetal and Xen, in order to fix a
 booting of Xen PV guests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ8HcACgkQEsHwGGHe
 VUouoA//WAZ/dZu7IiM06JhZWswa2yNsdU8qQHys81lEqstaBqiWuZdg1qJTVIir
 2d0aN0keiPcsLyAsp1UJ2g/K/7D5vSJWDzsHKfEAToiAm8Tntai2LlSocWWfeSQm
 10grDHWpEHbj0hTHTA6HYOr2WbY4/LnR4cdL0WobIzivIrRTx49d0XUOUfWLP5KX
 60uM6dSjwpJrQUnvzk+bhGiHVmutFrEJy+UU/0o+nxkdhwraNiSbLi0007BGRCof
 6dokRRvLLR09dl1LMG51gVjQch4j/lCx6EWWUhYOFeV3I3gibSCNkmu7dpmMCBTR
 QWO01cR9gyFN4xQ2is4I36M5L0/8T+sbGvvXIXNDT/XWr0/p+g6p2mx0cd2XiYIr
 ZthGRcxxV/KGmxfPaygKS9tpQseMEIrdd6VjAnGfZ3OS6CtUvYt8d0B2Soj8FALQ
 N9fMXDIEP3uUZim8UvCT6HBKlj9LR5uI5n+dAQ6uzsenO9WqeGeldc/N26/+osdN
 vo4lNYTqiXJPhJvunYW5t4j5JnUa3grDHioAPWaQRJlWtEZBGKs9SXTcweg/KURb
 mNfe1RfSlGJt28RD3E18gXeSS7xWdKgpcVX1rmW/9tUjX04NNDWjq4sAzOj7c+Ir
 4sr78XgCY0pUxFaFYxvQWFUy7wcm0zAczo1RGUhcDTf1edDEvjo=
 =s2MX
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Avoid touching ~100 config files in order to be able to select the
   preemption model

 - clear cluster CPU masks too, on the CPU unplug path

 - prevent use-after-free in cfs

 - Prevent a race condition when updating CPU cache domains

 - Factor out common shared part of smp_prepare_cpus() into a common
   helper which can be called by both baremetal and Xen, in order to fix
   a booting of Xen PV guests

* tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  preempt: Restore preemption model selection configs
  arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
  sched/fair: Prevent dead task groups from regaining cfs_rq's
  sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  x86/smp: Factor out parts of native_smp_prepare_cpus()
2021-11-14 09:39:03 -08:00
Linus Torvalds
f7018be292 - Prevent unintentional page sharing by checking whether a page
reference to a PMU samples page has been acquired properly before that
 
 - Make sure the LBR_SELECT MSR is saved/restored too
 
 - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any
 residual data left
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ5z8ACgkQEsHwGGHe
 VUqqdQ/+JIV6t0yIj7aNADaakwAe+i9zFzzUuvb5KT0zPzZirswkz6xeZ4g8S8PZ
 lSjqKk8M2Yt3SJiqi/s3KNIOev52wtKGmeOFz1I+DUNpgk0wGHkRtVHV/iSptB61
 Kp/fJvOVppY5grs5B0fRYkM5e477RPyZo+E0COKnff1bQ+k+z2ItMLCVxFCxQS6k
 HmgPW7CBye811YcEg28lSwgS1OXiMZ19gACIsqnQ6kQP2Puo8+HT1/V1n+0grejb
 OeYxURuYSRPd6Ft76qz0YlRIe1dgKllUBr7b0AaM11ADBMtWBTxqJcQvq/mOIHmT
 9to0dVB/xFySR57iaL7BRuZFOrt8MRqJniEedMO99Dm9sxEVfHs1iXC9r7wZxQAf
 /HcvVkcyOJD92Kv+4LS5tKjowCByOYEJW2YQIgXEbA6oIhRuM9/fdxEW6lHwgdwc
 BPnOR6rtYuq+I+merBIIijAuf8OsIGY7ap2B+f7DkiOtA9+SHZsrU22J8T7CED/w
 gmrAC3+3KGt7YDs6WZTbvkXminZQyu5WpHe+2K6dlCIPmJLqEsYUx8TeXa/okyvb
 8ZXy/CfJNbHUrk6GZw7RFoeannwSPv9ZJO3Mfy5PDvwDk0Fj0J+/G92mR2Zucxpo
 siNyBCivPY5vBPqk+x6eUPev/C3wPS+dNrs4HOyr1N2gZwgTk40=
 =Ciqw
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Prevent unintentional page sharing by checking whether a page
   reference to a PMU samples page has been acquired properly before
   that

 - Make sure the LBR_SELECT MSR is saved/restored too

 - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any
   residual data left

* tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Avoid put_page() when GUP fails
  perf/x86/vlbr: Add c->flags to vlbr event constraints
  perf/x86/lbr: Reset LBR_SELECT during vlbr reset
2021-11-14 09:33:12 -08:00
Linus Torvalds
1654e95ee3 - Add the model number of a new, Raptor Lake CPU, to intel-family.h
- Do not log spurious corrected MCEs on SKL too, due to an erratum
 
 - Clarify the path of paravirt ops patches upstream
 
 - Add an optimization to avoid writing out AMX components to sigframes
 when the former are in init state
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ3CgACgkQEsHwGGHe
 VUoLAA/+NXRvcBHYkLaByT9f4OI6B79HzyguIBSfipYiw8ir0H7uEdV5FUCCUgCz
 egBRVFpOsXWt1teeuu6ViO+WBHncUxG/ryZ0ka35lri/3kuVYnugZExWDs4MrGR5
 vehRXehOxYNRaYc3oLYjubSbxqF1nWz3WWfGfhiBKk0jT/S1T9tX6lsRXlKsJCgj
 M4x5aqBWP8HTbFQfqjdHwagNitmSKzgjZvMcC4UWcql33ZCycbjvRdrAzBtw7WRI
 UBvgxWVmeMoagu5fqEOoph1oSoFxWuFrweFUjnxJmT6uZrTsfF7BVgXkxdG6eYUy
 2Xogcd4bPDBiRgbs0vPEog1tyyrKHOQ6p1pvksySKMPq6ULcSZ6hBpEZRpgr6Y9u
 0jB3P6weQgCckx5Hd+iwvX1a+GvEuHSEqAE+j160wFyrsBS5Cir3P1WqthWaPd5I
 3nH3h955PokUHPUioUhdf+8cfuP6h6K0nz1gdYI8GR8+fJHhEceT+pLLeyIxj/VM
 yr+bq+V7D6Cg62w3z3s9Dzg2XKpxStu1R9L1N/K8MtIGf6Uc7paL6xR27XxhmBp5
 Y6bGZw0mxxFhp6AEsFWo3rwLL9Dl5DmFcfgUHHpPK5VP0pVWp48Uapx2Hi2/JzAo
 c1o4UkPQa/EZJBPTklmGkS1JNp/2TsEL4Fw7sew+j7DWtsJpCfk=
 =Ge2T
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add the model number of a new, Raptor Lake CPU, to intel-family.h

 - Do not log spurious corrected MCEs on SKL too, due to an erratum

 - Clarify the path of paravirt ops patches upstream

 - Add an optimization to avoid writing out AMX components to sigframes
   when the former are in their init state (a small sketch follows this
   list)
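
To make the last item concrete, here is a self-contained sketch of the idea,
not the kernel's arch/x86 FPU code: the struct and function names below are
invented, and the only assumption carried over is that AMX tile data is XSAVE
state component 18, whose XSTATE_BV bit is clear while the component is in
its init state.

  #include <stdbool.h>
  #include <stdint.h>

  #define XFEATURE_MASK_XTILE_DATA_SKETCH (UINT64_C(1) << 18) /* AMX TILEDATA */

  struct xstate_header_sketch {
          uint64_t xfeatures;  /* XSTATE_BV: a set bit means "holds non-init data" */
  };

  /* Skip copying the (large) AMX tile state while it is still in init state. */
  static bool amx_needs_copy_to_sigframe(const struct xstate_header_sketch *hdr)
  {
          return (hdr->xfeatures & XFEATURE_MASK_XTILE_DATA_SKETCH) != 0;
  }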

* tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Raptor Lake to Intel family
  x86/mce: Add errata workaround for Skylake SKX37
  MAINTAINERS: Add some information to PARAVIRT_OPS entry
  x86/fpu: Optimize out sigframe xfeatures when in init state
2021-11-14 09:29:03 -08:00
Linus Torvalds
4d6fe79fde New x86 features:
* Guest API and guest kernel support for SEV live migration
 
 * SEV and SEV-ES intra-host migration
 
 Bugfixes and cleanups for x86:
 
 * Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status
 
 * Fix selftests on APICv machines
 
 * Fix sparse warnings
 
 * Fix detection of KVM features in CPUID
 
 * Cleanups for bogus writes to MSR_KVM_PV_EOI_EN
 
 * Fixes and cleanups for MSR bitmap handling
 
 * Cleanups for INVPCID
 
 * Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
 
 Bugfixes for ARM:
 
 * Fix finalization of host stage2 mappings
 
 * Tighten the return value of kvm_vcpu_preferred_target()
 
 * Make sure the extraction of ESR_ELx.EC is limited to architected bits
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGO1usUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOsXgf/d2CgZmTxZii/pOt0JKGSDKwRdoti
 pfbmtXwTkagqg2tIJtEsxwvcc26Q2VLyg9mlEelX1I8KVlexSa8mDtbGWHLFilsc
 JIZY8lE96wtlr9AyuN06K44QingDMIbXzlxcwO+muS3zTlSPsNkPcaVDk5PL35sN
 Wy2GA6GCLWv6iTLCtlX5EzbcgkrR+Mypj4lGSdXGRqfBKVFOQjeVt+Q3YzOPuacS
 03NBlYOmRxTnAGXfeAVs0/zEa69u/dYjBmwDPe6ERma4u8thHv0U/fDAh5C/ES7q
 42Thes/JOYnEZiBQ1xZLsHqRII0achbJKZhmxqPwjRf/u6eXsfz0KY9FBg==
 =kN8U
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
 "New x86 features:

   - Guest API and guest kernel support for SEV live migration

   - SEV and SEV-ES intra-host migration

  Bugfixes and cleanups for x86:

   - Fix misuse of gfn-to-pfn cache when recording guest steal time /
     preempted status

   - Fix selftests on APICv machines

   - Fix sparse warnings

   - Fix detection of KVM features in CPUID

   - Cleanups for bogus writes to MSR_KVM_PV_EOI_EN

   - Fixes and cleanups for MSR bitmap handling

   - Cleanups for INVPCID

   - Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures

  Bugfixes for ARM:

   - Fix finalization of host stage2 mappings

   - Tighten the return value of kvm_vcpu_preferred_target()

   - Make sure the extraction of ESR_ELx.EC is limited to architected
     bits (see the sketch below)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
  KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from
  KVM: x86: move guest_pv_has out of user_access section
  KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS
  KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid()
  KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT
  KVM: nVMX: Clean up x2APIC MSR handling for L2
  KVM: VMX: Macrofy the MSR bitmap getters and setters
  KVM: nVMX: Handle dynamic MSR intercept toggling
  KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
  KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN
  KVM: x86: Rename kvm_lapic_enable_pv_eoi()
  KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES
  KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows
  kvm: mmu: Use fast PF path for access tracking of huge pages when possible
  KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator
  KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active
  kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool
  KVM: x86: Fix recording of guest steal time / preempted status
  selftest: KVM: Add intra host migration tests
  selftest: KVM: Add open sev dev helper
  ...
2021-11-13 10:01:10 -08:00
Linus Torvalds
d4fa09e514 Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull vm86 fix from Eric Biederman:
 "Just the removal of an unnecessary (and incorrect) test from a BUG_ON"

* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal/vm86_32: Remove pointless test in BUG_ON
2021-11-13 09:54:24 -08:00
Eric W. Biederman
c7a9b6471c signal/vm86_32: Remove pointless test in BUG_ON
kernel test robot <oliver.sang@intel.com> writes[1]:
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 1a4d21a23c ("signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: trinity
> version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
> with following parameters:
>
>
> [ 70.645554][ T3747] kernel BUG at arch/x86/kernel/vm86_32.c:109!
> [ 70.646185][ T3747] invalid opcode: 0000 [#1] SMP
> [ 70.646682][ T3747] CPU: 0 PID: 3747 Comm: trinity-c6 Not tainted 5.15.0-rc1-00009-g1a4d21a23c4c #1
> [ 70.647598][ T3747] EIP: save_v86_state (arch/x86/kernel/vm86_32.c:109 (discriminator 3))
> [ 70.648113][ T3747] Code: 89 c3 64 8b 35 60 b8 25 c2 83 ec 08 89 55 f0 8b 96 10 19 00 00 89 55 ec e8 c6 2d 0c 00 fb 8b 55 ec 85 d2 74 05 83 3a 00 75 02 <0f> 0b 8b 86 10 19 00 00 8b 4b 38 8b 78 48 31 cf 89 f8 8b 7a 4c 81
> [ 70.650136][ T3747] EAX: 00000001 EBX: f5f49fac ECX: 0000000b EDX: f610b600
> [ 70.650852][ T3747] ESI: f5f79cc0 EDI: f5f79cc0 EBP: f5f49f04 ESP: f5f49ef0
> [ 70.651593][ T3747] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
> [ 70.652413][ T3747] CR0: 80050033 CR2: 00004000 CR3: 35fc7000 CR4: 000406d0
> [ 70.653169][ T3747] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 70.653897][ T3747] DR6: fffe0ff0 DR7: 00000400
> [ 70.654382][ T3747] Call Trace:
> [ 70.654719][ T3747] arch_do_signal_or_restart (arch/x86/kernel/signal.c:792 arch/x86/kernel/signal.c:867)
> [ 70.655288][ T3747] exit_to_user_mode_prepare (kernel/entry/common.c:174 kernel/entry/common.c:209)
> [ 70.655854][ T3747] irqentry_exit_to_user_mode (kernel/entry/common.c:126 kernel/entry/common.c:317)
> [ 70.656450][ T3747] irqentry_exit (kernel/entry/common.c:406)
> [ 70.656897][ T3747] exc_page_fault (arch/x86/mm/fault.c:1535)
> [ 70.657369][ T3747] ? sysvec_kvm_asyncpf_interrupt (arch/x86/mm/fault.c:1488)
> [ 70.657989][ T3747] handle_exception (arch/x86/entry/entry_32.S:1085)

vm86_32.c:109 is: "BUG_ON(!vm86 || !vm86->user_vm86)"

When trying to understand the failure, Brian Gerst pointed out[2] that
the code does not need protection against vm86->user_vm86 being NULL:
the copy_from_user code already handles that case if the address is
going to fault.

Looking further, I realized that if we care about not allowing a
struct vm86plus_struct at address 0, it should be do_sys_vm86 (the
system call) that does the filtering, not save_v86_state, way down
deep after the emulation has completed.

So let's just remove the silly attempt to filter a userspace address
with a BUG_ON.  Existing userspace can't break, and it won't make the
kernel any more attackable, because the userspace access helpers
already handle the case where the pointer isn't a valid userspace
address.
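
A tiny userspace model of that argument (hypothetical helper, not the
kernel's code): when the copy helper itself rejects a bad destination
pointer, a separate BUG_ON() on the pointer value in the caller adds
nothing except a way to crash.

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>

  /* Stand-in for copy_to_user(): validate the pointer, fail with -EFAULT. */
  static int copy_to_user_model(void *to, const void *from, size_t n)
  {
          if (to == NULL)
                  return -EFAULT;
          memcpy(to, from, n);
          return 0;
  }

  int main(void)
  {
          int regs = 42;

          /* No BUG_ON needed: the NULL "user" pointer surfaces as an error. */
          if (copy_to_user_model(NULL, &regs, sizeof(regs)))
                  printf("bad user pointer handled as -EFAULT, no crash\n");
          return 0;
  }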

I ran the reproducer the fuzzer gave me before making this change and
the failure reproduced; after making the change I have not seen the
reported failure.  So it does look like this fixes the reported issue.

[1] https://lkml.kernel.org/r/20211112074030.GB19820@xsang-OptiPlex-9020
[2] https://lkml.kernel.org/r/CAMzpN2jkK5sAv-Kg_kVnCEyVySiqeTdUORcC=AdG1gV6r8nUew@mail.gmail.com
Suggested-by: Brian Gerst <brgerst@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-12 15:26:42 -06:00