Commit Graph

1264658 Commits

Author SHA1 Message Date
Stefan Berger
c6ab5c915d crypto: ecc - Prevent ecc_digits_from_bytes from reading too many bytes
Prevent ecc_digits_from_bytes from reading too many bytes from the input
byte array in case an insufficient number of bytes is provided to fill the
output digit array of ndigits. Therefore, initialize the most significant
digits with 0 to avoid trying to read too many bytes later on. Convert the
function into a regular function since it is getting too big for an inline
function.

If too many bytes are provided on the input byte array the extra bytes
are ignored since the input variable 'ndigits' limits the number of digits
that will be filled.

Fixes: d67c96fb97 ("crypto: ecdsa - Convert byte arrays with key coordinates to digits")
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-17 18:55:07 +08:00
Herbert Xu
d3b17c6d9d crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak
Using completion_done to determine whether the caller has gone
away only works after a complete call.  Furthermore it's still
possible that the caller has not yet called wait_for_completion,
resulting in another potential UAF.

Fix this by making the caller use cancel_work_sync and then freeing
the memory safely.

Fixes: 7d42e09760 ("crypto: qat - resolve race condition during AER recovery")
Cc: <stable@vger.kernel.org> #6.8+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-17 18:55:07 +08:00
Lothar Rubusch
13909a0c88 crypto: atmel-sha204a - provide the otp content
Set up sysfs for the Atmel SHA204a. Provide the content of the otp zone as
an attribute field on the sysfs entry. Thereby make sure that if the chip
is locked, not connected or trouble with the i2c bus, the sysfs device is
not set up. This is mostly already handled in atmel-i2c.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:25 +08:00
Lothar Rubusch
e05ce444e9 crypto: atmel-sha204a - add reading from otp zone
Provide a read function reading the otp zone. The otp zone can be used for
storing serial numbers. The otp zone, as also data zone, are only
accessible if the chip was locked before. Locking the chip is a post
production customization and has to be done manually i.e. not by this
driver. Without this step the chip is pretty much not usable, where
putting or not putting data into the otp zone is optional.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:25 +08:00
Lothar Rubusch
3f5f746165 crypto: atmel-i2c - rename read function
Make the memory read function name more specific to the read memory zone.
The Atmel SHA204 chips provide config, otp and data zone. The implemented
read function in fact only reads some fields in zone config. The function
renaming allows for a uniform naming scheme when reading from other memory
zones.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:25 +08:00
Lothar Rubusch
e228b41abb crypto: atmel-i2c - add missing arg description
Add missing description for argument hwrng.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:25 +08:00
Thorsten Blum
bfbe27ba59 crypto: iaa - Use kmemdup() instead of kzalloc() and memcpy()
Fixes the following two Coccinelle/coccicheck warnings reported by
memdup.cocci:

	iaa_crypto_main.c:350:19-26: WARNING opportunity for kmemdup
	iaa_crypto_main.c:358:18-25: WARNING opportunity for kmemdup

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:25 +08:00
Wolfram Sang
e02ea6f9f2 crypto: sahara - use 'time_left' variable with wait_for_completion_timeout()
There is a confusing pattern in the kernel to use a variable named 'timeout' to
store the result of wait_for_completion_timeout() causing patterns like:

	timeout = wait_for_completion_timeout(...)
	if (!timeout) return -ETIMEDOUT;

with all kinds of permutations. Use 'time_left' as a variable to make the code
self explaining.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:25 +08:00
Wolfram Sang
98f9e44713 crypto: api - use 'time_left' variable with wait_for_completion_killable_timeout()
There is a confusing pattern in the kernel to use a variable named 'timeout' to
store the result of wait_for_completion_killable_timeout() causing patterns like:

	timeout = wait_for_completion_killable_timeout(...)
	if (!timeout) return -ETIMEDOUT;

with all kinds of permutations. Use 'time_left' as a variable to make the code
self explaining.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:24 +08:00
Pankaj Gupta
d2835701d9 crypto: caam - i.MX8ULP donot have CAAM page0 access
iMX8ULP have a secure-enclave hardware IP called EdgeLock Enclave(ELE),
that control access to caam controller's register page, i.e., page0.

At all, if the ELE release access to CAAM controller's register page,
it will release to secure-world only.

Clocks are turned on automatically for iMX8ULP. There exists the caam
clock gating bit, but it is not advised to gate the clock at linux, as
optee-os or any other entity might be using it.

Signed-off-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com>
Reviewed-by: Horia Geanta <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:24 +08:00
Pankaj Gupta
6144436803 crypto: caam - init-clk based on caam-page0-access
CAAM clock initializat is done based on the basis of soc specific
info stored in struct caam_imx_data:
- caam-page0-access flag
- num_clks

CAAM driver needs to be aware of access rights to CAAM control page
i.e., page0, to do things differently.

Signed-off-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:24 +08:00
Jia Jie Ho
f8c423bab9 crypto: starfive - Use fallback for unaligned dma access
Dma address mapping fails on unaligned scatterlist offset. Use sw
fallback for these cases.

Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:24 +08:00
Jia Jie Ho
d7f01649f4 crypto: starfive - Do not free stack buffer
RSA text data uses variable length buffer allocated in software stack.
Calling kfree on it causes undefined behaviour in subsequent operations.

Cc: <stable@vger.kernel.org> #6.7+
Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:24 +08:00
Jia Jie Ho
25ca4a85e9 crypto: starfive - Skip unneeded fallback allocation
Skip sw fallback allocation if RSA module failed to get device handle.

Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:24 +08:00
Jia Jie Ho
3d12d90efa crypto: starfive - Skip dma setup for zeroed message
Skip dma setup and mapping for AES driver if plaintext is empty.

Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10 17:15:24 +08:00
Wenkai Lin
6117af8636 crypto: hisilicon/sec2 - fix for register offset
The offset of SEC_CORE_ENABLE_BITMAP should be 0 instead of 32,
it cause a kasan shift-out-bounds warning, fix it.

Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-03 18:44:44 +08:00
Chenghai Huang
15f112f9ce crypto: hisilicon/debugfs - mask the unnecessary info from the dump
Some information showed by the dump function is invalid. Mask
the unnecessary information from the dump file.

Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-03 18:44:44 +08:00
Giovanni Cabiddu
a3dc1f2b6b crypto: qat - specify firmware files for 402xx
The 4xxx driver can probe 4xxx and 402xx devices. However, the driver
only specifies the firmware images required for 4xxx.
This might result in external tools missing these binaries, if required,
in the initramfs.

Specify the firmware image used by 402xx with the MODULE_FIRMWARE()
macros in the 4xxx driver.

Fixes: a3e8c919b9 ("crypto: qat - add support for 402xx devices")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-03 18:44:44 +08:00
Eric Biggers
ed265f7fd9 crypto: x86/aes-gcm - simplify GCM hash subkey derivation
Remove a redundant expansion of the AES key, and use rodata for zeroes.
Also rename rfc4106_set_hash_subkey() to aes_gcm_derive_hash_subkey()
because it's used for both versions of AES-GCM, not just RFC4106.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:10 +08:00
Eric Biggers
a0bbb1c187 crypto: x86/aes-gcm - delete unused GCM assembly code
Delete aesni_gcm_enc() and aesni_gcm_dec() because they are unused.
Only the incremental AES-GCM functions (aesni_gcm_init(),
aesni_gcm_enc_update(), aesni_gcm_finalize()) are actually used.

This saves 17 KB of object code.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:10 +08:00
Eric Biggers
6a80586474 crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath()
Since the total length processed by the loop in xts_crypt_slowpath() is
a multiple of AES_BLOCK_SIZE, just round the length down to
AES_BLOCK_SIZE even on the last step.  This doesn't change behavior, as
the last step will process a multiple of AES_BLOCK_SIZE regardless.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:10 +08:00
Marek Vasut
c819d7b836 hwrng: stm32 - repair clock handling
The clock management in this driver does not seem to be correct. The
struct hwrng .init callback enables the clock, but there is no matching
.cleanup callback to disable the clock. The clock get disabled as some
later point by runtime PM suspend callback.

Furthermore, both runtime PM and sleep suspend callbacks access registers
first and disable clock which are used for register access second. If the
IP is already in RPM suspend and the system enters sleep state, the sleep
callback will attempt to access registers while the register clock are
already disabled. This bug has been fixed once before already in commit
9bae54942b ("hwrng: stm32 - fix pm_suspend issue"), and regressed in
commit ff4e46104f ("hwrng: stm32 - rework power management sequences") .

Fix this slightly differently, disable register clock at the end of .init
callback, this way the IP is disabled after .init. On every access to the
IP, which really is only stm32_rng_read(), do pm_runtime_get_sync() which
is already done in stm32_rng_read() to bring the IP from RPM suspend, and
pm_runtime_mark_last_busy()/pm_runtime_put_sync_autosuspend() to put it
back into RPM suspend.

Change sleep suspend/resume callbacks to enable and disable register clock
around register access, as those cannot use the RPM suspend/resume callbacks
due to slightly different initialization in those sleep callbacks. This way,
the register access should always be performed with clock surely enabled.

Fixes: ff4e46104f ("hwrng: stm32 - rework power management sequences")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:10 +08:00
Marek Vasut
da62ed5c01 hwrng: stm32 - put IP into RPM suspend on failure
In case of an irrecoverable failure, put the IP into RPM suspend
to avoid RPM imbalance. I did not trigger this case, but it seems
it should be done based on reading the code.

Fixes: b17bc6eb7c ("hwrng: stm32 - rework error handling in stm32_rng_read()")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Marek Vasut
31b57788a5 hwrng: stm32 - use logical OR in conditional
The conditional is used to check whether err is non-zero OR whether
reg variable is non-zero after clearing bits from it. This should be
done using logical OR, not bitwise OR, fix it.

Fixes: 6b85a7e141 ("hwrng: stm32 - implement STM32MP13x support")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Stefan Berger
01474b70a7 crypto: ecdh - Initialize ctx->private_key in proper byte order
The private key in ctx->private_key is currently initialized in reverse
byte order in ecdh_set_secret and whenever the key is needed in proper
byte order the variable priv is introduced and the bytes from
ctx->private_key are copied into priv while being byte-swapped
(ecc_swap_digits). To get rid of the unnecessary byte swapping initialize
ctx->private_key in proper byte order and clean up all functions that were
previously using priv or were called with ctx->private_key:

- ecc_gen_privkey: Directly initialize the passed ctx->private_key with
  random bytes filling all the digits of the private key. Get rid of the
  priv variable. This function only has ecdh_set_secret as a caller to
  create NIST P192/256/384 private keys.

- crypto_ecdh_shared_secret: Called only from ecdh_compute_value with
  ctx->private_key. Get rid of the priv variable and work with the passed
  private_key directly.

- ecc_make_pub_key: Called only from ecdh_compute_value with
  ctx->private_key. Get rid of the priv variable and work with the passed
  private_key directly.

Cc: Salvatore Benedetto <salvatore.benedetto@intel.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Stefan Berger
bd955a4e92 crypto: ecdh - Pass private key in proper byte order to check valid key
ecc_is_key_valid expects a key with the most significant digit in the last
entry of the digit array. Currently ecdh_set_secret passes a reversed key
to ecc_is_key_valid that then passes the rather simple test checking
whether the private key is in range [2, n-3]. For all current ecdh-
supported curves (NIST P192/256/384) the 'n' parameter is a rather large
number, therefore easily passing this test.

Throughout the ecdh and ecc codebase the variable 'priv' is used for a
private_key holding the bytes in proper byte order. Therefore, introduce
priv in ecdh_set_secret and copy the bytes from ctx->private_key into
priv in proper byte order by using ecc_swap_digits. Pass priv to
ecc_is_valid_key.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Salvatore Benedetto <salvatore.benedetto@intel.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Dan Carpenter
5ae6d3f5c8 crypto: tegra - Fix some error codes
Return negative -ENOMEM, instead of positive ENOMEM.

Fixes: 0880bb3b00 ("crypto: tegra - Add Tegra Security Engine driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Akhil R <akhilrajeev@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Geert Uytterhoeven
ee2615fa4d dt-bindings: crypto: starfive: Restore sort order
Restore alphabetical sort order of the list of supported compatible
values.

Fixes: 2ccf7a5d9c ("dt-bindings: crypto: starfive: Add jh8100 support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Lucas Segarra Fernandez
483fd65ce2 crypto: qat - validate slices count returned by FW
The function adf_send_admin_tl_start() enables the telemetry (TL)
feature on a QAT device by sending the ICP_QAT_FW_TL_START message to
the firmware. This triggers the FW to start writing TL data to a DMA
buffer in memory and returns an array containing the number of
accelerators of each type (slices) supported by this HW.
The pointer to this array is stored in the adf_tl_hw_data data
structure called slice_cnt.

The array slice_cnt is then used in the function tl_print_dev_data()
to report in debugfs only statistics about the supported accelerators.
An incorrect value of the elements in slice_cnt might lead to an out
of bounds memory read.
At the moment, there isn't an implementation of FW that returns a wrong
value, but for robustness validate the slice count array returned by FW.

Fixes: 69e7649f7c ("crypto: qat - add support for device telemetry")
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Hailey Mothershead
23e4099bdc crypto: aead,cipher - zeroize key buffer after use
I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding
cryptographic information should be zeroized once they are no longer
needed. Accomplish this by using kfree_sensitive for buffers that
previously held the private key.

Signed-off-by: Hailey Mothershead <hailmo@amazon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Ard Biesheuvel
571e557cba crypto: arm64/aes-ce - Simplify round key load sequence
Tweak the round key logic so that they can be loaded using a single
branchless sequence using overlapping loads. This is shorter and
simpler, and puts the conditional branches based on the key size further
apart, which might benefit microarchitectures that cannot record taken
branches at every instruction. For these branches, use test-bit-branch
instructions that don't clobber the condition flags.

Note that none of this has any impact on performance, positive or
otherwise (and the branch prediction benefit would only benefit AES-192
which nobody uses). It does make for nicer code, though.

While at it, use \@ to generate the labels inside the macros, which is
more robust than using fixed numbers, which could clash inadvertently.
Also, bring aes-neon.S in line with these changes, including the switch
to test-and-branch instructions, to avoid surprises in the future when
we might start relying on the condition flags being preserved in the
chaining mode wrappers in aes-modes.S

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Uwe Kleine-König
3f4d1482da crypto: tegra - Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Fixes: 0880bb3b00 ("crypto: tegra - Add Tegra Security Engine driver")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Akhil R <akhilrajeev@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Eric Biggers
543ea178fb crypto: x86/aes-xts - optimize size of instructions operating on lengths
x86_64 has the "interesting" property that the instruction size is
generally a bit shorter for instructions that operate on the 32-bit (or
less) part of registers, or registers that are in the original set of 8.

This patch adjusts the AES-XTS code to take advantage of that property
by changing the LEN parameter from size_t to unsigned int (which is all
that's needed and is what the non-AVX implementation uses) and using the
%eax register for KEYLEN.

This decreases the size of aes-xts-avx-x86_64.o by 1.2%.

Note that changing the kmovq to kmovd was going to be needed anyway to
make the AVX10/256 code really work on CPUs that don't support 512-bit
vectors (since the AVX10 spec says that 64-bit opmask instructions will
only be supported on processors that support 512-bit vectors).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:19 +08:00
Eric Biggers
e619723a85 crypto: x86/aes-xts - eliminate a few more instructions
- For conditionally subtracting 16 from LEN when decrypting a message
  whose length isn't a multiple of 16, use the cmovnz instruction.

- Fold the addition of 4*VL to LEN into the sub of VL or 16 from LEN.

- Remove an unnecessary test instruction.

This results in slightly shorter code, both source and binary.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:19 +08:00
Eric Biggers
2717e01fc3 crypto: x86/aes-xts - handle AES-128 and AES-192 more efficiently
Decrease the amount of code specific to the different AES variants by
"right-aligning" the sequence of round keys, and for AES-128 and AES-192
just skipping irrelevant rounds at the beginning.

This shrinks the size of aes-xts-avx-x86_64.o by 13.3%, and it improves
the efficiency of AES-128 and AES-192.  The tradeoff is that for AES-256
some additional not-taken conditional jumps are now executed.  But these
are predicted well and are cheap on x86.

Note that the ARMv8 CE based AES-XTS implementation uses a similar
strategy to handle the different AES variants.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:19 +08:00
Eric Biggers
ea9459ef36 crypto: x86/aesni-xts - deduplicate aesni_xts_enc() and aesni_xts_dec()
Since aesni_xts_enc() and aesni_xts_dec() are very similar, generate
them from a macro that's passed an argument enc=1 or enc=0.  This
reduces the length of aesni-intel_asm.S by 112 lines while still
producing the exact same object file in both 32-bit and 64-bit mode.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:19 +08:00
Eric Biggers
1d27e1f5c8 crypto: x86/aes-xts - handle CTS encryption more efficiently
When encrypting a message whose length isn't a multiple of 16 bytes,
encrypt the last full block in the main loop.  This works because only
decryption uses the last two tweaks in reverse order, not encryption.

This improves the performance of decrypting messages whose length isn't
a multiple of the AES block length, shrinks the size of
aes-xts-avx-x86_64.o by 5.0%, and eliminates two instructions (a test
and a not-taken conditional jump) when encrypting a message whose length
*is* a multiple of the AES block length.

While it's not super useful to optimize for ciphertext stealing given
that it's rarely needed in practice, the other two benefits mentioned
above make this optimization worthwhile.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:19 +08:00
Maxime Méré
3525fe4752 crypto: stm32/hash - add full DMA support for stm32mpx
Due to a lack of alignment in the data sent by requests, the actual DMA
support of the STM32 hash driver is only working with digest calls.
This patch, based on the algorithm used in the driver omap-sham.c,
allows for the usage of DMA in any situation.

It has been functionally tested on STM32MP15, STM32MP13 and STM32MP25.

By checking the performance of this new driver with OpenSSL, the
following results were found:

Performance:

(datasize: 4096, number of hashes performed in 10s)

|type   |no DMA    |DMA support|software  |
|-------|----------|-----------|----------|
|md5    |13873.56k |10958.03k  |71163.08k |
|sha1   |13796.15k |10729.47k  |39670.58k |
|sha224 |13737.98k |10775.76k  |22094.64k |
|sha256 |13655.65k |10872.01k  |22075.39k |

CPU Usage:

(algorithm used: sha256, computation time: 20s, measurement taken at
~10s)

|datasize  |no DMA |DMA  | software |
|----------|-------|-----|----------|
|  2048    | 56%   | 49% | 50%      |
|  4096    | 54%   | 46% | 50%      |
|  8192    | 53%   | 40% | 50%      |
| 16384    | 53%   | 33% | 50%      |

Note: this update doesn't change the driver performance without DMA.

As shown, performance with DMA is slightly lower than without, but in
most cases, it will save CPU time.

Signed-off-by: Maxime Méré <maxime.mere@foss.st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:19 +08:00
Adam Guerin
d281a28bd2 crypto: qat - improve error logging to be consistent across features
Improve error logging in rate limiting feature. Staying consistent with
the error logging found in the telemetry feature.

Fixes: d9fb840837 ("crypto: qat - add rate limiting feature to qat_4xxx")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:19 +08:00
Adam Guerin
4a4fc6c0c7 crypto: qat - improve error message in adf_get_arbiter_mapping()
Improve error message to be more readable.

Fixes: 5da6a2d535 ("crypto: qat - generate dynamically arbiter mappings")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:18 +08:00
Eric Biggers
7daba20cc7 crypto: x86/sha256-ni - simplify do_4rounds
Instead of loading the message words into both MSG and \m0 and then
adding the round constants to MSG, load the message words into \m0 and
the round constants into MSG and then add \m0 to MSG.  This shortens the
source code slightly.  It changes the instructions slightly, but it
doesn't affect binary code size and doesn't seem to affect performance.

Suggested-by: Stefan Kanthak <stefan.kanthak@nexgo.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:18 +08:00
Eric Biggers
59e62b20ac crypto: x86/sha256-ni - optimize code size
- Load the SHA-256 round constants relative to a pointer that points
  into the middle of the constants rather than to the beginning.  Since
  x86 instructions use signed offsets, this decreases the instruction
  length required to access some of the later round constants.

- Use punpcklqdq or punpckhqdq instead of longer instructions such as
  pshufd, pblendw, and palignr.  This doesn't harm performance.

The end result is that sha256_ni_transform shrinks from 839 bytes to 791
bytes, with no loss in performance.

Suggested-by: Stefan Kanthak <stefan.kanthak@nexgo.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:18 +08:00
Eric Biggers
1b5ddb067d crypto: x86/sha256-ni - rename some register aliases
MSGTMP[0-3] are used to hold the message schedule and are not temporary
registers per se.  MSGTMP4 is used as a temporary register for several
different purposes and isn't really related to MSGTMP[0-3].  Rename them
to MSG[0-3] and TMP accordingly.

Also add a comment that clarifies what MSG is.

Suggested-by: Stefan Kanthak <stefan.kanthak@nexgo.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:18 +08:00
Eric Biggers
ffaec34b0f crypto: x86/sha256-ni - convert to use rounds macros
To avoid source code duplication, do the SHA-256 rounds using macros.
This reduces the length of sha256_ni_asm.S by 153 lines while still
producing the exact same object file.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:18 +08:00
Damian Muszynski
5d5bd24f41 crypto: qat - implement dh fallback for primes > 4K
The Intel QAT driver provides support for the Diffie-Hellman (DH)
algorithm, limited to prime numbers up to 4K. This driver is used
by default on platforms with integrated QAT hardware for all DH requests.
This has led to failures with algorithms requiring larger prime sizes,
such as ffdhe6144.

  alg: ffdhe6144(dh): test failed on vector 1, err=-22
  alg: self-tests for ffdhe6144(qat-dh) (ffdhe6144(dh)) failed (rc=-22)

Implement a fallback mechanism when an unsupported request is received.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:18 +08:00
Eric Biggers
b924ecd305 crypto: x86/aes-xts - access round keys using single-byte offsets
Access the AES round keys using offsets -7*16 through 7*16, instead of
0*16 through 14*16.  This allows VEX-encoded instructions to address all
round keys using 1-byte offsets, whereas before some needed 4-byte
offsets.  This decreases the code size of aes-xts-avx-x86_64.o by 4.2%.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:18 +08:00
Chen Ni
6a6d6a3a32 crypto: octeontx2 - add missing check for dma_map_single
Add check for dma_map_single() and return error if it fails in order
to avoid invalid dma address.

Fixes: e92971117c ("crypto: octeontx2 - add ctx_val workaround")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:18 +08:00
Eric Biggers
751fb2528c crypto: x86/aes-xts - make non-AVX implementation use new glue code
Make the non-AVX implementation of AES-XTS (xts-aes-aesni) use the new
glue code that was introduced for the AVX implementations of AES-XTS.
This reduces code size, and it improves the performance of xts-aes-aesni
due to the optimization for messages that don't span page boundaries.

This required moving the new glue functions higher up in the file and
allowing the IV encryption function to be specified by the caller.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12 15:07:53 +08:00
Lukas Wunner
5c6ca9d936 X.509: Introduce scope-based x509_certificate allocation
Add a DEFINE_FREE() clause for x509_certificate structs and use it in
x509_cert_parse() and x509_key_preparse().  These are the only functions
where scope-based x509_certificate allocation currently makes sense.
A third user will be introduced with the forthcoming SPDM library
(Security Protocol and Data Model) for PCI device authentication.

Unlike most other DEFINE_FREE() clauses, this one checks for IS_ERR()
instead of NULL before calling x509_free_certificate() at end of scope.
That's because the "constructor" of x509_certificate structs,
x509_cert_parse(), returns a valid pointer or an ERR_PTR(), but never
NULL.

Comparing the Assembler output before/after has shown they are identical,
save for the fact that gcc-12 always generates two return paths when
__cleanup() is used, one for the success case and one for the error case.

In x509_cert_parse(), add a hint for the compiler that kzalloc() never
returns an ERR_PTR().  Otherwise the compiler adds a gratuitous IS_ERR()
check on return.  Introduce an assume() macro for this which can be
re-used elsewhere in the kernel to provide hints for the compiler.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Link: https://lore.kernel.org/all/20231003153937.000034ca@Huawei.com/
Link: https://lwn.net/Articles/934679/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12 15:07:53 +08:00
Chenghai Huang
c9ccfd5e0f crypto: hisilicon/qm - Add the err memory release process to qm uninit
When the qm uninit command is executed, the err data needs to
be released to prevent memory leakage. The error information
release operation and uacce_remove are integrated in
qm_remove_uacce.

So add the qm_remove_uacce to qm uninit to avoid err memory
leakage.

Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12 15:07:53 +08:00