linux/drivers
Tvrtko Ursulin 4aa923a6e6 drm/amd/pm: Vangogh: Fix kernel memory out of bounds write
KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets a 168-byte block.

The root cause appears to be that when GPU metrics tables for v2_4 parts
were added, the table allocation was not enlarged to fit them.

The fix in this patch is rather "brute force": it allocates the largest
possible table. Perhaps later this should be done in a smarter way, by
extracting and consolidating the part-version-to-size logic into a common
helper. Nevertheless, for now this works and fixes the out of bounds
write.
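
As a rough illustration of the "largest possible allocation" approach
described above, here is a minimal userspace sketch. It is not the
driver code: the struct names, the helper names and the 0xff fill are
hypothetical, and only the two sizes (152 and 168 bytes) are taken from
the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for two metrics layouts; only the sizes
 * (152 and 168 bytes) come from the KASAN report above. */
struct metrics_small { unsigned char raw[152]; };
struct metrics_large { unsigned char raw[168]; };

static size_t max_size(size_t a, size_t b)
{
	return a > b ? a : b;
}

/* Size of the biggest layout that any later code path may memset. */
static size_t metrics_table_size(void)
{
	return max_size(sizeof(struct metrics_small),
			sizeof(struct metrics_large));
}

/* Allocate once, sized for the largest layout ("brute force"). */
static void *metrics_table_alloc(void)
{
	return calloc(1, metrics_table_size());
}

/* Fill the table for a given layout, refusing to write past the
 * allocation instead of overflowing it. */
static int metrics_table_init(void *table, size_t table_size,
			      size_t layout_size)
{
	assert(table != NULL);
	if (layout_size > table_size)
		return -1;
	memset(table, 0xff, layout_size);
	return 0;
}
```

With a table sized by metrics_table_size(), initialising the 168-byte
layout succeeds, whereas a table sized for the 152-byte layout alone is
rejected rather than overrun.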

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40bc9 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0880f58f96)
Cc: stable@vger.kernel.org # v6.6+
2024-10-28 17:14:08 -04:00
accel accel/qaic: Fix the for loop used to walk SG table 2024-10-12 14:55:55 -06:00
accessibility
acpi ACPI fixes for 6.12-rc3 2024-10-11 11:32:10 -07:00
amba
android binder: modify the comment for binder_proc_unlock 2024-09-11 16:02:45 +02:00
ata ata: libata: avoid superfluous disk spin down + spin up during hibernation 2024-10-09 16:21:19 +02:00
atm
auxdisplay move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
base pmdomain core: 2024-10-11 11:26:15 -07:00
bcma PCI: Rename CRS Completion Status to RRS 2024-09-10 19:52:30 -05:00
block block-6.12-20241018 2024-10-18 15:53:00 -07:00
bluetooth Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001 2024-10-16 16:10:25 -04:00
bus Driver core update for 6.12-rc1 2024-09-27 08:48:37 -07:00
cache
cdrom cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed() 2024-10-17 19:47:15 -06:00
cdx
char virtio: bugfixes 2024-10-07 11:33:26 -07:00
clk Two clk driver fixes and a unit test fix: 2024-10-17 16:24:42 -07:00
clocksource Updates for x86 timers: 2024-09-17 15:27:01 +02:00
comedi move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
connector
counter move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
cpufreq cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled 2024-10-15 23:54:15 -05:00
cpuidle pmdomain core: 2024-09-18 10:49:45 +02:00
crypto This push fixes the following issues: 2024-10-16 08:42:54 -07:00
cxl move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
dax device-dax: correct pgoff align in dax_set_mapping() 2024-10-09 12:47:19 -07:00
dca
devfreq
dio
dma dmaengine: cirrus: check that output may be truncated 2024-10-11 09:55:47 +00:00
dma-buf drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
dpll
edac - Drop a now obsolete ppc4xx_edac driver 2024-09-16 06:36:37 +02:00
eisa
extcon Char/Misc and other driver changes for 6.12-rc1 2024-09-26 10:13:08 -07:00
firewire move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
firmware Arm SCMI fixes for v6.12 2024-10-15 20:39:43 +00:00
fpga move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
fsi move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
gnss [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
gpio gpio: aspeed: Use devm_clk api to manage clock source 2024-10-08 16:01:58 +02:00
gpu drm/amd/pm: Vangogh: Fix kernel memory out of bounds write 2024-10-28 17:14:08 -04:00
greybus move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
hid hid-for-linus-2024101301 2024-10-13 16:35:20 -07:00
hsi
hte
hv drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
hwmon [PATCH} hwmon: (jc42) Properly detect TSE2004-compliant devices again 2024-10-14 19:14:08 -07:00
hwspinlock
hwtracing [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
i2c i2c-for-6.12-rc2 2024-10-05 10:31:04 -07:00
i3c i3c: master: svc: Fix use after free vulnerability in svc_i3c_master Driver Due to Race Condition 2024-09-17 16:51:45 +02:00
idle intel_idle: fix ACPI _CST matching for newer Xeon platforms 2024-09-25 22:30:33 +02:00
iio IIO: 1st set of fixes for the 6.12 cycle. 2024-10-13 17:23:47 +02:00
infiniband RDMA/bnxt_re: Fix the GID table length 2024-10-11 20:49:02 -03:00
input Input updates for v6.12-rc3 2024-10-19 10:18:03 -07:00
interconnect
iommu iommu/vt-d: Fix incorrect pci_for_each_dma_alias() for non-PCI devices 2024-10-15 10:17:54 +02:00
ipack
irqchip irqchip/renesas-rzg2l: Fix missing put_device 2024-10-15 23:54:35 +02:00
isdn move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
leds move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
macintosh move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
mailbox mailbox, remoteproc: omap2+: fix compile testing 2024-09-27 09:11:05 -05:00
mcb
md Getting rid of asm/unaligned.h includes 2024-10-02 16:42:28 -07:00
media move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
memory
memstick move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
message SCSI misc on 20240928 2024-09-29 09:22:34 -07:00
mfd move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
misc Char/Misc/IIO fixes for 6.12-rc4 2024-10-20 13:10:44 -07:00
mmc MMC core: 2024-10-11 11:23:21 -07:00
most
mtd move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
mux
net net/mlx5e: Don't call cleanup on profile rollback failure 2024-10-17 12:14:07 +02:00
nfc move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
ntb ntb: Force physically contiguous allocation of rx ring buffers 2024-09-20 10:51:25 -04:00
nubus
nvdimm virtio: features, fixes, cleanups 2024-09-26 08:43:17 -07:00
nvme block-6.12-20241018 2024-10-18 15:53:00 -07:00
nvmem Char/Misc and other driver changes for 6.12-rc1 2024-09-26 10:13:08 -07:00
of of: Skip kunit tests when arm64+ACPI doesn't populate root node 2024-10-10 12:43:01 -05:00
opp OPP: fix error code in dev_pm_opp_set_config() 2024-10-02 01:27:50 +02:00
parisc parisc: pdc_stable: Constify struct kobj_type 2024-09-09 08:53:17 +02:00
parport parport: Proper fix for array out-of-bounds access 2024-10-13 18:17:35 +02:00
pci move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
pcmcia move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
peci move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
perf drivers/perf: riscv: Align errno for unsupported perf event 2024-10-01 02:47:39 -07:00
phy phy-for-6.12 2024-09-23 14:05:10 -07:00
pinctrl pinctrl: ocelot: fix system hang on level based interrupts 2024-10-12 22:04:38 +02:00
platform platform-drivers-x86 for v6.12-2 2024-10-06 11:11:01 -07:00
pmdomain pmdomain: qcom-cpr: Fix the return of uninitialized variable 2024-10-02 12:38:53 +02:00
pnp
power move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
powercap powercap: intel_rapl_msr: Add PL4 support for ArrowLake-H 2024-10-16 22:34:03 +02:00
pps [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
ps3
ptp move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
pwm soc: convert ep93xx to devicetree 2024-09-26 12:00:25 -07:00
rapidio
ras
regulator regulator: sm5703: Remove because it is unused and fails to build 2024-09-13 19:08:14 +01:00
remoteproc mhu-v3, omap2+ : fix kconfig dependencies 2024-09-29 09:53:04 -07:00
reset reset: starfive: jh71x0: Fix accessing the empty member on JH7110 SoC 2024-09-30 14:24:37 +02:00
rpmsg rpmsg: glink: Avoid -Wflex-array-member-not-at-end warnings 2024-09-13 14:09:47 -07:00
rtc move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
s390 s390/sclp_vt220: Convert newlines to CRLF instead of LFCR 2024-10-16 11:32:32 +02:00
sbus [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
scsi SCSI fixes on 20241019 2024-10-19 12:52:19 -07:00
sh sh: intc: Replace simple_strtoul() with kstrtoul() 2024-09-26 17:25:29 +02:00
siox
slimbus
soc FSL SOC fixes for v6.12: 2024-10-11 10:03:13 +00:00
soundwire soundwire updates for 6.12 2024-09-23 14:00:46 -07:00
spi spi: Fixes for v6.12 2024-10-05 10:25:04 -07:00
spmi
ssb
staging move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
target SCSI fixes on 20241019 2024-10-19 12:52:19 -07:00
tc
tee optee: Fix a NULL vs IS_ERR() check 2024-09-09 12:22:06 +02:00
thermal Power management updates for 6.12-rc3 2024-10-11 11:41:20 -07:00
thunderbolt thunderbolt: Changes for v6.12 merge window 2024-09-11 15:17:43 +02:00
tty serial: qcom-geni: rename suspend functions 2024-10-11 08:39:24 +02:00
ufs SCSI fixes on 20241019 2024-10-19 12:52:19 -07:00
uio uio: Constify struct kobj_type 2024-09-11 16:02:54 +02:00
usb USB-serial device ids for 6.12-rc4 2024-10-18 12:11:28 +02:00
vdpa virtio: bugfixes 2024-10-07 11:33:26 -07:00
vfio [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
vhost virtio: bugfixes 2024-10-07 11:33:26 -07:00
video fbdev: Switch back to struct platform_driver::remove() 2024-10-08 21:47:18 +02:00
virt [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
virtio virtio: bugfixes 2024-10-07 11:33:26 -07:00
w1 w1: ds2482: Drop explicit initialization of struct i2c_device_id::driver_data to 0 2024-09-06 19:18:32 +02:00
watchdog move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
xen xen: Remove dependency between pciback and privcmd 2024-10-18 11:59:04 +02:00
zorro
Kconfig
Makefile