Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (synced 2025-01-01 10:43:43 +00:00)
drm/doc: Document submission error signaling

Different approaches have been tried to signal resets and other errors in
vendor specific ways which not only resulted in a wide variety of
implementations but also repeating the same bugs and problems over different
drivers.

Document that drivers should use dma_fence based error signaling which is
vendor agnostic and allows userspace to query submission errors in generic
non-vendor specific code.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-3-christian.koenig@amd.com

Parent: a401bd1264
Commit: f07a0d1bf7

@@ -305,13 +305,26 @@ Kernel Mode Driver
 ------------------
 
 The KMD is responsible for checking if the device needs a reset, and to perform
-it as needed. Usually a hang is detected when a job gets stuck executing. KMD
-should keep track of resets, because userspace can query any time about the
-reset status for a specific context. This is needed to propagate to the rest of
-the stack that a reset has happened. Currently, this is implemented by each
-driver separately, with no common DRM interface. Ideally this should be properly
-integrated at DRM scheduler to provide a common ground for all drivers. After a
-reset, KMD should reject new command submissions for affected contexts.
+it as needed. Usually a hang is detected when a job gets stuck executing.
+
+Propagation of errors to userspace has proven to be tricky since it goes in
+the opposite direction of the usual flow of commands. Because of this vendor
+independent error handling was added to the &dma_fence object, this way drivers
+can add an error code to their fences before signaling them. See function
+dma_fence_set_error() on how to do this and for examples of error codes to use.
+
+The DRM scheduler also allows setting error codes on all pending fences when
+hardware submissions are restarted after an reset. Error codes are also
+forwarded from the hardware fence to the scheduler fence to bubble up errors
+to the higher levels of the stack and eventually userspace.
+
+Fence errors can be queried by userspace through the generic SYNC_IOC_FILE_INFO
+IOCTL as well as through driver specific interfaces.
+
+Additional to setting fence errors drivers should also keep track of resets per
+context, the DRM scheduler provides the drm_sched_entity_error() function as
+helper for this use case. After a reset, KMD should reject new command
+submissions for affected contexts.
 
 User Mode Driver
 ----------------
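
The error flow described in the added text (set an error on the fence before
signaling, forward it from the hardware fence to the scheduler fence, then
reject further submissions from the affected context) can be modeled in plain
userspace C. This is only a sketch of the ordering rules, not kernel code:
mock_fence, mock_entity and all mock_* functions are hypothetical stand-ins
for &dma_fence, the scheduler entity, dma_fence_set_error() and
drm_sched_entity_error(), and -ECANCELED is used purely as an illustrative
error value.

```c
/*
 * Userspace model only. Every mock_* name here is hypothetical and merely
 * mirrors the roles described in the documentation text above.
 */
#include <errno.h>
#include <stdbool.h>

struct mock_fence {
	bool signaled;
	int error;		/* 0, or a negative errno set before signaling */
};

struct mock_entity {
	int last_error;		/* error of the most recently completed job */
};

/* Stand-in for dma_fence_set_error(): only allowed before signaling. */
int mock_fence_set_error(struct mock_fence *f, int error)
{
	if (f->signaled || error >= 0)
		return -EINVAL;
	f->error = error;
	return 0;
}

void mock_fence_signal(struct mock_fence *f)
{
	f->signaled = true;
}

/*
 * The hardware fence has completed: forward its error to the scheduler
 * fence, signal that fence, and remember the result per context.
 */
void mock_job_done(struct mock_entity *e, const struct mock_fence *hw,
		   struct mock_fence *sched)
{
	if (hw->error)
		mock_fence_set_error(sched, hw->error);
	mock_fence_signal(sched);
	e->last_error = sched->error;
}

/* Stand-in for a submit path that consults drm_sched_entity_error(). */
int mock_submit(const struct mock_entity *e)
{
	return e->last_error;	/* non-zero means: reject the submission */
}
```

In this model a context whose last job finished with an error keeps failing
mock_submit() until it is recreated, which is one possible reading of "reject
new command submissions for affected contexts"; a real driver decides its own
recovery policy.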
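
On the userspace side, the generic query mentioned in the patch could look
roughly like the following. This is a minimal sketch assuming the caller
already holds a sync_file file descriptor (for example an out-fence returned
by a driver); the helper name sync_file_query_status() is made up for this
example. It relies on the uapi definitions in <linux/sync_file.h>, where the
status field is documented as 1 for signaled, 0 for active and a negative
value for an error.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Hypothetical helper: returns 0 and stores the fence status in *status,
 * or returns a negative errno if the ioctl itself failed.
 */
int sync_file_query_status(int fd, int *status)
{
	struct sync_file_info info = {0};

	/*
	 * num_fences is left at 0, so only the aggregate status is
	 * requested and no per-fence array has to be provided.
	 */
	if (ioctl(fd, SYNC_IOC_FILE_INFO, &info) < 0)
		return -errno;

	*status = info.status;	/* 1: signaled, 0: active, <0: error code */
	return 0;
}
```

Keeping the ioctl failure (return value) separate from the fence status (out
parameter) avoids confusing a bad file descriptor with a submission error
reported by the driver.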