accel/habanalabs: fix bug in timestamp interrupt handling

There is a potential race between user thread seeking to re-use
a timestamp record with new interrupt id, while this record is still
in the middle of interrupt handling and it is about to be freed.
Imagine the driver set the record in_use to 0 and only then fill the
free_node information. This might lead to unpleasant scenario where
the new registration thread detects the record as free to use, and
change the cq buff address. That will cause the free_node to get
the wrong buffer address to put refcount to.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
This commit is contained in:
farah kassabri 2023-08-24 15:45:21 +03:00 committed by Oded Gabbay
parent d89d329a2b
commit 0165994c21

View File

@ -259,8 +259,6 @@ static int handle_registration_node(struct hl_device *hdev, struct hl_user_pendi
dev_dbg(hdev->dev, "Irq handle: Timestamp record (%p) ts cb address (%p), interrupt_id: %u\n",
pend, pend->ts_reg_info.timestamp_kernel_addr, interrupt_id);
/* Mark kernel CB node as free */
pend->ts_reg_info.in_use = false;
list_del(&pend->wait_list_node);
/* Putting the refcount for ts_buff and cq_cb objects will be handled
@ -270,6 +268,9 @@ static int handle_registration_node(struct hl_device *hdev, struct hl_user_pendi
free_node->cq_cb = pend->ts_reg_info.cq_cb;
list_add(&free_node->free_objects_node, *free_list);
/* Mark TS record as free */
pend->ts_reg_info.in_use = false;
return 0;
}