accel/habanalabs: set device status 'malfunction' while in rmmod
hl_device_status() returns the status of an acquired device. If a device is going down (following an rmmod cmd), it should be marked as an unusable/malfunctioning device, and hence should not be acquired. However, since this was not the case so far (i.e., a device going down would inaccurately return 'in reset' status allowing the user to acquire the device) it introduced a bug where as part of a reset flow, the driver could not kill processes that have not run yet, and since those processes aren't blocked from reacquiring a device, we get eventually a new flow of a driver attempting to kill all processes in a list that can't be ever really empty. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Showing
Please register or sign in to comment