- 22 Jun, 2021 4 commits
-
-
Tomas Winkler authored
The mei extension header was build as array of flexible structures which will not work if actually more headers are added. (Currently only vtag header was used). Sparse reports: drivers/misc/mei/hw.h:253:32: warning: array of flexible structures Use basic type u8 for the variable sized extension. Define explicitly mei_ext_hdr_vtag structure. And also fix mei_ext_next() function to point correctly to the end of the header. Note: the headers are part of firmware interface and need to be __packed. Signed-off-by:
Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20210621193756.134027-2-tomas.winkler@intel.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tamar Mashiah authored
Over time the functions were renamed, but this was not always reflected in kdoc, fix that. Signed-off-by:
Tamar Mashiah <tamar.mashiah@intel.com> Signed-off-by:
Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20210621193756.134027-1-tomas.winkler@intel.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
Merge tag 'misc-habanalabs-next-2021-06-22' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next Oded writes: This tag contains habanalabs driver changes for v5.14: - Change communication protocol with f/w. The new protocl allows better backward compatibility between different f/w versions and is more stable. - Send hard-reset cause to f/w after a hard-reset has happened. - Move to indirection when generating interrupts to f/w. - Better progress and error messages during the f/w load stage. - Recognize that f/w is with enabled security according to device ID. - Add validity check to event queue mechanism. - Add new event from f/w that will indicate a daemon has been terminated inside the f/w. - Move to TLB cache range invalidation in the device's MMU. - Disable memory scrubbing by default for performance. - Many fixes for sparse/smatch reported errors. - Enable by default stop-on-err in the ASIC. - Move to ASYNC device probing to speedup loading of driver in server with multiple devices. - Fix to stop using disabled NIC ports when doing collective operation. - Use standard error codes instead of positive values. - Add support for resetting device after user has finished using it. - Add debugfs option to avoid reset when a CS has got stuck. - Add print of the last 8 CS pointers in case of error in QMANs. - Add statistics on opening of the FD of a device. * tag 'misc-habanalabs-next-2021-06-22' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (72 commits) habanalabs/gaudi: refactor hard-reset related code habanalabs/gaudi: add support for NIC DERR habanalabs: add validity check for signal cs habanalabs: get lower/upper 32 bits via masking habanalabs: allow reset upon device release debugfs: add skip_reset_on_timeout option habanalabs: fix typo habanalabs/gaudi: correct driver events numbering habanalabs: remove a rogue #ifdef habanalabs/gaudi: print last QM PQEs on error habanalabs/goya: add '__force' attribute to suppress false alarm habanalabs: added open_stats info ioctl habanalabs/gaudi: set the correct rc in case of err habanalabs/gaudi: update coresight configuration habanalabs: remove node from list before freeing the node habanalabs: set rc as 'valid' in case of intentional func exit habanalabs: zero complex structures using memset habanalabs: print more info when failing to pin user memory habanalabs: Fix an error handling path in 'hl_pci_probe()' habanalabs: print firmware versions ...
-
Greg Kroah-Hartman authored
Merge tag 'soundwire-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire into char-misc-next Vinod writes: soundwire updates for 5.14-rc1 Updates for v5.14-rc1 are: - Core has odd updates including improving clock stop codes, write api, handling ENODATA etc - Drivers has Big move of Intel driver to be aux dev and minor updates to Intel/cadence driver * tag 'soundwire-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: soundwire: stream: Fix test for DP prepare complete soundwire: bus: Make sdw_nwrite() data pointer argument const soundwire: intel: move to auxiliary bus soundwire: cadence: remove the repeated declaration soundwire: dmi-quirks: remove duplicate initialization soundwire: cadence_master: always set CMD_ACCEPT soundwire: bus: add missing \n in dynamic debug soundwire: bus: handle -ENODATA errors in clock stop/start sequences soundwire: add missing kernel-doc description soundwire: bus: only use CLOCK_STOP_MODE0 and fix confusions soundwire: bandwidth allocation: improve error messages soundwire/ASoC: add leading zeroes in peripheral device name
-
- 21 Jun, 2021 5 commits
-
-
Koby Elbaz authored
There is code related to hard-reset, which is done in gaudi specific code. However, this code can be used by future ASICs and therefore it is better to move it to the common code section. Signed-off-by:
Koby Elbaz <kelbaz@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ofir Bitton authored
We add support for NIC DERR ECC error events, in case this error is received a device reset will be performed. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
farah kassabri authored
In preparation for a new feature that allows the user to reserve signals ahead of submissions, we need to change a current assumption in the code. Currently, the driver uses 2 SOBs to support signal CS. When the first SOB reaches max value, the driver switches to the other one and assumes that when it will need to switch back to the first one, all of the signals have already been handled. This assumption won't hold when the new feature will be added, because using signal reservation, the driver can reach the max SOB value very fast. The change is to add a validity check when submitting a signal CS, to make sure the previous SOB is available (all the signals attached to it indeed finished). Signed-off-by:
farah kassabri <fkassabri@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
fix multiple similar occurrences of the following sparse warning: 'warning: cast truncates bits from constant value (7ffc113000 becomes fc113000)' Signed-off-by:
Koby Elbaz <kelbaz@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ofir Bitton authored
We introduce a new type of reset which is reset upon device release. This reset is very similar to soft reset except the fact it is performed only upon device release and not upon user sysfs request nor TDR. The purpose of this reset is to make sure the device is returned to IDLE state after the current user has finished working with the device. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
- 20 Jun, 2021 2 commits
-
-
Richard Fitzgerald authored
In sdw_prep_deprep_slave_ports(), after the wait_for_completion() the DP prepare status register is read. If this indicates that the port is now prepared, the code should continue with the port setup. It is irrelevant whether the wait_for_completion() timed out if the port is now ready. The previous implementation would always fail if the wait_for_completion() timed out, even if the port was reporting successful prepare. This patch also fixes a minor bug where the return from sdw_read() was not checked for error - any error code with LSBits clear could be misinterpreted as a successful port prepare. Fixes: 79df15b7 ("soundwire: Add helpers for ports operations") Signed-off-by:
Richard Fitzgerald <rf@opensource.cirrus.com> Reviewed-by:
Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20210618144745.30629-1-rf@opensource.cirrus.comSigned-off-by:
Vinod Koul <vkoul@kernel.org>
-
Richard Fitzgerald authored
Idiomatically, write functions should take const pointers to the data buffer, as they don't change the data. They are also likely to be called from functions that receive a const data pointer. Internally the pointer is passed to function/structs shared with the read functions, requiring a cast, but this is an implementation detail that should be hidden by the public API. Signed-off-by:
Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://lore.kernel.org/r/20210616145901.29402-1-rf@opensource.cirrus.comSigned-off-by:
Vinod Koul <vkoul@kernel.org>
-
- 18 Jun, 2021 29 commits
-
-
Yuri Nudelman authored
To be able to debug long-running CS better, without changing the userspace code, we are adding a new option through debugfs interface to skip the reset of the device in case of CS timeout. Signed-off-by:
Yuri Nudelman <ynudelman@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Zvika Yehudai authored
fix a spelling error in comment Signed-off-by:
Zvika Yehudai <zyehudai@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ofir Bitton authored
Currently driver sends fc interrupt id to FW instead of using cpu interrupt id. We intend to fix that and keep backward compatibility by using the same interrupt values. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Oded Gabbay authored
There was a rogue #ifdef that crept into the upstream code for backwards compatibility which isn't needed of course. Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ohad Sharabi authored
In case QMAN has an error and stop_on_err is true, print specific information of the "offending" command buffer batch. If the error occurred on one of the higher CPs, the CQ pointer and size will be printed along with (up to) last 8 PQEs of the stream. If the error occurred in the lower CP, the CQ pointer and size will be printed along with (up to) last 8 PQEs of ALL upper CPs as we have no way to know which upper CP sent the job there. This is done so higher SW levels will be able to debug their CS by extracting the raw data of the offending command buffer batch and examine those offline to detect the issue. Signed-off-by:
Ohad Sharabi <osharabi@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
fix (suppress) the following sparse warnings: 'warning: cast removes address space of expression' Signed-off-by:
Koby Elbaz <kelbaz@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Yuri Nudelman authored
In a system with multiple ASICs, there is a need to provide monitoring tools with information on how long a device was opened and how many times a device was opened. Therefore, we add a new opcode to the INFO ioctl to provide that information. Signed-off-by:
Yuri Nudelman <ynudelman@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
fix the following smatch warnings: gaudi_internal_cb_pool_init() warn: missing error code 'rc' Signed-off-by:
Koby Elbaz <kelbaz@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Tal Albo authored
Update STMTCSR and STMSYNCR values in order to reduce amount of sync packets Signed-off-by:
Tal Albo <talbo@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
fix the following smatch warnings: goya_pin_memory_before_cs() warn: '&userptr->job_node' not removed from list gaudi_pin_memory_before_cs() warn: '&userptr->job_node' not removed from list Signed-off-by:
Koby Elbaz <kelbaz@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
fix the following smatch warnings: hl_fw_static_init_cpu() warn: missing error code 'rc' Signed-off-by:
Koby Elbaz <kelbaz@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
fix the following sparse warnings: 'warning: Using plain integer as NULL pointer' 'warning: missing braces around initializer' Signed-off-by:
Koby Elbaz <kelbaz@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Tomer Tayar authored
pin_user_pages_fast() might fail and return a negative number, or pin less pages than requested and return the number of the pages that were pinned. For the latter, it is informative to print also the memory size and the number of requested pages. Signed-off-by:
Tomer Tayar <ttayar@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Christophe JAILLET authored
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function. Fixes: 2e5eda46 ("habanalabs: PCIe Advanced Error Reporting support") Signed-off-by:
Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Oded Gabbay authored
Firmware in habanalabs devices is composed of several components. During device initialization, we read these versions from the device. Print them during device initialization to allow better visibility in automated systems. Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Omer Shpigelman authored
Hard reset flow on PLDM might take more than 2 minutes. Hence add a dedicated hard reset timeout of 6 minutes for PLDM. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Bharat Jauhari authored
In current code, for dynamic f/w loading flow, DRAM scrambling is enabled post Linux fit image is loaded to the card. This can cause the device CPU to go into reset state. The correct sequence should be: 1. Load boot fit image 2. Enable scrambling 3. Load Linux fit image This commit aligns the DRAM scrambling enabling with the static f/w load flow. Signed-off-by:
Bharat Jauhari <bjauhari@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ofir Bitton authored
If there is an error in the QMAN/engine, there is no point of trying to continue running the workload. It is better to stop to allow the user to debug the program. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ohad Sharabi authored
In case we have EQ fault we would like to know about it. For this, a status bitmask was added in which EQ_FAULT bit is set by FW in case of EQ fault. Signed-off-by:
Ohad Sharabi <osharabi@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
Use datatype defines instead of hard coded values, and rename set_fixed_properties function. Signed-off-by:
Koby Elbaz <kelbaz@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Oded Gabbay authored
When there is an ECC error in the HBM, return a standard error code, -EIO in this case, and not a positive value. Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ohad Sharabi authored
When converting virtual address to physical we need to add correct offset to the physical page. For this we need to use mask that include ALL bits of page offset. Signed-off-by:
Ohad Sharabi <osharabi@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ohad Sharabi authored
Get rid of the need to check if boot_dev_sts is valid on every access to value read from these registers. This is done by storing the register value in hdev props ONLY if register is enabled. This way if register is NOT enabled all capability bits will not be set. Signed-off-by:
Ohad Sharabi <osharabi@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ofir Bitton authored
If device is not idle after user closes the FD we must reset device as next user that will try to open FD will encounter a non-functional device. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Yuri Nudelman authored
Sometimes it is useful to allow the command to continue running despite the timeout occurred, to differentiate between really stuck or just very time consuming commands. This can be achieved by passing a new debug flag alongside the cs, HL_CS_FLAGS_SKIP_RESET_ON_TIMEOUT. Anyway, if the timeout occurred, a warning print shall be issued, however this shall not fail the submission. Signed-off-by:
Yuri Nudelman <ynudelman@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ofir Bitton authored
In order for driver to be aware of process or thread crashes inside GAUDI's CPU, we introduce a new event which contains all relevant information. Upon event reception, driver will dump information and will reset the device. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ofir Bitton authored
In the collective wait, we put jobs on the QMANs of all the NICs. The code takes into account if a port is disabled only in case of PCI card. When this info arrives from the f/w, the code doesn't take it into account, and it tries to schedule jobs on NICs that aren't enabled and thats a bug. To fix this, after the f/w sends us the list of disabled ports, we update the state of the QMANs according to that list. In addition, we need to update the HW_CAP bits so the collective wait operation will not try to use those QMANs. We also need to update the collective master monitor mask. Moreover, we need to add a protection for such future cases and in case the user will try to submit work to those QMANs. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Oded Gabbay authored
Update the firmware interface files to their latest version. Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-
Ofir Bitton authored
Current implementation uses a single interrupt interface towards FW, this interface is causing races between interrupt types. We split this interface to interface per interrupt type. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <ogabbay@kernel.org> Signed-off-by:
Oded Gabbay <ogabbay@kernel.org>
-