Commit bb8e97e2 authored by Carl Vanderlip's avatar Carl Vanderlip Committed by Jeffrey Hugo

accel/qaic: Enable 1 MSI fallback mode

Several virtualization use-cases either don't support 32 MultiMSIs
(Xen/VMware) or have significant drawbacks to their use (KVM's vIOMMU,
which is required to support 32 MSI, needs to allocate an alternate
system memory space for each device using vIOMMU (e.g. 8GB VM mem and
2 cards => 8 + 2 * 8 = 24GB host memory required)). Support these
cases by enabling a 1 MSI fallback mode.

Whenever all 32 MSIs requested are not available, a second request for
a single MSI is made. Its success is the initiator of single MSI mode.
This mode causes all interrupts generated by the device to be directed
to the 0th MSI (firmware >=v1.10 will do this as a response to the PCIe
MSI capability configuration). Likewise, all interrupt handlers for the
device are registered to the 0th MSI.

Since the DBC interrupt handler checks if the DBC is in use or if
there is any pending changes, the 'spurious' interrupts are
disregarded. If there is work to be done, the standard threaded IRQ
handler is dispatched.

On every interrupt, the MHI handler wakes up its threaded interrupt
handler, and attempts to wake any waiters for MHI state events.

Performance is within +-0.6% for test cases that typify real world
use. Larger differences ([-4,+132]%, avg +47%) exist for very simple
tasks (e.g. addition) compiled for single NSPs. It is assumed that the
small work and many interrupts typically cause contention (e.g. 16 NSPs
vs 4 CPUs), as evidenced by the standard deviation between runs also
decreasing (r=-0.48 between delta(Performace_test) and
delta(StdDev_test/Avg_test))
Signed-off-by: default avatarCarl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: default avatarPranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: default avatarJeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: default avatarJeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: default avatarStanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231016170036.5409-1-quic_jhugo@quicinc.com
parent 88b02ebc
...@@ -36,8 +36,9 @@ AIC100 DID (0xa100). ...@@ -36,8 +36,9 @@ AIC100 DID (0xa100).
AIC100 does not implement FLR (function level reset). AIC100 does not implement FLR (function level reset).
AIC100 implements MSI but does not implement MSI-X. AIC100 requires 17 MSIs to AIC100 implements MSI but does not implement MSI-X. AIC100 prefers 17 MSIs to
operate (1 for MHI, 16 for the DMA Bridge). operate (1 for MHI, 16 for the DMA Bridge). Falling back to 1 MSI is possible in
scenarios where reserving 32 MSIs isn't feasible.
As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
hardware. AIC100 provides 3, 64-bit BARs. hardware. AIC100 provides 3, 64-bit BARs.
......
...@@ -10,6 +10,9 @@ accelerator products. ...@@ -10,6 +10,9 @@ accelerator products.
Interrupts Interrupts
========== ==========
IRQ Storm Mitigation
--------------------
While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
mechanism, it is still possible for an IRQ storm to occur. A storm can happen mechanism, it is still possible for an IRQ storm to occur. A storm can happen
if the workload is particularly quick, and the host is responsive. If the host if the workload is particularly quick, and the host is responsive. If the host
...@@ -35,6 +38,26 @@ generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64 ...@@ -35,6 +38,26 @@ generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
IRQs over 5 minutes while keeping the host system stable, and having the same IRQs over 5 minutes while keeping the host system stable, and having the same
workload throughput performance (within run to run noise variation). workload throughput performance (within run to run noise variation).
Single MSI Mode
---------------
MultiMSI is not well supported on all systems; virtualized ones even less so
(circa 2023). Between hypervisors masking the PCIe MSI capability structure to
large memory requirements for vIOMMUs (required for supporting MultiMSI), it is
useful to be able to fall back to a single MSI when needed.
To support this fallback, we allow the case where only one MSI is able to be
allocated, and share that one MSI between MHI and the DBCs. The device detects
when only one MSI has been configured and directs the interrupts for the DBCs
to the interrupt normally used for MHI. Unfortunately this means that the
interrupt handlers for every DBC and MHI wake up for every interrupt that
arrives; however, the DBC threaded irq handlers only are started when work to be
done is detected (MHI will always start its threaded handler).
If the DBC is configured to force MSI interrupts, this can circumvent the
software IRQ storm mitigation mentioned above. Since the MSI is shared it is
never disabled, allowing each new entry to the FIFO to trigger a new interrupt.
Neural Network Control (NNC) Protocol Neural Network Control (NNC) Protocol
===================================== =====================================
......
...@@ -468,7 +468,7 @@ static int mhi_reset_and_async_power_up(struct mhi_controller *mhi_cntrl) ...@@ -468,7 +468,7 @@ static int mhi_reset_and_async_power_up(struct mhi_controller *mhi_cntrl)
} }
struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, void __iomem *mhi_bar, struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, void __iomem *mhi_bar,
int mhi_irq) int mhi_irq, bool shared_msi)
{ {
struct mhi_controller *mhi_cntrl; struct mhi_controller *mhi_cntrl;
int ret; int ret;
...@@ -500,6 +500,10 @@ struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, voi ...@@ -500,6 +500,10 @@ struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, voi
return ERR_PTR(-ENOMEM); return ERR_PTR(-ENOMEM);
mhi_cntrl->irq[0] = mhi_irq; mhi_cntrl->irq[0] = mhi_irq;
if (shared_msi) /* MSI shared with data path, no IRQF_NO_SUSPEND */
mhi_cntrl->irq_flags = IRQF_SHARED;
mhi_cntrl->fw_image = "qcom/aic100/sbl.bin"; mhi_cntrl->fw_image = "qcom/aic100/sbl.bin";
/* use latest configured timeout */ /* use latest configured timeout */
......
...@@ -8,7 +8,7 @@ ...@@ -8,7 +8,7 @@
#define MHICONTROLLERQAIC_H_ #define MHICONTROLLERQAIC_H_
struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, void __iomem *mhi_bar, struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, void __iomem *mhi_bar,
int mhi_irq); int mhi_irq, bool shared_msi);
void qaic_mhi_free_controller(struct mhi_controller *mhi_cntrl, bool link_up); void qaic_mhi_free_controller(struct mhi_controller *mhi_cntrl, bool link_up);
void qaic_mhi_start_reset(struct mhi_controller *mhi_cntrl); void qaic_mhi_start_reset(struct mhi_controller *mhi_cntrl);
void qaic_mhi_reset_done(struct mhi_controller *mhi_cntrl); void qaic_mhi_reset_done(struct mhi_controller *mhi_cntrl);
......
...@@ -123,6 +123,8 @@ struct qaic_device { ...@@ -123,6 +123,8 @@ struct qaic_device {
struct srcu_struct dev_lock; struct srcu_struct dev_lock;
/* true: Device under reset; false: Device not under reset */ /* true: Device under reset; false: Device not under reset */
bool in_reset; bool in_reset;
/* true: single MSI is used to operate device */
bool single_msi;
/* /*
* true: A tx MHI transaction has failed and a rx buffer is still queued * true: A tx MHI transaction has failed and a rx buffer is still queued
* in control device. Such a buffer is considered lost rx buffer * in control device. Such a buffer is considered lost rx buffer
......
...@@ -1466,6 +1466,16 @@ irqreturn_t dbc_irq_handler(int irq, void *data) ...@@ -1466,6 +1466,16 @@ irqreturn_t dbc_irq_handler(int irq, void *data)
rcu_id = srcu_read_lock(&dbc->ch_lock); rcu_id = srcu_read_lock(&dbc->ch_lock);
if (datapath_polling) {
srcu_read_unlock(&dbc->ch_lock, rcu_id);
/*
* Normally datapath_polling will not have irqs enabled, but
* when running with only one MSI the interrupt is shared with
* MHI so it cannot be disabled. Return ASAP instead.
*/
return IRQ_HANDLED;
}
if (!dbc->usr) { if (!dbc->usr) {
srcu_read_unlock(&dbc->ch_lock, rcu_id); srcu_read_unlock(&dbc->ch_lock, rcu_id);
return IRQ_HANDLED; return IRQ_HANDLED;
...@@ -1488,7 +1498,8 @@ irqreturn_t dbc_irq_handler(int irq, void *data) ...@@ -1488,7 +1498,8 @@ irqreturn_t dbc_irq_handler(int irq, void *data)
return IRQ_NONE; return IRQ_NONE;
} }
disable_irq_nosync(irq); if (!dbc->qdev->single_msi)
disable_irq_nosync(irq);
srcu_read_unlock(&dbc->ch_lock, rcu_id); srcu_read_unlock(&dbc->ch_lock, rcu_id);
return IRQ_WAKE_THREAD; return IRQ_WAKE_THREAD;
} }
...@@ -1559,12 +1570,12 @@ irqreturn_t dbc_irq_threaded_fn(int irq, void *data) ...@@ -1559,12 +1570,12 @@ irqreturn_t dbc_irq_threaded_fn(int irq, void *data)
u32 tail; u32 tail;
rcu_id = srcu_read_lock(&dbc->ch_lock); rcu_id = srcu_read_lock(&dbc->ch_lock);
qdev = dbc->qdev;
head = readl(dbc->dbc_base + RSPHP_OFF); head = readl(dbc->dbc_base + RSPHP_OFF);
if (head == U32_MAX) /* PCI link error */ if (head == U32_MAX) /* PCI link error */
goto error_out; goto error_out;
qdev = dbc->qdev;
read_fifo: read_fifo:
if (!event_count) { if (!event_count) {
...@@ -1645,14 +1656,14 @@ irqreturn_t dbc_irq_threaded_fn(int irq, void *data) ...@@ -1645,14 +1656,14 @@ irqreturn_t dbc_irq_threaded_fn(int irq, void *data)
goto read_fifo; goto read_fifo;
normal_out: normal_out:
if (likely(!datapath_polling)) if (!qdev->single_msi && likely(!datapath_polling))
enable_irq(irq); enable_irq(irq);
else else if (unlikely(datapath_polling))
schedule_work(&dbc->poll_work); schedule_work(&dbc->poll_work);
/* checking the fifo and enabling irqs is a race, missed event check */ /* checking the fifo and enabling irqs is a race, missed event check */
tail = readl(dbc->dbc_base + RSPTP_OFF); tail = readl(dbc->dbc_base + RSPTP_OFF);
if (tail != U32_MAX && head != tail) { if (tail != U32_MAX && head != tail) {
if (likely(!datapath_polling)) if (!qdev->single_msi && likely(!datapath_polling))
disable_irq_nosync(irq); disable_irq_nosync(irq);
goto read_fifo; goto read_fifo;
} }
...@@ -1661,9 +1672,9 @@ irqreturn_t dbc_irq_threaded_fn(int irq, void *data) ...@@ -1661,9 +1672,9 @@ irqreturn_t dbc_irq_threaded_fn(int irq, void *data)
error_out: error_out:
srcu_read_unlock(&dbc->ch_lock, rcu_id); srcu_read_unlock(&dbc->ch_lock, rcu_id);
if (likely(!datapath_polling)) if (!qdev->single_msi && likely(!datapath_polling))
enable_irq(irq); enable_irq(irq);
else else if (unlikely(datapath_polling))
schedule_work(&dbc->poll_work); schedule_work(&dbc->poll_work);
return IRQ_HANDLED; return IRQ_HANDLED;
......
...@@ -424,14 +424,24 @@ static int init_msi(struct qaic_device *qdev, struct pci_dev *pdev) ...@@ -424,14 +424,24 @@ static int init_msi(struct qaic_device *qdev, struct pci_dev *pdev)
int i; int i;
/* Managed release since we use pcim_enable_device */ /* Managed release since we use pcim_enable_device */
ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI); ret = pci_alloc_irq_vectors(pdev, 32, 32, PCI_IRQ_MSI);
if (ret < 0) if (ret == -ENOSPC) {
return ret; ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
if (ret < 0)
return ret;
if (ret < 32) { /*
pci_err(pdev, "%s: Requested 32 MSIs. Obtained %d MSIs which is less than the 32 required.\n", * Operate in one MSI mode. All interrupts will be directed to
__func__, ret); * MSI0; every interrupt will wake up all the interrupt handlers
return -ENODEV; * (MHI and DBC[0-15]). Since the interrupt is now shared, it is
* not disabled during DBC threaded handler, but only one thread
* will be allowed to run per DBC, so while it can be
* interrupted, it shouldn't race with itself.
*/
qdev->single_msi = true;
pci_info(pdev, "Allocating 32 MSIs failed, operating in 1 MSI mode. Performance may be impacted.\n");
} else if (ret < 0) {
return ret;
} }
mhi_irq = pci_irq_vector(pdev, 0); mhi_irq = pci_irq_vector(pdev, 0);
...@@ -439,15 +449,17 @@ static int init_msi(struct qaic_device *qdev, struct pci_dev *pdev) ...@@ -439,15 +449,17 @@ static int init_msi(struct qaic_device *qdev, struct pci_dev *pdev)
return mhi_irq; return mhi_irq;
for (i = 0; i < qdev->num_dbc; ++i) { for (i = 0; i < qdev->num_dbc; ++i) {
ret = devm_request_threaded_irq(&pdev->dev, pci_irq_vector(pdev, i + 1), ret = devm_request_threaded_irq(&pdev->dev,
pci_irq_vector(pdev, qdev->single_msi ? 0 : i + 1),
dbc_irq_handler, dbc_irq_threaded_fn, IRQF_SHARED, dbc_irq_handler, dbc_irq_threaded_fn, IRQF_SHARED,
"qaic_dbc", &qdev->dbc[i]); "qaic_dbc", &qdev->dbc[i]);
if (ret) if (ret)
return ret; return ret;
if (datapath_polling) { if (datapath_polling) {
qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1); qdev->dbc[i].irq = pci_irq_vector(pdev, qdev->single_msi ? 0 : i + 1);
disable_irq_nosync(qdev->dbc[i].irq); if (!qdev->single_msi)
disable_irq_nosync(qdev->dbc[i].irq);
INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work); INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
} }
} }
...@@ -479,7 +491,8 @@ static int qaic_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) ...@@ -479,7 +491,8 @@ static int qaic_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto cleanup_qdev; goto cleanup_qdev;
} }
qdev->mhi_cntrl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq); qdev->mhi_cntrl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq,
qdev->single_msi);
if (IS_ERR(qdev->mhi_cntrl)) { if (IS_ERR(qdev->mhi_cntrl)) {
ret = PTR_ERR(qdev->mhi_cntrl); ret = PTR_ERR(qdev->mhi_cntrl);
goto cleanup_qdev; goto cleanup_qdev;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment