Commit 69e7649f authored by Lucas Segarra Fernandez, committed by Herbert Xu

crypto: qat - add support for device telemetry

Expose device telemetry data through debugfs for QAT GEN4 devices.

This allows gathering metrics about the performance and utilization
of a device. In particular, it provides statistics on (1) the
utilization of the PCIe channel, (2) address translation, when SVA is
enabled, and (3) the internal engines for crypto and data compression.

If telemetry is supported by the firmware, the driver allocates a DMA
region and a circular buffer. When telemetry is enabled through the
`control` attribute in debugfs, the driver sends the `TL_START` command
to the firmware via the admin interface. This triggers the device to
periodically gather telemetry data from hardware registers and write it
into the DMA memory region. The device writes into the shared region
every second.

Every 500 ms, the driver snapshots the DMA shared region into the
circular buffer. This buffer is then used to compute basic metrics
(minimum, maximum and average) for each counter every time the
`device_data` attribute is queried.

Telemetry counters are exposed through debugfs in the folder
/sys/kernel/debug/qat_<device>_<BDF>/telemetry.

For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.

This patch is based on earlier work done by Wojciech Ziemba.
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
parent 7f06679d
What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/control
Date: March 2024
KernelVersion: 6.8
Contact: qat-linux@intel.com
Description: (RW) Enables/disables the reporting of telemetry metrics.
Allowed values to write:
========================
* 0: disable telemetry
* 1: enable telemetry
* 2, 3, 4: enable telemetry and calculate minimum, maximum
and average for each counter over 2, 3 or 4 samples
Returned values:
================
* 1-4: telemetry is enabled and running
* 0: telemetry is disabled
Example.
Writing '3' to this file starts the collection of
telemetry metrics. Samples are collected every second and
stored in a circular buffer of size 3. These values are then
used to calculate the minimum, maximum and average for each
counter. After enabling, counters can be retrieved through
the ``device_data`` file::
echo 3 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control
Writing '0' to this file stops the collection of telemetry
metrics::
echo 0 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control
This attribute is only available for qat_4xxx devices.
What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/device_data
Date: March 2024
KernelVersion: 6.8
Contact: qat-linux@intel.com
Description: (RO) Reports device telemetry counters.
Reads report metrics about performance and utilization of
a QAT device:
======================= ========================================
Field Description
======================= ========================================
sample_cnt number of acquisitions of telemetry data
from the device. Reads are performed
every 1000 ms.
pci_trans_cnt number of PCIe partial transactions
max_rd_lat maximum logged read latency [ns] (could
be any read operation)
rd_lat_acc_avg average read latency [ns]
max_gp_lat max get to put latency [ns] (only takes
samples for AE0)
gp_lat_acc_avg average get to put latency [ns]
bw_in PCIe, write bandwidth [Mbps]
bw_out PCIe, read bandwidth [Mbps]
at_page_req_lat_avg Address Translator (AT), average page
request latency [ns]
at_trans_lat_avg AT, average page translation latency [ns]
at_max_tlb_used AT, maximum uTLB used
util_cpr<N> utilization of Compression slice N [%]
exec_cpr<N> execution count of Compression slice N
util_xlt<N> utilization of Translator slice N [%]
exec_xlt<N> execution count of Translator slice N
util_dcpr<N> utilization of Decompression slice N [%]
exec_dcpr<N> execution count of Decompression slice N
util_pke<N> utilization of PKE N [%]
exec_pke<N> execution count of PKE N
util_ucs<N> utilization of UCS slice N [%]
exec_ucs<N> execution count of UCS slice N
util_wat<N> utilization of Wireless Authentication
slice N [%]
exec_wat<N> execution count of Wireless Authentication
slice N
util_wcp<N> utilization of Wireless Cipher slice N [%]
exec_wcp<N> execution count of Wireless Cipher slice N
util_cph<N> utilization of Cipher slice N [%]
exec_cph<N> execution count of Cipher slice N
util_ath<N> utilization of Authentication slice N [%]
exec_ath<N> execution count of Authentication slice N
======================= ========================================
The telemetry report file can be read with the following command::
cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/device_data
If ``control`` is set to 1, only the current values of the
counters are displayed::
<counter_name> <current>
If ``control`` is 2, 3 or 4, counters are displayed in the
following format::
<counter_name> <current> <min> <max> <avg>
If a device lacks a specific accelerator, the corresponding
attribute is not reported.
This attribute is only available for qat_4xxx devices.
@@ -15,6 +15,7 @@
#include <adf_gen4_pm.h>
#include <adf_gen4_ras.h>
#include <adf_gen4_timer.h>
#include <adf_gen4_tl.h>
#include "adf_420xx_hw_data.h"
#include "icp_qat_hw.h"
@@ -543,6 +544,7 @@ void adf_init_hw_data_420xx(struct adf_hw_device_data *hw_data, u32 dev_id)
adf_gen4_init_pf_pfvf_ops(&hw_data->pfvf_ops);
adf_gen4_init_dc_ops(&hw_data->dc_ops);
adf_gen4_init_ras_ops(&hw_data->ras_ops);
adf_gen4_init_tl_data(&hw_data->tl_data);
adf_init_rl_data(&hw_data->rl_data);
}
......
@@ -15,6 +15,7 @@
#include <adf_gen4_pm.h>
#include "adf_gen4_ras.h"
#include <adf_gen4_timer.h>
#include <adf_gen4_tl.h>
#include "adf_4xxx_hw_data.h"
#include "icp_qat_hw.h"
@@ -453,6 +454,7 @@ void adf_init_hw_data_4xxx(struct adf_hw_device_data *hw_data, u32 dev_id)
adf_gen4_init_pf_pfvf_ops(&hw_data->pfvf_ops);
adf_gen4_init_dc_ops(&hw_data->dc_ops);
adf_gen4_init_ras_ops(&hw_data->ras_ops);
adf_gen4_init_tl_data(&hw_data->tl_data);
adf_init_rl_data(&hw_data->rl_data);
}
......
@@ -41,9 +41,12 @@ intel_qat-$(CONFIG_DEBUG_FS) += adf_transport_debug.o \
adf_fw_counters.o \
adf_cnv_dbgfs.o \
adf_gen4_pm_debugfs.o \
adf_gen4_tl.o \
adf_heartbeat.o \
adf_heartbeat_dbgfs.o \
adf_pm_dbgfs.o \
adf_telemetry.o \
adf_tl_debugfs.o \
adf_dbgfs.o
intel_qat-$(CONFIG_PCI_IOV) += adf_sriov.o adf_vf_isr.o adf_pfvf_utils.o \
......
@@ -11,6 +11,7 @@
#include <linux/types.h>
#include "adf_cfg_common.h"
#include "adf_rl.h"
#include "adf_telemetry.h"
#include "adf_pfvf_msg.h"
#define ADF_DH895XCC_DEVICE_NAME "dh895xcc"
@@ -254,6 +255,7 @@ struct adf_hw_device_data {
struct adf_ras_ops ras_ops;
struct adf_dev_err_mask dev_err_mask;
struct adf_rl_hw_data rl_data;
struct adf_tl_hw_data tl_data;
const char *fw_name;
const char *fw_mmp_name;
u32 fuses;
@@ -308,6 +310,7 @@ struct adf_hw_device_data {
#define GET_CSR_OPS(accel_dev) (&(accel_dev)->hw_device->csr_ops)
#define GET_PFVF_OPS(accel_dev) (&(accel_dev)->hw_device->pfvf_ops)
#define GET_DC_OPS(accel_dev) (&(accel_dev)->hw_device->dc_ops)
#define GET_TL_DATA(accel_dev) GET_HW_DATA(accel_dev)->tl_data
#define accel_to_pci_dev(accel_ptr) accel_ptr->accel_pci_dev.pci_dev
struct adf_admin_comms;
@@ -356,6 +359,7 @@ struct adf_accel_dev {
struct adf_cfg_device_data *cfg;
struct adf_fw_loader_data *fw_loader;
struct adf_admin_comms *admin;
struct adf_telemetry *telemetry;
struct adf_dc_data *dc_data;
struct adf_pm power_management;
struct list_head crypto_list;
......
@@ -10,6 +10,7 @@
#include "adf_fw_counters.h"
#include "adf_heartbeat_dbgfs.h"
#include "adf_pm_dbgfs.h"
#include "adf_tl_debugfs.h"
/**
* adf_dbgfs_init() - add persistent debugfs entries
@@ -66,6 +67,7 @@ void adf_dbgfs_add(struct adf_accel_dev *accel_dev)
adf_heartbeat_dbgfs_add(accel_dev);
adf_pm_dbgfs_add(accel_dev);
adf_cnv_dbgfs_add(accel_dev);
adf_tl_dbgfs_add(accel_dev);
}
}
@@ -79,6 +81,7 @@ void adf_dbgfs_rm(struct adf_accel_dev *accel_dev)
return;
if (!accel_dev->is_vf) {
adf_tl_dbgfs_rm(accel_dev);
adf_cnv_dbgfs_rm(accel_dev);
adf_pm_dbgfs_rm(accel_dev);
adf_heartbeat_dbgfs_rm(accel_dev);
......
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2023 Intel Corporation. */
#include <linux/export.h>
#include <linux/kernel.h>
#include "adf_gen4_tl.h"
#include "adf_telemetry.h"
#include "adf_tl_debugfs.h"
#define ADF_GEN4_TL_DEV_REG_OFF(reg) ADF_TL_DEV_REG_OFF(reg, gen4)
#define ADF_GEN4_TL_SL_UTIL_COUNTER(_name) \
ADF_TL_COUNTER("util_" #_name, \
ADF_TL_SIMPLE_COUNT, \
ADF_TL_SLICE_REG_OFF(_name, reg_tm_slice_util, gen4))
#define ADF_GEN4_TL_SL_EXEC_COUNTER(_name) \
ADF_TL_COUNTER("exec_" #_name, \
ADF_TL_SIMPLE_COUNT, \
ADF_TL_SLICE_REG_OFF(_name, reg_tm_slice_exec_cnt, gen4))
/* Device level counters. */
static const struct adf_tl_dbg_counter dev_counters[] = {
/* PCIe partial transactions. */
ADF_TL_COUNTER(PCI_TRANS_CNT_NAME, ADF_TL_SIMPLE_COUNT,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_pci_trans_cnt)),
/* Max read latency[ns]. */
ADF_TL_COUNTER(MAX_RD_LAT_NAME, ADF_TL_COUNTER_NS,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_lat_max)),
/* Read latency average[ns]. */
ADF_TL_COUNTER_LATENCY(RD_LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_lat_acc),
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_cmpl_cnt)),
/* Max get to put latency[ns]. */
ADF_TL_COUNTER(MAX_LAT_NAME, ADF_TL_COUNTER_NS,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_gp_lat_max)),
/* Get to put latency average[ns]. */
ADF_TL_COUNTER_LATENCY(LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_gp_lat_acc),
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_ae_put_cnt)),
/* PCIe write bandwidth[Mbps]. */
ADF_TL_COUNTER(BW_IN_NAME, ADF_TL_COUNTER_MBPS,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_bw_in)),
/* PCIe read bandwidth[Mbps]. */
ADF_TL_COUNTER(BW_OUT_NAME, ADF_TL_COUNTER_MBPS,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_bw_out)),
/* Page request latency average[ns]. */
ADF_TL_COUNTER_LATENCY(PAGE_REQ_LAT_NAME, ADF_TL_COUNTER_NS_AVG,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_page_req_lat_acc),
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_page_req_cnt)),
/* Page translation latency average[ns]. */
ADF_TL_COUNTER_LATENCY(AT_TRANS_LAT_NAME, ADF_TL_COUNTER_NS_AVG,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_trans_lat_acc),
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_trans_lat_cnt)),
/* Maximum uTLB used. */
ADF_TL_COUNTER(AT_MAX_UTLB_USED_NAME, ADF_TL_SIMPLE_COUNT,
ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_max_tlb_used)),
};
/* Slice utilization counters. */
static const struct adf_tl_dbg_counter sl_util_counters[ADF_TL_SL_CNT_COUNT] = {
/* Compression slice utilization. */
ADF_GEN4_TL_SL_UTIL_COUNTER(cpr),
/* Translator slice utilization. */
ADF_GEN4_TL_SL_UTIL_COUNTER(xlt),
/* Decompression slice utilization. */
ADF_GEN4_TL_SL_UTIL_COUNTER(dcpr),
/* PKE utilization. */
ADF_GEN4_TL_SL_UTIL_COUNTER(pke),
/* Wireless Authentication slice utilization. */
ADF_GEN4_TL_SL_UTIL_COUNTER(wat),
/* Wireless Cipher slice utilization. */
ADF_GEN4_TL_SL_UTIL_COUNTER(wcp),
/* UCS slice utilization. */
ADF_GEN4_TL_SL_UTIL_COUNTER(ucs),
/* Cipher slice utilization. */
ADF_GEN4_TL_SL_UTIL_COUNTER(cph),
/* Authentication slice utilization. */
ADF_GEN4_TL_SL_UTIL_COUNTER(ath),
};
/* Slice execution counters. */
static const struct adf_tl_dbg_counter sl_exec_counters[ADF_TL_SL_CNT_COUNT] = {
/* Compression slice execution count. */
ADF_GEN4_TL_SL_EXEC_COUNTER(cpr),
/* Translator slice execution count. */
ADF_GEN4_TL_SL_EXEC_COUNTER(xlt),
/* Decompression slice execution count. */
ADF_GEN4_TL_SL_EXEC_COUNTER(dcpr),
/* PKE execution count. */
ADF_GEN4_TL_SL_EXEC_COUNTER(pke),
/* Wireless Authentication slice execution count. */
ADF_GEN4_TL_SL_EXEC_COUNTER(wat),
/* Wireless Cipher slice execution count. */
ADF_GEN4_TL_SL_EXEC_COUNTER(wcp),
/* UCS slice execution count. */
ADF_GEN4_TL_SL_EXEC_COUNTER(ucs),
/* Cipher slice execution count. */
ADF_GEN4_TL_SL_EXEC_COUNTER(cph),
/* Authentication slice execution count. */
ADF_GEN4_TL_SL_EXEC_COUNTER(ath),
};
void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
{
tl_data->layout_sz = ADF_GEN4_TL_LAYOUT_SZ;
tl_data->slice_reg_sz = ADF_GEN4_TL_SLICE_REG_SZ;
tl_data->num_hbuff = ADF_GEN4_TL_NUM_HIST_BUFFS;
tl_data->msg_cnt_off = ADF_GEN4_TL_MSG_CNT_OFF;
tl_data->cpp_ns_per_cycle = ADF_GEN4_CPP_NS_PER_CYCLE;
tl_data->bw_units_to_bytes = ADF_GEN4_TL_BW_HW_UNITS_TO_BYTES;
tl_data->dev_counters = dev_counters;
tl_data->num_dev_counters = ARRAY_SIZE(dev_counters);
tl_data->sl_util_counters = sl_util_counters;
tl_data->sl_exec_counters = sl_exec_counters;
}
EXPORT_SYMBOL_GPL(adf_gen4_init_tl_data);
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (c) 2023 Intel Corporation. */
#ifndef ADF_GEN4_TL_H
#define ADF_GEN4_TL_H
#include <linux/stddef.h>
#include <linux/types.h>
struct adf_tl_hw_data;
/* Computation constants. */
#define ADF_GEN4_CPP_NS_PER_CYCLE 2
#define ADF_GEN4_TL_BW_HW_UNITS_TO_BYTES 64
/* Maximum aggregation time. Value in milliseconds. */
#define ADF_GEN4_TL_MAX_AGGR_TIME_MS 4000
/* Num of buffers to store historic values. */
#define ADF_GEN4_TL_NUM_HIST_BUFFS \
(ADF_GEN4_TL_MAX_AGGR_TIME_MS / ADF_TL_DATA_WR_INTERVAL_MS)
/* Max number of HW resources of one type. */
#define ADF_GEN4_TL_MAX_SLICES_PER_TYPE 24
/**
* struct adf_gen4_tl_slice_data_regs - HW slice data as populated by FW.
* @reg_tm_slice_exec_cnt: Slice execution count.
* @reg_tm_slice_util: Slice utilization.
*/
struct adf_gen4_tl_slice_data_regs {
__u32 reg_tm_slice_exec_cnt;
__u32 reg_tm_slice_util;
};
#define ADF_GEN4_TL_SLICE_REG_SZ sizeof(struct adf_gen4_tl_slice_data_regs)
/**
* struct adf_gen4_tl_device_data_regs - Device telemetry counter values, as
* periodically populated by the device.
* @reg_tl_rd_lat_acc: read latency accumulator
* @reg_tl_gp_lat_acc: get-put latency accumulator
* @reg_tl_at_page_req_lat_acc: AT/DevTLB page request latency accumulator
* @reg_tl_at_trans_lat_acc: DevTLB transaction latency accumulator
* @reg_tl_re_acc: accumulated ring empty time
* @reg_tl_pci_trans_cnt: PCIe partial transactions
* @reg_tl_rd_lat_max: maximum logged read latency
* @reg_tl_rd_cmpl_cnt: read requests completed count
* @reg_tl_gp_lat_max: maximum logged get to put latency
* @reg_tl_ae_put_cnt: Accelerator Engine put counts across all rings
* @reg_tl_bw_in: PCIe write bandwidth
* @reg_tl_bw_out: PCIe read bandwidth
* @reg_tl_at_page_req_cnt: DevTLB page requests count
* @reg_tl_at_trans_lat_cnt: DevTLB transaction latency samples count
* @reg_tl_at_max_tlb_used: maximum uTLB used
* @reg_tl_re_cnt: ring empty time samples count
* @reserved: reserved
* @ath_slices: array of Authentication slices utilization registers
* @cph_slices: array of Cipher slices utilization registers
* @cpr_slices: array of Compression slices utilization registers
* @xlt_slices: array of Translator slices utilization registers
* @dcpr_slices: array of Decompression slices utilization registers
* @pke_slices: array of PKE slices utilization registers
* @ucs_slices: array of UCS slices utilization registers
* @wat_slices: array of Wireless Authentication slices utilization registers
* @wcp_slices: array of Wireless Cipher slices utilization registers
*/
struct adf_gen4_tl_device_data_regs {
__u64 reg_tl_rd_lat_acc;
__u64 reg_tl_gp_lat_acc;
__u64 reg_tl_at_page_req_lat_acc;
__u64 reg_tl_at_trans_lat_acc;
__u64 reg_tl_re_acc;
__u32 reg_tl_pci_trans_cnt;
__u32 reg_tl_rd_lat_max;
__u32 reg_tl_rd_cmpl_cnt;
__u32 reg_tl_gp_lat_max;
__u32 reg_tl_ae_put_cnt;
__u32 reg_tl_bw_in;
__u32 reg_tl_bw_out;
__u32 reg_tl_at_page_req_cnt;
__u32 reg_tl_at_trans_lat_cnt;
__u32 reg_tl_at_max_tlb_used;
__u32 reg_tl_re_cnt;
__u32 reserved;
struct adf_gen4_tl_slice_data_regs ath_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
struct adf_gen4_tl_slice_data_regs cph_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
struct adf_gen4_tl_slice_data_regs cpr_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
struct adf_gen4_tl_slice_data_regs xlt_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
struct adf_gen4_tl_slice_data_regs dcpr_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
struct adf_gen4_tl_slice_data_regs pke_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
struct adf_gen4_tl_slice_data_regs ucs_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
struct adf_gen4_tl_slice_data_regs wat_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
struct adf_gen4_tl_slice_data_regs wcp_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
};
/**
* struct adf_gen4_tl_layout - Layout of the entire telemetry data region:
* device counters plus four ring pairs, as periodically populated by the
* device.
* @tl_device_data_regs: structure of device telemetry registers
* @reserved1: reserved
* @reg_tl_msg_cnt: telemetry messages counter
* @reserved: reserved
*/
struct adf_gen4_tl_layout {
struct adf_gen4_tl_device_data_regs tl_device_data_regs;
__u32 reserved1[14];
__u32 reg_tl_msg_cnt;
__u32 reserved;
};
#define ADF_GEN4_TL_LAYOUT_SZ sizeof(struct adf_gen4_tl_layout)
#define ADF_GEN4_TL_MSG_CNT_OFF offsetof(struct adf_gen4_tl_layout, reg_tl_msg_cnt)
#ifdef CONFIG_DEBUG_FS
void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data);
#else
static inline void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
{
}
#endif /* CONFIG_DEBUG_FS */
#endif /* ADF_GEN4_TL_H */
@@ -11,6 +11,7 @@
#include "adf_heartbeat.h"
#include "adf_rl.h"
#include "adf_sysfs_ras_counters.h"
#include "adf_telemetry.h"
static LIST_HEAD(service_table);
static DEFINE_MUTEX(service_lock);
@@ -142,6 +143,10 @@ static int adf_dev_init(struct adf_accel_dev *accel_dev)
if (ret && ret != -EOPNOTSUPP)
return ret;
ret = adf_tl_init(accel_dev);
if (ret && ret != -EOPNOTSUPP)
return ret;
/*
* Subservice initialisation is divided into two stages: init and start.
* This is to facilitate any ordering dependencies between services
@@ -220,6 +225,10 @@ static int adf_dev_start(struct adf_accel_dev *accel_dev)
if (ret && ret != -EOPNOTSUPP)
return ret;
ret = adf_tl_start(accel_dev);
if (ret && ret != -EOPNOTSUPP)
return ret;
list_for_each_entry(service, &service_table, list) {
if (service->event_hld(accel_dev, ADF_EVENT_START)) {
dev_err(&GET_DEV(accel_dev),
@@ -279,6 +288,7 @@ static void adf_dev_stop(struct adf_accel_dev *accel_dev)
!test_bit(ADF_STATUS_STARTING, &accel_dev->status))
return;
adf_tl_stop(accel_dev);
adf_rl_stop(accel_dev);
adf_dbgfs_rm(accel_dev);
adf_sysfs_stop_ras(accel_dev);
@@ -374,6 +384,8 @@ static void adf_dev_shutdown(struct adf_accel_dev *accel_dev)
adf_heartbeat_shutdown(accel_dev);
adf_tl_shutdown(accel_dev);
hw_data->disable_iov(accel_dev);
if (test_bit(ADF_STATUS_IRQ_ALLOCATED, &accel_dev->status)) {
......
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2023 Intel Corporation. */
#define dev_fmt(fmt) "Telemetry: " fmt
#include <asm/errno.h>
#include <linux/atomic.h>
#include <linux/device.h>
#include <linux/dev_printk.h>
#include <linux/dma-mapping.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/workqueue.h>
#include "adf_admin.h"
#include "adf_accel_devices.h"
#include "adf_common_drv.h"
#include "adf_telemetry.h"
#define TL_IS_ZERO(input) ((input) == 0)
static bool is_tl_supported(struct adf_accel_dev *accel_dev)
{
u16 fw_caps = GET_HW_DATA(accel_dev)->fw_capabilities;
return fw_caps & TL_CAPABILITY_BIT;
}
static int validate_tl_data(struct adf_tl_hw_data *tl_data)
{
if (!tl_data->dev_counters ||
TL_IS_ZERO(tl_data->num_dev_counters) ||
!tl_data->sl_util_counters ||
!tl_data->sl_exec_counters)
return -EOPNOTSUPP;
return 0;
}
static int adf_tl_alloc_mem(struct adf_accel_dev *accel_dev)
{
struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
struct device *dev = &GET_DEV(accel_dev);
size_t regs_sz = tl_data->layout_sz;
struct adf_telemetry *telemetry;
int node = dev_to_node(dev);
void *tl_data_regs;
unsigned int i;
telemetry = kzalloc_node(sizeof(*telemetry), GFP_KERNEL, node);
if (!telemetry)
return -ENOMEM;
telemetry->regs_hist_buff = kmalloc_array(tl_data->num_hbuff,
sizeof(*telemetry->regs_hist_buff),
GFP_KERNEL);
if (!telemetry->regs_hist_buff)
goto err_free_tl;
telemetry->regs_data = dma_alloc_coherent(dev, regs_sz,
&telemetry->regs_data_p,
GFP_KERNEL);
if (!telemetry->regs_data)
goto err_free_regs_hist_buff;
for (i = 0; i < tl_data->num_hbuff; i++) {
tl_data_regs = kzalloc_node(regs_sz, GFP_KERNEL, node);
if (!tl_data_regs)
goto err_free_dma;
telemetry->regs_hist_buff[i] = tl_data_regs;
}
accel_dev->telemetry = telemetry;
return 0;
err_free_dma:
dma_free_coherent(dev, regs_sz, telemetry->regs_data,
telemetry->regs_data_p);
while (i--)
kfree(telemetry->regs_hist_buff[i]);
err_free_regs_hist_buff:
kfree(telemetry->regs_hist_buff);
err_free_tl:
kfree(telemetry);
return -ENOMEM;
}
static void adf_tl_free_mem(struct adf_accel_dev *accel_dev)
{
struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
struct adf_telemetry *telemetry = accel_dev->telemetry;
struct device *dev = &GET_DEV(accel_dev);
size_t regs_sz = tl_data->layout_sz;
unsigned int i;
for (i = 0; i < tl_data->num_hbuff; i++)
kfree(telemetry->regs_hist_buff[i]);
dma_free_coherent(dev, regs_sz, telemetry->regs_data,
telemetry->regs_data_p);
kfree(telemetry->regs_hist_buff);
kfree(telemetry);
accel_dev->telemetry = NULL;
}
static unsigned long get_next_timeout(void)
{
return msecs_to_jiffies(ADF_TL_TIMER_INT_MS);
}
static void snapshot_regs(struct adf_telemetry *telemetry, size_t size)
{
void *dst = telemetry->regs_hist_buff[telemetry->hb_num];
void *src = telemetry->regs_data;
memcpy(dst, src, size);
}
static void tl_work_handler(struct work_struct *work)
{
struct delayed_work *delayed_work;
struct adf_telemetry *telemetry;
struct adf_tl_hw_data *tl_data;
u32 msg_cnt, old_msg_cnt;
size_t layout_sz;
u32 *regs_data;
size_t id;
delayed_work = to_delayed_work(work);
telemetry = container_of(delayed_work, struct adf_telemetry, work_ctx);
tl_data = &GET_TL_DATA(telemetry->accel_dev);
regs_data = telemetry->regs_data;
id = tl_data->msg_cnt_off / sizeof(*regs_data);
layout_sz = tl_data->layout_sz;
if (!atomic_read(&telemetry->state)) {
cancel_delayed_work_sync(&telemetry->work_ctx);
return;
}
msg_cnt = regs_data[id];
old_msg_cnt = msg_cnt;
if (msg_cnt == telemetry->msg_cnt)
goto out;
mutex_lock(&telemetry->regs_hist_lock);
snapshot_regs(telemetry, layout_sz);
/* Check if data changed while updating it */
msg_cnt = regs_data[id];
if (old_msg_cnt != msg_cnt)
snapshot_regs(telemetry, layout_sz);
telemetry->msg_cnt = msg_cnt;
telemetry->hb_num++;
telemetry->hb_num %= telemetry->hbuffs;
mutex_unlock(&telemetry->regs_hist_lock);
out:
adf_misc_wq_queue_delayed_work(&telemetry->work_ctx, get_next_timeout());
}
int adf_tl_halt(struct adf_accel_dev *accel_dev)
{
struct adf_telemetry *telemetry = accel_dev->telemetry;
struct device *dev = &GET_DEV(accel_dev);
int ret;
cancel_delayed_work_sync(&telemetry->work_ctx);
atomic_set(&telemetry->state, 0);
ret = adf_send_admin_tl_stop(accel_dev);
if (ret)
dev_err(dev, "failed to stop telemetry\n");
return ret;
}
int adf_tl_run(struct adf_accel_dev *accel_dev, int state)
{
struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
struct adf_telemetry *telemetry = accel_dev->telemetry;
struct device *dev = &GET_DEV(accel_dev);
size_t layout_sz = tl_data->layout_sz;
int ret;
ret = adf_send_admin_tl_start(accel_dev, telemetry->regs_data_p,
layout_sz, NULL, &telemetry->slice_cnt);
if (ret) {
dev_err(dev, "failed to start telemetry\n");
return ret;
}
telemetry->hbuffs = state;
atomic_set(&telemetry->state, state);
adf_misc_wq_queue_delayed_work(&telemetry->work_ctx, get_next_timeout());
return 0;
}
int adf_tl_init(struct adf_accel_dev *accel_dev)
{
struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
struct device *dev = &GET_DEV(accel_dev);
struct adf_telemetry *telemetry;
int ret;
ret = validate_tl_data(tl_data);
if (ret)
return ret;
ret = adf_tl_alloc_mem(accel_dev);
if (ret) {
dev_err(dev, "failed to initialize: %d\n", ret);
return ret;
}
telemetry = accel_dev->telemetry;
telemetry->accel_dev = accel_dev;
mutex_init(&telemetry->wr_lock);
mutex_init(&telemetry->regs_hist_lock);
INIT_DELAYED_WORK(&telemetry->work_ctx, tl_work_handler);
return 0;
}
int adf_tl_start(struct adf_accel_dev *accel_dev)
{
struct device *dev = &GET_DEV(accel_dev);
if (!accel_dev->telemetry)
return -EOPNOTSUPP;
if (!is_tl_supported(accel_dev)) {
dev_info(dev, "feature not supported by FW\n");
adf_tl_free_mem(accel_dev);
return -EOPNOTSUPP;
}
return 0;
}
void adf_tl_stop(struct adf_accel_dev *accel_dev)
{
if (!accel_dev->telemetry)
return;
if (atomic_read(&accel_dev->telemetry->state))
adf_tl_halt(accel_dev);
}
void adf_tl_shutdown(struct adf_accel_dev *accel_dev)
{
if (!accel_dev->telemetry)
return;
adf_tl_free_mem(accel_dev);
}
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (c) 2023 Intel Corporation. */
#ifndef ADF_TELEMETRY_H
#define ADF_TELEMETRY_H
#include <linux/bits.h>
#include <linux/mutex.h>
#include <linux/types.h>
#include <linux/workqueue.h>
#include "icp_qat_fw_init_admin.h"
struct adf_accel_dev;
struct adf_tl_dbg_counter;
struct dentry;
#define ADF_TL_SL_CNT_COUNT \
(sizeof(struct icp_qat_fw_init_admin_slice_cnt) / sizeof(__u8))
#define TL_CAPABILITY_BIT BIT(1)
/* Interval at which the device writes data to the DMA region. Value in milliseconds. */
#define ADF_TL_DATA_WR_INTERVAL_MS 1000
/* Interval at which the timer interrupt should be handled. Value in milliseconds. */
#define ADF_TL_TIMER_INT_MS (ADF_TL_DATA_WR_INTERVAL_MS / 2)
struct adf_tl_hw_data {
size_t layout_sz;
size_t slice_reg_sz;
size_t msg_cnt_off;
const struct adf_tl_dbg_counter *dev_counters;
const struct adf_tl_dbg_counter *sl_util_counters;
const struct adf_tl_dbg_counter *sl_exec_counters;
u8 num_hbuff;
u8 cpp_ns_per_cycle;
u8 bw_units_to_bytes;
u8 num_dev_counters;
};
struct adf_telemetry {
struct adf_accel_dev *accel_dev;
atomic_t state;
u32 hbuffs;
int hb_num;
u32 msg_cnt;
dma_addr_t regs_data_p; /* bus address for DMA mapping */
void *regs_data; /* virtual address for DMA mapping */
/**
* @regs_hist_buff: array of pointers to copies of the last @hbuffs
* values of @regs_data
*/
void **regs_hist_buff;
struct dentry *dbg_dir;
/**
* @regs_hist_lock: protects from race conditions between write and read
* to the copies referenced by @regs_hist_buff
*/
struct mutex regs_hist_lock;
/**
* @wr_lock: protects from concurrent writes to debugfs telemetry files
*/
struct mutex wr_lock;
struct delayed_work work_ctx;
struct icp_qat_fw_init_admin_slice_cnt slice_cnt;
};
#ifdef CONFIG_DEBUG_FS
int adf_tl_init(struct adf_accel_dev *accel_dev);
int adf_tl_start(struct adf_accel_dev *accel_dev);
void adf_tl_stop(struct adf_accel_dev *accel_dev);
void adf_tl_shutdown(struct adf_accel_dev *accel_dev);
int adf_tl_run(struct adf_accel_dev *accel_dev, int state);
int adf_tl_halt(struct adf_accel_dev *accel_dev);
#else
static inline int adf_tl_init(struct adf_accel_dev *accel_dev)
{
return 0;
}
static inline int adf_tl_start(struct adf_accel_dev *accel_dev)
{
return 0;
}
static inline void adf_tl_stop(struct adf_accel_dev *accel_dev)
{
}
static inline void adf_tl_shutdown(struct adf_accel_dev *accel_dev)
{
}
#endif /* CONFIG_DEBUG_FS */
#endif /* ADF_TELEMETRY_H */
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (c) 2023 Intel Corporation. */
#ifndef ADF_TL_DEBUGFS_H
#define ADF_TL_DEBUGFS_H
#include <linux/types.h>
struct adf_accel_dev;
#define MAX_COUNT_NAME_SIZE 32
#define SNAPSHOT_CNT_MSG "sample_cnt"
#define RP_NUM_INDEX "rp_num"
#define PCI_TRANS_CNT_NAME "pci_trans_cnt"
#define MAX_RD_LAT_NAME "max_rd_lat"
#define RD_LAT_ACC_NAME "rd_lat_acc_avg"
#define MAX_LAT_NAME "max_gp_lat"
#define LAT_ACC_NAME "gp_lat_acc_avg"
#define BW_IN_NAME "bw_in"
#define BW_OUT_NAME "bw_out"
#define PAGE_REQ_LAT_NAME "at_page_req_lat_avg"
#define AT_TRANS_LAT_NAME "at_trans_lat_avg"
#define AT_MAX_UTLB_USED_NAME "at_max_tlb_used"
#define AT_GLOB_DTLB_HIT_NAME "at_glob_devtlb_hit"
#define AT_GLOB_DTLB_MISS_NAME "at_glob_devtlb_miss"
#define AT_PAYLD_DTLB_HIT_NAME "tl_at_payld_devtlb_hit"
#define AT_PAYLD_DTLB_MISS_NAME "tl_at_payld_devtlb_miss"
#define ADF_TL_DATA_REG_OFF(reg, qat_gen) \
offsetof(struct adf_##qat_gen##_tl_layout, reg)
#define ADF_TL_DEV_REG_OFF(reg, qat_gen) \
(ADF_TL_DATA_REG_OFF(tl_device_data_regs, qat_gen) + \
offsetof(struct adf_##qat_gen##_tl_device_data_regs, reg))
#define ADF_TL_SLICE_REG_OFF(slice, reg, qat_gen) \
(ADF_TL_DEV_REG_OFF(slice##_slices[0], qat_gen) + \
offsetof(struct adf_##qat_gen##_tl_slice_data_regs, reg))
/**
* enum adf_tl_counter_type - telemetry counter types
* @ADF_TL_COUNTER_UNSUPPORTED: unsupported counter
* @ADF_TL_SIMPLE_COUNT: simple counter
* @ADF_TL_COUNTER_NS: latency counter, value in ns
* @ADF_TL_COUNTER_NS_AVG: accumulated average latency counter, value in ns
* @ADF_TL_COUNTER_MBPS: bandwidth, value in MBps
*/
enum adf_tl_counter_type {
ADF_TL_COUNTER_UNSUPPORTED,
ADF_TL_SIMPLE_COUNT,
ADF_TL_COUNTER_NS,
ADF_TL_COUNTER_NS_AVG,
ADF_TL_COUNTER_MBPS,
};
/**
* struct adf_tl_dbg_counter - telemetry counter definition
* @name: name of the counter as printed in the report
* @type: type of the counter
* @offset1: offset of 1st register
* @offset2: offset of 2nd optional register
*/
struct adf_tl_dbg_counter {
const char *name;
enum adf_tl_counter_type type;
size_t offset1;
size_t offset2;
};
#define ADF_TL_COUNTER(_name, _type, _offset) \
{ .name = _name, \
.type = _type, \
.offset1 = _offset \
}
#define ADF_TL_COUNTER_LATENCY(_name, _type, _offset1, _offset2) \
{ .name = _name, \
.type = _type, \
.offset1 = _offset1, \
.offset2 = _offset2 \
}
/* Telemetry counter aggregated values. */
struct adf_tl_dbg_aggr_values {
u64 curr;
u64 min;
u64 max;
u64 avg;
};
/**
* adf_tl_dbgfs_add() - Add telemetry's debug fs entries.
* @accel_dev: Pointer to acceleration device.
*
* Creates telemetry's debug fs folder and attributes in QAT debug fs root.
*/
void adf_tl_dbgfs_add(struct adf_accel_dev *accel_dev);
/**
* adf_tl_dbgfs_rm() - Remove telemetry's debug fs entries.
* @accel_dev: Pointer to acceleration device.
*
* Removes telemetry's debug fs folder and attributes from QAT debug fs root.
*/
void adf_tl_dbgfs_rm(struct adf_accel_dev *accel_dev);
#endif /* ADF_TL_DEBUGFS_H */