Commit d8621762 authored by Jakub Kicinski

Merge branch 'support-direct-read-from-region'

Jacob Keller says:

====================
support direct read from region

A long time ago, when initially implementing devlink regions in ice, I
proposed the ability to read from a region without taking a snapshot [1].
I eventually dropped this work from the original series due to size, and
then lost track of submitting the follow-up.

This can be useful when interacting with some region that has some
definitive "contents" from which snapshots are made. For example the ice
driver has regions representing the contents of the device flash.

If userspace wants to read the contents today, it must first take a snapshot
and then read from that snapshot. This makes sense if you want to read a
large portion of data or you want to be sure reads are consistently from the
same recording of the flash.

However if user space only wants to read a small chunk, it must first
generate a snapshot of the entire contents, perform a read from the
snapshot, and then delete the snapshot after reading.

For such a use case, a direct read from the region makes more sense. This
can be achieved by allowing the devlink region read command to work without
a snapshot. Instead the portion to be read can be forwarded directly to the
driver via a new .read callback.

This avoids the need to read the entire region contents into memory first
and avoids the software overhead of creating a snapshot and then deleting
it.

This series implements such behavior and hooks up the ice NVM and shadow RAM
regions to allow it.

[1] https://lore.kernel.org/netdev/20200130225913.1671982-1-jacob.e.keller@intel.com/
====================

Link: https://lore.kernel.org/r/20221128203647.1198669-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents c1d8e3fb 3af4b40b
@@ -31,6 +31,15 @@ in its ``devlink_region_ops`` structure. If snapshot id is not set in
the ``DEVLINK_CMD_REGION_NEW`` request kernel will allocate one and send
the snapshot information to user space.
Regions may optionally allow directly reading from their contents without a
snapshot. Direct read requests are not atomic. In particular a read request
of size 256 bytes or larger will be split into multiple chunks. If atomic
access is required, use a snapshot. A driver wishing to enable this for a
region should implement the ``.read`` callback in the ``devlink_region_ops``
structure. User space can request a direct read by using the
``DEVLINK_ATTR_REGION_DIRECT`` attribute instead of specifying a snapshot
id.
example usage
-------------
@@ -65,6 +74,10 @@ example usage
$ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
# Read from the region without a snapshot
$ devlink region read pci/0000:00:05.0/fw-health address 16 length 16
0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
As regions are likely very device or driver specific, no generic regions are
defined. See the driver-specific documentation files for information on the
specific regions a driver supports.
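The documentation hunks above note that a direct read is not atomic because a large request is split into chunks. A minimal user-space sketch of that splitting follows. It is not kernel code: `CHUNK_SIZE` takes the value of `DEVLINK_REGION_READ_CHUNK_SIZE` from the devlink core, while `split_read` is a hypothetical helper introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as DEVLINK_REGION_READ_CHUNK_SIZE in the devlink core. */
#define CHUNK_SIZE 256u

/*
 * Hypothetical helper (not part of devlink): record the size of each
 * driver read that a direct read of [start, start + length) is split
 * into. Returns the number of chunks; at most max_chunks sizes are stored.
 */
static size_t split_read(uint64_t start, uint64_t length,
			 uint32_t *sizes, size_t max_chunks)
{
	uint64_t curr = start, end = start + length;
	size_t n = 0;

	while (curr < end) {
		uint64_t left = end - curr;
		uint32_t sz = left < CHUNK_SIZE ? (uint32_t)left : CHUNK_SIZE;

		if (n < max_chunks)
			sizes[n] = sz;
		n++;
		curr += sz;
	}
	return n;
}
```

In this model a 600-byte request becomes three separate reads. Each one is an independent call into the driver, so the underlying contents may change between calls; a snapshot avoids that.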
@@ -189,12 +189,21 @@ device data.
* - ``nvm-flash``
- The contents of the entire flash chip, sometimes referred to as
the device's Non Volatile Memory.
* - ``shadow-ram``
- The contents of the Shadow RAM, which is loaded from the beginning
of the flash. Although the contents are primarily from the flash,
this area also contains data generated during device boot which is
not stored in flash.
* - ``device-caps``
- The contents of the device firmware's capabilities buffer. Useful to
determine the current state and configuration of the device.
Users can request an immediate capture of a snapshot via the
``DEVLINK_CMD_REGION_NEW``
Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a
snapshot. The ``device-caps`` region requires a snapshot as the contents are
sent by firmware and can't be split into separate reads.
Users can request an immediate capture of a snapshot for all three regions
via the ``DEVLINK_CMD_REGION_NEW`` command.
.. code:: shell
@@ -1596,21 +1596,22 @@ void ice_devlink_destroy_vf_port(struct ice_vf *vf)
#define ICE_DEVLINK_READ_BLK_SIZE (1024 * 1024)
static const struct devlink_region_ops ice_nvm_region_ops;
static const struct devlink_region_ops ice_sram_region_ops;
/**
* ice_devlink_nvm_snapshot - Capture a snapshot of the NVM flash contents
* @devlink: the devlink instance
* @ops: the devlink region being snapshotted
* @ops: the devlink region to snapshot
* @extack: extended ACK response structure
* @data: on exit points to snapshot data buffer
*
* This function is called in response to the DEVLINK_CMD_REGION_TRIGGER for
* the nvm-flash devlink region. It captures a snapshot of the full NVM flash
* contents, including both banks of flash. This snapshot can later be viewed
* via the devlink-region interface.
* This function is called in response to a DEVLINK_CMD_REGION_NEW for either
* the nvm-flash or shadow-ram region.
*
* It captures the flash using the FLASH_ONLY bit set when reading via
* firmware, so it does not read the current Shadow RAM contents. For that,
* use the shadow-ram region.
* It captures a snapshot of the NVM or Shadow RAM flash contents. This
* snapshot can then later be viewed via the DEVLINK_CMD_REGION_READ netlink
* interface.
*
* @returns zero on success, and updates the data pointer. Returns a non-zero
* error code on failure.
@@ -1622,17 +1623,27 @@ static int ice_devlink_nvm_snapshot(struct devlink *devlink,
struct ice_pf *pf = devlink_priv(devlink);
struct device *dev = ice_pf_to_dev(pf);
struct ice_hw *hw = &pf->hw;
bool read_shadow_ram;
u8 *nvm_data, *tmp, i;
u32 nvm_size, left;
s8 num_blks;
int status;
if (ops == &ice_nvm_region_ops) {
read_shadow_ram = false;
nvm_size = hw->flash.flash_size;
} else if (ops == &ice_sram_region_ops) {
read_shadow_ram = true;
nvm_size = hw->flash.sr_words * 2u;
} else {
NL_SET_ERR_MSG_MOD(extack, "Unexpected region in snapshot function");
return -EOPNOTSUPP;
}
nvm_data = vzalloc(nvm_size);
if (!nvm_data)
return -ENOMEM;
num_blks = DIV_ROUND_UP(nvm_size, ICE_DEVLINK_READ_BLK_SIZE);
tmp = nvm_data;
left = nvm_size;
@@ -1656,7 +1667,7 @@ static int ice_devlink_nvm_snapshot(struct devlink *devlink,
}
status = ice_read_flat_nvm(hw, i * ICE_DEVLINK_READ_BLK_SIZE,
&read_sz, tmp, false);
&read_sz, tmp, read_shadow_ram);
if (status) {
dev_dbg(dev, "ice_read_flat_nvm failed after reading %u bytes, err %d aq_err %d\n",
read_sz, status, hw->adminq.sq_last_status);
@@ -1677,62 +1688,69 @@ static int ice_devlink_nvm_snapshot(struct devlink *devlink,
}
/**
* ice_devlink_sram_snapshot - Capture a snapshot of the Shadow RAM contents
* ice_devlink_nvm_read - Read a portion of NVM flash contents
* @devlink: the devlink instance
* @ops: the devlink region being snapshotted
* @ops: the devlink region to snapshot
* @extack: extended ACK response structure
* @data: on exit points to snapshot data buffer
* @offset: the offset to start at
* @size: the amount to read
* @data: the data buffer to read into
*
* This function is called in response to the DEVLINK_CMD_REGION_TRIGGER for
* the shadow-ram devlink region. It captures a snapshot of the shadow ram
* contents. This snapshot can later be viewed via the devlink-region
* interface.
* This function is called in response to DEVLINK_CMD_REGION_READ to directly
* read a section of the NVM contents.
*
* It reads from either the nvm-flash or shadow-ram region contents.
*
* @returns zero on success, and updates the data pointer. Returns a non-zero
* error code on failure.
*/
static int
ice_devlink_sram_snapshot(struct devlink *devlink,
const struct devlink_region_ops __always_unused *ops,
struct netlink_ext_ack *extack, u8 **data)
static int ice_devlink_nvm_read(struct devlink *devlink,
const struct devlink_region_ops *ops,
struct netlink_ext_ack *extack,
u64 offset, u32 size, u8 *data)
{
struct ice_pf *pf = devlink_priv(devlink);
struct device *dev = ice_pf_to_dev(pf);
struct ice_hw *hw = &pf->hw;
u8 *sram_data;
u32 sram_size;
int err;
bool read_shadow_ram;
u64 nvm_size;
int status;
sram_size = hw->flash.sr_words * 2u;
sram_data = vzalloc(sram_size);
if (!sram_data)
return -ENOMEM;
if (ops == &ice_nvm_region_ops) {
read_shadow_ram = false;
nvm_size = hw->flash.flash_size;
} else if (ops == &ice_sram_region_ops) {
read_shadow_ram = true;
nvm_size = hw->flash.sr_words * 2u;
} else {
NL_SET_ERR_MSG_MOD(extack, "Unexpected region in snapshot function");
return -EOPNOTSUPP;
}
err = ice_acquire_nvm(hw, ICE_RES_READ);
if (err) {
if (offset + size >= nvm_size) {
NL_SET_ERR_MSG_MOD(extack, "Cannot read beyond the region size");
return -ERANGE;
}
status = ice_acquire_nvm(hw, ICE_RES_READ);
if (status) {
dev_dbg(dev, "ice_acquire_nvm failed, err %d aq_err %d\n",
err, hw->adminq.sq_last_status);
status, hw->adminq.sq_last_status);
NL_SET_ERR_MSG_MOD(extack, "Failed to acquire NVM semaphore");
vfree(sram_data);
return err;
return -EIO;
}
/* Read from the Shadow RAM, rather than directly from NVM */
err = ice_read_flat_nvm(hw, 0, &sram_size, sram_data, true);
if (err) {
status = ice_read_flat_nvm(hw, (u32)offset, &size, data,
read_shadow_ram);
if (status) {
dev_dbg(dev, "ice_read_flat_nvm failed after reading %u bytes, err %d aq_err %d\n",
sram_size, err, hw->adminq.sq_last_status);
NL_SET_ERR_MSG_MOD(extack,
"Failed to read Shadow RAM contents");
size, status, hw->adminq.sq_last_status);
NL_SET_ERR_MSG_MOD(extack, "Failed to read NVM contents");
ice_release_nvm(hw);
vfree(sram_data);
return err;
return -EIO;
}
ice_release_nvm(hw);
*data = sram_data;
return 0;
}
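The range check in `ice_devlink_nvm_read` can be illustrated in isolation. The sketch below is a user-space model under stated assumptions, not the driver's code: `region_range_ok` is a hypothetical helper, and it assumes that a read ending exactly at the last byte of the region is valid. That differs from the `>=` comparison in the hunk above, which rejects such a read. The helper is also written so that `offset + size` cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [offset, offset + size) lies entirely
 * within a region of region_size bytes. The subtraction form avoids
 * overflowing offset + size.
 */
static bool region_range_ok(uint64_t offset, uint32_t size,
			    uint64_t region_size)
{
	if (offset > region_size)
		return false;
	return size <= region_size - offset;
}
```

With a 1024-byte region, a 16-byte read at offset 1008 passes this check, while the same read at offset 1009 does not.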
@@ -1784,12 +1802,14 @@ static const struct devlink_region_ops ice_nvm_region_ops = {
.name = "nvm-flash",
.destructor = vfree,
.snapshot = ice_devlink_nvm_snapshot,
.read = ice_devlink_nvm_read,
};
static const struct devlink_region_ops ice_sram_region_ops = {
.name = "shadow-ram",
.destructor = vfree,
.snapshot = ice_devlink_sram_snapshot,
.snapshot = ice_devlink_nvm_snapshot,
.read = ice_devlink_nvm_read,
};
static const struct devlink_region_ops ice_devcaps_region_ops = {
@@ -650,6 +650,10 @@ struct devlink_info_req;
* the data variable must be updated to point to the snapshot data.
* The function will be called while the devlink instance lock is
* held.
* @read: callback to directly read a portion of the region. On success,
* the data pointer will be updated with the contents of the
* requested portion of the region. The function will be called
* while the devlink instance lock is held.
* @priv: Pointer to driver private data for the region operation
*/
struct devlink_region_ops {
@@ -659,6 +663,10 @@ struct devlink_region_ops {
const struct devlink_region_ops *ops,
struct netlink_ext_ack *extack,
u8 **data);
int (*read)(struct devlink *devlink,
const struct devlink_region_ops *ops,
struct netlink_ext_ack *extack,
u64 offset, u32 size, u8 *data);
void *priv;
};
@@ -670,6 +678,10 @@ struct devlink_region_ops {
* the data variable must be updated to point to the snapshot data.
* The function will be called while the devlink instance lock is
* held.
* @read: callback to directly read a portion of the region. On success,
* the data pointer will be updated with the contents of the
* requested portion of the region. The function will be called
* while the devlink instance lock is held.
* @priv: Pointer to driver private data for the region operation
*/
struct devlink_port_region_ops {
@@ -679,6 +691,10 @@ struct devlink_port_region_ops {
const struct devlink_port_region_ops *ops,
struct netlink_ext_ack *extack,
u8 **data);
int (*read)(struct devlink_port *port,
const struct devlink_port_region_ops *ops,
struct netlink_ext_ack *extack,
u64 offset, u32 size, u8 *data);
void *priv;
};
@@ -610,6 +610,8 @@ enum devlink_attr {
DEVLINK_ATTR_RATE_TX_PRIORITY, /* u32 */
DEVLINK_ATTR_RATE_TX_WEIGHT, /* u32 */
DEVLINK_ATTR_REGION_DIRECT, /* flag */
/* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX,
@@ -6431,7 +6431,6 @@ devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info)
}
static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
struct devlink *devlink,
u8 *chunk, u32 chunk_size,
u64 addr)
{
@@ -6461,39 +6460,37 @@ static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
#define DEVLINK_REGION_READ_CHUNK_SIZE 256
static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
struct devlink *devlink,
struct devlink_region *region,
struct nlattr **attrs,
u64 start_offset,
u64 end_offset,
u64 *new_offset)
typedef int devlink_chunk_fill_t(void *cb_priv, u8 *chunk, u32 chunk_size,
u64 curr_offset,
struct netlink_ext_ack *extack);
static int
devlink_nl_region_read_fill(struct sk_buff *skb, devlink_chunk_fill_t *cb,
void *cb_priv, u64 start_offset, u64 end_offset,
u64 *new_offset, struct netlink_ext_ack *extack)
{
struct devlink_snapshot *snapshot;
u64 curr_offset = start_offset;
u32 snapshot_id;
int err = 0;
u8 *data;
*new_offset = start_offset;
/* Allocate and re-use a single buffer */
data = kmalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL);
if (!data)
return -ENOMEM;
snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
if (!snapshot)
return -EINVAL;
*new_offset = start_offset;
while (curr_offset < end_offset) {
u32 data_size;
u8 *data;
if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
data_size = end_offset - curr_offset;
else
data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
data_size = min_t(u32, end_offset - curr_offset,
DEVLINK_REGION_READ_CHUNK_SIZE);
data = &snapshot->data[curr_offset];
err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
data, data_size,
curr_offset);
err = cb(cb_priv, data, data_size, curr_offset, extack);
if (err)
break;
err = devlink_nl_cmd_region_read_chunk_fill(skb, data, data_size, curr_offset);
if (err)
break;
@@ -6501,21 +6498,57 @@ static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
}
*new_offset = curr_offset;
kfree(data);
return err;
}
static int
devlink_region_snapshot_fill(void *cb_priv, u8 *chunk, u32 chunk_size,
u64 curr_offset,
struct netlink_ext_ack __always_unused *extack)
{
struct devlink_snapshot *snapshot = cb_priv;
memcpy(chunk, &snapshot->data[curr_offset], chunk_size);
return 0;
}
static int
devlink_region_port_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size,
u64 curr_offset, struct netlink_ext_ack *extack)
{
struct devlink_region *region = cb_priv;
return region->port_ops->read(region->port, region->port_ops, extack,
curr_offset, chunk_size, chunk);
}
static int
devlink_region_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size,
u64 curr_offset, struct netlink_ext_ack *extack)
{
struct devlink_region *region = cb_priv;
return region->ops->read(region->devlink, region->ops, extack,
curr_offset, chunk_size, chunk);
}
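The refactoring above replaces the snapshot-specific fill loop with a generic loop that takes a chunk-fill callback. The following user-space model sketches that shape. It is an illustration only: the `extack` argument is dropped, `out` stands in for the netlink message, and the "device" behind `direct_fill` is hypothetical (the byte at offset N has value N & 0xff).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 256u

/* Same shape as devlink_chunk_fill_t, minus the extack argument. */
typedef int chunk_fill_t(void *cb_priv, uint8_t *chunk, uint32_t chunk_size,
			 uint64_t curr_offset);

/* Snapshot source: copy out of a buffer captured earlier. */
static int snapshot_fill(void *cb_priv, uint8_t *chunk, uint32_t chunk_size,
			 uint64_t curr_offset)
{
	const uint8_t *snapshot = cb_priv;

	memcpy(chunk, snapshot + curr_offset, chunk_size);
	return 0;
}

/* Hypothetical device: the byte at offset N has value N & 0xff. */
static int direct_fill(void *cb_priv, uint8_t *chunk, uint32_t chunk_size,
		       uint64_t curr_offset)
{
	(void)cb_priv;
	for (uint32_t i = 0; i < chunk_size; i++)
		chunk[i] = (uint8_t)(curr_offset + i);
	return 0;
}

/*
 * Model of the generic fill loop: one reusable chunk buffer is filled by
 * the callback, then appended to `out`.
 */
static int region_read_fill(chunk_fill_t *cb, void *cb_priv,
			    uint64_t start, uint64_t end, uint8_t *out)
{
	uint8_t *data = malloc(CHUNK_SIZE);
	uint64_t curr = start;
	int err = 0;

	if (!data)
		return -ENOMEM;

	while (curr < end) {
		uint64_t left = end - curr;
		uint32_t sz = left < CHUNK_SIZE ? (uint32_t)left : CHUNK_SIZE;

		err = cb(cb_priv, data, sz, curr);
		if (err)
			break;
		memcpy(out + (curr - start), data, sz);
		curr += sz;
	}

	free(data);
	return err;
}
```

The loop itself does not know where the bytes come from. Only the callback and its private pointer differ between a snapshot read and a direct read, which mirrors the `region_cb` and `region_cb_priv` pair selected in the dumpit handler.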
static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
struct netlink_callback *cb)
{
const struct genl_dumpit_info *info = genl_dumpit_info(cb);
struct nlattr *chunks_attr, *region_attr, *snapshot_attr;
u64 ret_offset, start_offset, end_offset = U64_MAX;
struct nlattr **attrs = info->attrs;
struct devlink_port *port = NULL;
devlink_chunk_fill_t *region_cb;
struct devlink_region *region;
struct nlattr *chunks_attr;
const char *region_name;
struct devlink *devlink;
unsigned int index;
void *region_cb_priv;
void *hdr;
int err;
@@ -6527,8 +6560,8 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
devl_lock(devlink);
if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
!attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
if (!attrs[DEVLINK_ATTR_REGION_NAME]) {
NL_SET_ERR_MSG(cb->extack, "No region name provided");
err = -EINVAL;
goto out_unlock;
}
@@ -6543,7 +6576,8 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
}
}
region_name = nla_data(attrs[DEVLINK_ATTR_REGION_NAME]);
region_attr = attrs[DEVLINK_ATTR_REGION_NAME];
region_name = nla_data(region_attr);
if (port)
region = devlink_port_region_get_by_name(port, region_name);
@@ -6551,10 +6585,51 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
region = devlink_region_get_by_name(devlink, region_name);
if (!region) {
NL_SET_ERR_MSG_ATTR(cb->extack, region_attr, "Requested region does not exist");
err = -EINVAL;
goto out_unlock;
}
snapshot_attr = attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID];
if (!snapshot_attr) {
if (!nla_get_flag(attrs[DEVLINK_ATTR_REGION_DIRECT])) {
NL_SET_ERR_MSG(cb->extack, "No snapshot id provided");
err = -EINVAL;
goto out_unlock;
}
if (!region->ops->read) {
NL_SET_ERR_MSG(cb->extack, "Requested region does not support direct read");
err = -EOPNOTSUPP;
goto out_unlock;
}
if (port)
region_cb = &devlink_region_port_direct_fill;
else
region_cb = &devlink_region_direct_fill;
region_cb_priv = region;
} else {
struct devlink_snapshot *snapshot;
u32 snapshot_id;
if (nla_get_flag(attrs[DEVLINK_ATTR_REGION_DIRECT])) {
NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "Direct region read does not use snapshot");
err = -EINVAL;
goto out_unlock;
}
snapshot_id = nla_get_u32(snapshot_attr);
snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
if (!snapshot) {
NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "Requested snapshot does not exist");
err = -EINVAL;
goto out_unlock;
}
region_cb = &devlink_region_snapshot_fill;
region_cb_priv = snapshot;
}
if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] &&
attrs[DEVLINK_ATTR_REGION_CHUNK_LEN]) {
if (!start_offset)
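The attribute checks in the hunk above amount to a small decision table over the snapshot id, the `DEVLINK_ATTR_REGION_DIRECT` flag, and whether the region provides a `.read` callback. The sketch below models that table in user space. `pick_read_mode` and `enum read_mode` are hypothetical names, and the later lookup of the snapshot by id is left out.

```c
#include <errno.h>
#include <stdbool.h>

enum read_mode { READ_SNAPSHOT, READ_DIRECT };

/*
 * Hypothetical helper mirroring the checks in the hunk above. Returns 0
 * and sets *mode on success, or a negative errno value on failure.
 */
static int pick_read_mode(bool has_snapshot_id, bool direct_flag,
			  bool region_has_read, enum read_mode *mode)
{
	if (!has_snapshot_id) {
		if (!direct_flag)
			return -EINVAL;		/* "No snapshot id provided" */
		if (!region_has_read)
			return -EOPNOTSUPP;	/* region has no .read callback */
		*mode = READ_DIRECT;
		return 0;
	}

	if (direct_flag)
		return -EINVAL;		/* snapshot id and direct flag conflict */

	*mode = READ_SNAPSHOT;
	return 0;
}
```

Requiring the explicit flag for a direct read means a request that simply omits the snapshot id still fails with `-EINVAL` in this model, matching the "No snapshot id provided" branch of the hunk.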
@@ -6603,10 +6678,9 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
goto nla_put_failure;
}
err = devlink_nl_region_read_snapshot_fill(skb, devlink,
region, attrs,
start_offset,
end_offset, &ret_offset);
err = devlink_nl_region_read_fill(skb, region_cb, region_cb_priv,
start_offset, end_offset, &ret_offset,
cb->extack);
if (err && err != -EMSGSIZE)
goto nla_put_failure;
@@ -9251,6 +9325,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED },
[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32 },
[DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32 },
[DEVLINK_ATTR_REGION_DIRECT] = { .type = NLA_FLAG },
};
static const struct genl_small_ops devlink_nl_ops[] = {