Commits · 960deb33be3d08e55a39e40e0286a51c7448e053 · Kirill Smelkov / linux

01 Nov, 2021 40 commits

vdpa: Use kernel coding style for structure comments · 960deb33

Parav Pandit authored Oct 26, 2021

As subsequent patch adds new structure field with comment, move the
structure comment to follow kernel coding style.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20211026175519.87795-4-parav@nvidia.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

960deb33

vdpa: Introduce query of device config layout · ad69dd0b

Parav Pandit authored Oct 26, 2021

Introduce a command to query a device config layout.

An example query of network vdpa device:

$ vdpa dev add name bar mgmtdev vdpasim_net

$ vdpa dev config show
bar: mac 00:35:09:19:48:05 link up link_announce false mtu 1500

$ vdpa dev config show -jp
{
    "config": {
        "bar": {
            "mac": "00:35:09:19:48:05",
            "link ": "up",
            "link_announce ": false,
            "mtu": 1500,
        }
    }
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211026175519.87795-3-parav@nvidia.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

ad69dd0b

vdpa: Introduce and use vdpa device get, set config helpers · 6dbb1f16

Parav Pandit authored Oct 26, 2021

Subsequent patches enable get and set configuration either
via management device or via vdpa device' config ops.

This requires synchronization between multiple callers to get and set
config callbacks. Features setting also influence the layout of the
configuration fields endianness.

To avoid exposing synchronization primitives to callers, introduce
helper for setting the configuration and use it.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20211026175519.87795-2-parav@nvidia.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

6dbb1f16

virtio-scsi: don't let virtio core to validate used buffer length · c57911eb

Jason Wang authored Oct 27, 2021

We never tries to use used length, so the patch prevents the virtio
core from validating used length.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211027022107.14357-5-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

c57911eb

virtio-blk: don't let virtio core to validate used length · a40392ed

Jason Wang authored Oct 27, 2021

We never tries to use used length, so the patch prevents the virtio
core from validating used length.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211027022107.14357-4-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

a40392ed

virtio-net: don't let virtio core to validate used length · 816625c1

Jason Wang authored Oct 27, 2021

For RX virtuqueue, the used length is validated in all the three paths
(big, small and mergeable). For control vq, we never tries to use used
length. So this patch forbids the core to validate the used length.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211027022107.14357-3-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

816625c1

virtio_ring: validate used buffer length · 939779f5

Jason Wang authored Oct 27, 2021

This patch validate the used buffer length provided by the device
before trying to use it. This is done by record the in buffer length
in a new field in desc_state structure during virtqueue_add(), then we
can fail the virtqueue_get_buf() when we find the device is trying to
give us a used buffer length which is greater than the in buffer
length.

Since some drivers have already done the validation by themselves,
this patch tries to makes the core validation optional. For the driver
that doesn't want the validation, it can set the
suppress_used_validation to be true (which could be overridden by
force_used_validation module parameter). To be more efficient, a
dedicate array is used for storing the validate used length, this
helps to eliminate the cache stress if validation is done by the
driver.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211027022107.14357-2-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

939779f5

virtio_blk: correct types for status handling · f0839372

Michael S. Tsirkin authored Oct 25, 2021

virtblk_setup_cmd returns blk_status_t in an int, callers then assign it
back to a blk_status_t variable. blk_status_t is either u32 or (more
typically) u8 so it works, but is inelegant and causes sparse warnings.

Pass the status in blk_status_t in a consistent way.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: b2c5221fd074 ("virtio-blk: avoid preallocating big SGL for data")
Cc: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

f0839372

virtio_blk: allow 0 as num_request_queues · ead65f76

Michael S. Tsirkin authored Oct 24, 2021

The default value is 0 meaning "no limit". However if 0
is specified on the command line it is instead silently
converted to 1. Further, the value is already validated
at point of use, there's no point in duplicating code
validating the value when it is set.

Simplify the code while making the behaviour more consistent
by using plain module_param.

Fixes: 1a662cf6cb9a ("virtio-blk: add num_request_queues module parameter")
Cc: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

ead65f76

i2c: virtio: Add support for zero-length requests · dcce1625

Viresh Kumar authored Oct 21, 2021

The virtio specification received a new mandatory feature
(VIRTIO_I2C_F_ZERO_LENGTH_REQUEST) for zero length requests. Fail if the
feature isn't offered by the device.

For each read-request, set the VIRTIO_I2C_FLAGS_M_RD flag, as required
by the VIRTIO_I2C_F_ZERO_LENGTH_REQUEST feature.

This allows us to support zero length requests, like SMBUS Quick, where
the buffer need not be sent anymore.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/7c58868cd26d2fc4bd82d0d8b0dfb55636380110.1634808714.git.viresh.kumar@linaro.orgSigned-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jie Deng <jie.deng@intel.com> # once the spec is merged

dcce1625

virtio-blk: fixup coccinelle warnings · f1aa12f5

Ye Guojin authored Oct 21, 2021

coccicheck complains about the use of snprintf() in sysfs show
functions:
WARNING  use scnprintf or sprintf

Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Link: https://lore.kernel.org/r/20211021065111.1047824-1-ye.guojin@zte.com.cnSigned-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

f1aa12f5

virtio_ring: fix typos in vring_desc_extra · ef5c366f

Jason Wang authored Oct 19, 2021

We're actually tracking descriptor address and length instead of the
buffer.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-7-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

ef5c366f

virtio-pci: harden INTX interrupts · 080cd7c3

Jason Wang authored Oct 19, 2021

This patch tries to make sure the virtio interrupt handler for INTX
won't be called after a reset and before virtio_device_ready(). We
can't use IRQF_NO_AUTOEN since we're using shared interrupt
(IRQF_SHARED). So this patch tracks the INTX enabling status in a new
intx_soft_enabled variable and toggle it during in
vp_disable/enable_vectors(). The INTX interrupt handler will check
intx_soft_enabled before processing the actual interrupt.

Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-6-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

080cd7c3

virtio_pci: harden MSI-X interrupts · 9e35276a

Jason Wang authored Oct 19, 2021

We used to synchronize pending MSI-X irq handlers via
synchronize_irq(), this may not work for the untrusted device which
may keep sending interrupts after reset which may lead unexpected
results. Similarly, we should not enable MSI-X interrupt until the
device is ready. So this patch fixes those two issues by:

1) switching to use disable_irq() to prevent the virtio interrupt
   handlers to be called after the device is reset.
2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()

This can make sure the virtio interrupt handler won't be called before
virtio_device_ready() and after reset.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-5-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

9e35276a

virtio_config: introduce a new .enable_cbs method · d50497eb

Jason Wang authored Oct 19, 2021

This patch introduces a new method to enable the callbacks for config
and virtqueues. This will be used for making sure the virtqueue
callbacks are only enabled after virtio_device_ready() if transport
implements this method.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-4-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

d50497eb

virtio_console: validate max_nr_ports before trying to use it · 28962ec5

Jason Wang authored Oct 19, 2021

We calculate nr_ports based on the max_nr_ports:

nr_queues = use_multiport(portdev) ? (nr_ports + 1) * 2 : 2;

If the device advertises a large max_nr_ports, we will end up with a
integer overflow. Fixing this by validating the max_nr_ports and fail
the probe for invalid max_nr_ports in this case.

Cc: Amit Shah <amit@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-3-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

28962ec5

virtio_blk: Fix spelling mistake: "advertisted" -> "advertised" · 63b4ffa4

Colin Ian King authored Oct 25, 2021

There is a spelling mistake in a dev_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20211025102240.22801-1-colin.i.king@gmail.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>

63b4ffa4

virtio-blk: validate num_queues during probe · 6ae6ff6f

Jason Wang authored Oct 19, 2021

If an untrusted device neogitates BLK_F_MQ but advertises a zero
num_queues, the driver may end up trying to allocating zero size
buffers where ZERO_SIZE_PTR is returned which may pass the checking
against the NULL. This will lead unexpected results.

Fixing this by failing the probe in this case.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-2-jasowang@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

6ae6ff6f

virtio-pmem: add myself as virtio-pmem maintainer · f1429e6c

Pankaj Gupta authored Oct 16, 2021

Adding myself as virtio-pmem maintainer and also adding virtualization
mailing list entry for virtio specific bits. Helps to get notified for
appropriate bug fixes & enhancements.
Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: https://lore.kernel.org/r/20211016090646.371145-1-pankaj.gupta.linux@gmail.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

f1429e6c

ALSA: virtio: Replace zero-length array with flexible-array member · 601695aa

Gustavo A. R. Silva authored Sep 29, 2021

There is a regular need in the kernel to provide a way to declare
having a dynamically sized set of trailing elements in a structure.
Kernel code should always use “flexible array members”[1] for these
cases. The older style of one-element or zero-length arrays should
no longer be used[2].

Also, make use of the struct_size() helper in kzalloc().

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/78Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20210929191504.GA337268@embeddedorSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

601695aa

virtio_ring: check desc == NULL when using indirect with packed · fc6d70f4

Xuan Zhuo authored Oct 20, 2021

When using indirect with packed, we don't check for allocation failures.
This patch checks that and fall back on direct.

Fixes: 1ce9e605 ("virtio_ring: introduce packed ring support")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211020112323.67466-3-xuanzhuo@linux.alibaba.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

fc6d70f4

virtio_ring: make virtqueue_add_indirect_packed prettier · 8d7670f3

Xuan Zhuo authored Oct 20, 2021

Align the arguments of virtqueue_add_indirect_packed() to the open ( to
make it look prettier.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211020112323.67466-2-xuanzhuo@linux.alibaba.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

8d7670f3

hwrng: virtio - always add a pending request · 9a4b612d

Laurent Vivier authored Oct 28, 2021

If we ensure we have already some data available by enqueuing
again the buffer once data are exhausted, we can return what we
have without waiting for the device answer.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Link: https://lore.kernel.org/r/20211028101111.128049-5-lvivier@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

9a4b612d

hwrng: virtio - don't waste entropy · 5c8e9330

Laurent Vivier authored Oct 28, 2021

if we don't use all the entropy available in the buffer, keep it
and use it later.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Link: https://lore.kernel.org/r/20211028101111.128049-4-lvivier@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

5c8e9330

hwrng: virtio - don't wait on cleanup · 2bb31abd

Laurent Vivier authored Oct 28, 2021

When virtio-rng device was dropped by the hwrng core we were forced
to wait the buffer to come back from the device to not have
remaining ongoing operation that could spoil the buffer.

But now, as the buffer is internal to the virtio-rng we can release
the waiting loop immediately, the buffer will be retrieve and use
when the virtio-rng driver will be selected again.

This avoids to hang on an rng_current write command if the virtio-rng
device is blocked by a lack of entropy. This allows to select
another entropy source if the current one is empty.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Link: https://lore.kernel.org/r/20211028101111.128049-3-lvivier@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

2bb31abd

hwrng: virtio - add an internal buffer · bf3175bc

Laurent Vivier authored Oct 28, 2021

hwrng core uses two buffers that can be mixed in the
virtio-rng queue.

If the buffer is provided with wait=0 it is enqueued in the
virtio-rng queue but unused by the caller.
On the next call, core provides another buffer but the
first one is filled instead and the new one queued.
And the caller reads the data from the new one that is not
updated, and the data in the first one are lost.

To avoid this mix, virtio-rng needs to use its own unique
internal buffer at a cost of a data copy to the caller buffer.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Link: https://lore.kernel.org/r/20211028101111.128049-2-lvivier@redhat.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

bf3175bc

vdpa/mlx5: Propagate link status from device to vdpa driver · edf747af

Eli Cohen authored Sep 09, 2021

Add code to register to hardware asynchronous events. Use this
mechanism to track link status events coming from the device and update
the config struct.

After doing link status change, call the vdpa callback to notify of the
link status change.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210909123635.30884-4-elic@nvidia.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

edf747af

vdpa/mlx5: Rename control VQ workqueue to vdpa wq · 218bdd20

Eli Cohen authored Sep 09, 2021

A subesequent patch will use the same workqueue for executing other
work not related to control VQ. Rename the workqueue and the work queue
entry used to convey information to the workqueue.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210909123635.30884-3-elic@nvidia.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

218bdd20

vdpa/mlx5: Remove mtu field from vdpa net device · 246fd1ca

Eli Cohen authored Sep 09, 2021

No need to save the mtu int the net device struct. We can save it in the
config struct which cannot be modified.

Moreover, move the initialization to. mlx5_vdpa_set_features() callback
is not the right place to put it.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210909123635.30884-2-elic@nvidia.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

246fd1ca

eni_vdpa: add vDPA driver for Alibaba ENI · e85087be

Wu Zongyong authored Oct 29, 2021

This patch adds a new vDPA driver for Alibaba ENI(Elastic Network
Interface) which is build upon virtio 0.9.5 specification.
And this driver is only enabled on X86 host currently.

Link: https://lore.kernel.org/r/6a9f32c00609af16bbb2ea32e633b3beb1cbf84b.1635493219.git.wuzongyong@linux.alibaba.comSigned-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211026083214.3375383-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de> # fix Kconfig typo
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

e85087be

vdpa: add new attribute VDPA_ATTR_DEV_MIN_VQ_SIZE · e47be840

Wu Zongyong authored Oct 29, 2021

This attribute advertises the min value of virtqueue size. The value is
1 by default.
Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Link: https://lore.kernel.org/r/2bbc417355c4d22298050b1ba887cecfbde3e85d.1635493219.git.wuzongyong@linux.alibaba.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

e47be840

virtio_vdpa: setup correct vq size with callbacks get_vq_num_{max,min} · 30a03dfc

Wu Zongyong authored Oct 29, 2021

For the devices which implement the get_vq_num_min callback, the driver
should not negotiate with virtqueue size with the backend vdpa device if
the value returned by get_vq_num_min equals to the value returned by
get_vq_num_max.
This is useful for vdpa devices based on legacy virtio specfication.
Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Link: https://lore.kernel.org/r/bc0551cec6c3f3dd9424b678b7c22d882aebab3a.1635493219.git.wuzongyong@linux.alibaba.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

30a03dfc

vdpa: min vq num of vdpa device cannot be greater than max vq num · c53e5d1b

Wu Zongyong authored Oct 29, 2021

Just failed to probe the vdpa device if the min virtqueue num returned
by get_vq_num_min is greater than the max virtqueue num returned by
get_vq_num_max.
Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/21199b62cc10b2a9f2cf90eeb63ad080645d881f.1635493219.git.wuzongyong@linux.alibaba.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

c53e5d1b

vdpa: add new callback get_vq_num_min in vdpa_config_ops · 3b970a58

Wu Zongyong authored Oct 29, 2021

This callback is optional. For vdpa devices that not support to change
virtqueue size, get_vq_num_min and get_vq_num_max will return the same
value, so that users can choose a correct value for that device.
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/f4af5b0abd660d9a29ab6b2f67bd6df10284a230.1635493219.git.wuzongyong@linux.alibaba.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

3b970a58

vp_vdpa: add vq irq offloading support · 5bbfea1e

Wu Zongyong authored Oct 29, 2021

This patch implements the get_vq_irq() callback for virtio pci devices
to allow irq offloading.
Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/bb091e5505db704dd620f8854a7aebc921d2a752.1635493219.git.wuzongyong@linux.alibaba.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

5bbfea1e

vdpa: fix typo · d0ae1fbf

Wu Zongyong authored Oct 29, 2021

Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/4b5153262e4ba64986bb567d7425ad4829ca7bcc.1635493219.git.wuzongyong@linux.alibaba.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

d0ae1fbf

virtio-pci: introduce legacy device module · d89c8169

Wu Zongyong authored Oct 29, 2021

Split common codes from virtio-pci-legacy so vDPA driver can reuse it
later.
Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/71605acde5e97fcb2760a6973e406279fb1bbd33.1635493219.git.wuzongyong@linux.alibaba.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

d89c8169

virtio-blk: add num_request_queues module parameter · 0989c41b

Max Gurtovoy authored Sep 02, 2021

Sometimes a user would like to control the amount of request queues to
be created for a block device. For example, for limiting the memory
footprint of virtio-blk devices.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210902204622.54354-1-mgurtovoy@nvidia.comSigned-off-by: Michael S. Tsirkin <mst@redhat.com>

0989c41b

virtio-blk: avoid preallocating big SGL for data · 02746e26

Max Gurtovoy authored Sep 01, 2021

No need to pre-allocate a big buffer for the IO SGL anymore. If a device
has lots of deep queues, preallocation for the sg list can consume
substantial amounts of memory. For HW virtio-blk device, nr_hw_queues
can be 64 or 128 and each queue's depth might be 128. This means the
resulting preallocation for the data SGLs is big.

Switch to runtime allocation for SGL for lists longer than 2 entries.
This is the approach used by NVMe drivers so it should be reasonable for
virtio block as well. Runtime SGL allocation has always been the case
for the legacy I/O path so this is nothing new.

The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't
support SG_CHAIN, use only runtime allocation for the SGL.

Re-organize the setup of the IO request to fit the new sg chain
mechanism.

No performance degradation was seen (fio libaio engine with 16 jobs and
128 iodepth):

IO size IOPs Rand Read (before/after) IOPs Rand Write (before/after)
-------- --------------------------------- ----------------------------------
512B 318K/316K 329K/325K

4KB 323K/321K 353K/349K

16KB 199K/208K 250K/275K

128KB 36K/36.1K 39.2K/41.7K
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Link: https://lore.kernel.org/r/20210901131434.31158-1-mgurtovoy@nvidia.comReviewed-by: Feng Li <lifeng1519@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de> # kconfig fixups
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

02746e26

virtio_net: clarify tailroom logic · fc02e8cb

Michael S. Tsirkin authored Oct 09, 2021

Make tailroom math follow same logic as everything else, subtracing
values in the order in which things are laid out in the buffer.
Tested-by: Corentin Noël <corentin.noel@collabora.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

fc02e8cb