- 21 Apr, 2023 35 commits
-
-
Stefano Garzarella authored
The spinlock we use to protect the state of the simulator is sometimes held for a long time (for example, when devices handle requests). This also prevents us from calling functions that might sleep (such as kthread_flush_work() in the next patch), and thus having to release and retake the lock. For these reasons, let's replace the spinlock with a mutex that gives us more flexibility. Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131730.45920-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Stefano Garzarella authored
Let's use our own kthread to run device jobs. This allows us more flexibility, especially we can attach the kthread to the user address space when vDPA uses user's VA. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131725.45908-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Stefano Garzarella authored
Let's move work management inside the vdpa_sim core. This way we can easily change how we manage the works, without having to change the devices each time. Acked-by: Eugenio Pérez Martin <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131721.45886-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Stefano Garzarella authored
vDPA supports the possibility to use user VA in the iotlb messages. So, let's add support for user VA in vringh to use it in the vDPA simulators. Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131716.45855-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Stefano Garzarella authored
Define a macro to be reused in the different parts of the code. Useful for the next patches where we add more arrays to manage also translations with user VA. Suggested-by: Eugenio Perez Martin <eperezma@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-5-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Stefano Garzarella authored
kmap_atomic() is deprecated in favor of kmap_local_page() since commit f3ba3c71 ("mm/highmem: Provide kmap_local*"). With kmap_local_page() the mappings are per thread, CPU local, can take page-faults, and can be called from any context (including interrupts). Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. kmap_atomic() is implemented like a kmap_local_page() which also disables page-faults and preemption (the latter only for !PREEMPT_RT kernels, otherwise it only disables migration). The code within the mappings/un-mappings in getu16_iotlb() and putu16_iotlb() don't depend on the above-mentioned side effects of kmap_atomic(), so that mere replacements of the old API with the new one is all that is required (i.e., there is no need to explicitly add calls to pagefault_disable() and/or preempt_disable()). This commit reuses a "boiler plate" commit message from Fabio, who has already did this change in several places. Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-4-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Stefano Garzarella authored
When the user call VHOST_SET_OWNER ioctl and the vDPA device has `use_va` set to true, let's call the bind_mm callback. In this way we can bind the device to the user address space and directly use the user VA. The unbind_mm callback is called during the release after stopping the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Stefano Garzarella authored
These new optional callbacks is used to bind/unbind the device to a specific address space so the vDPA framework can use VA when these callbacks are implemented. Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Stefano Garzarella authored
Replace `userpace` with `userspace`. Cc: Simon Horman <simon.horman@corigine.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230331080208.17002-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
-
Xie Yongji authored
As discussed in [1], this adds sysfs interface to support specifying bounce buffer size in virtio-vdpa case. It would be a performance tuning parameter for high throughput workloads. [1] https://lore.kernel.org/netdev/e8f25a35-9d45-69f9-795d-bdbbb90337a3@redhat.com/Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-12-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Xie Yongji authored
Delay creating iova domain until the vduse device is registered to vdpa bus. This is a preparation for adding sysfs interface to support specifying bounce buffer size for the iova domain. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-11-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Xie Yongji authored
Now the vdpa callback will associate an trigger eventfd in some cases. For performance reasons, VDUSE can signal it directly during irq injection. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-10-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Xie Yongji authored
Add eventfd for the vdpa callback so that user can signal it directly instead of triggering the callback. It will be used for vhost-vdpa case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-9-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
-
Xie Yongji authored
Add sysfs interface for each vduse virtqueue to get/set the affinity for irq callback. This might be useful for performance tuning when the irq callback affinity mask contains more than one CPU. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-8-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
-
Xie Yongji authored
This implements get_vq_affinity callback so that the virtio-blk driver can build the blk-mq queues based on the irq callback affinity. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-7-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
-
Xie Yongji authored
Since virtio-vdpa bus driver already support interrupt affinity spreading mechanism, let's implement the set_vq_affinity callback to bring it to vduse device. After we get the virtqueue's affinity, we can spread IRQs between CPUs in the affinity mask, in a round-robin manner, to run the irq callback. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-6-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
-
Xie Yongji authored
Allocate memory for vduse virtqueues one by one instead of doing one allocation for all of them. This is a preparation for adding sysfs interface for virtqueues. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-5-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Xie Yongji authored
To support interrupt affinity spreading mechanism, this makes use of group_cpus_evenly() to create an irq callback affinity mask for each virtqueue of vdpa device. Then we will unify set_vq_affinity callback to pass the affinity to the vdpa device driver. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-4-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
-
Xie Yongji authored
This introduces set/get_vq_affinity callbacks in vdpa_config_ops to support virtqueue affinity management for vdpa device drivers. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-3-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Xie Yongji authored
Export group_cpus_evenly() so that some modules can make use of it to group CPUs evenly according to NUMA and CPU locality. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-2-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Eli Cohen authored
Extend the possible list for features that can be supported by firmware. Note that different versions of firmware may or may not support these features. The driver is made aware of them by querying the firmware. While doing this, improve the code so we use enum names instead of hard coded numerical values. The new features supported by the driver are the following: VIRTIO_NET_F_MRG_RXBUF VIRTIO_NET_F_HOST_ECN VIRTIO_NET_F_GUEST_ECN VIRTIO_NET_F_GUEST_TSO6 VIRTIO_NET_F_GUEST_TSO4 Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230321112809.221432-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>
-
Eli Cohen authored
Following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF. Current firmware versions show degradation in packet rate when using MRG_RXBUF. Users who favor memory saving over packet rate could enable this feature but we want to keep it off by default. One can still enable it when creating the vdpa device using vdpa tool by providing features that include it. For example: $ vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 device_features 0x300cb982b Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230321112809.221432-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
-
Feng Liu authored
According to the Virtio Specification, the Queue Size parameter of a virtqueue corresponds to the maximum number of descriptors in that queue, and it does not have to be a power of 2 for packed virtqueues. However, the virtio_pci_modern driver enforced a power of 2 check for virtqueue sizes, which is unnecessary and restrictive for packed virtuqueue. Split virtqueue still needs to check the virtqueue size is power_of_2 which has been done in vring_alloc_queue_split of the virtio_ring layer. To validate this change, we tested various virtqueue sizes for packed rings, including 128, 256, 512, 100, 200, 500, and 1000, with CONFIG_PAGE_POISONING enabled, and all tests passed successfully. Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20230315185458.11638-2-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Mike Christie authored
We on longer need to hold the vhost_scsi_mutex the entire time we set/clear the endpoint. The tv_tpg_mutex handles tpg accesses not related to the tpg list, the port link/unlink functions use the tv_tpg_mutex while accessing the tpg->vhost_scsi pointer, vhost_scsi_do_plug will no longer queue events after the virtqueue's backend has been cleared and flushed, and we don't drop our refcount to the tpg until after we have stopped cmds and wait for outstanding cmds to complete. This moves the vhost_scsi_mutex use to it's documented use of being used to access the tpg list. We then don't need to hold it while a flush is being performed causing other device's vhost_scsi_set_endpoint and vhost_scsi_make_tpg/vhost_scsi_drop_tpg calls to have to wait on a flakey device. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-8-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Mike Christie authored
We are using the vhost_scsi_mutex to make sure vhost_scsi_port_link and vhost_scsi_port_unlink see if vhost_scsi_clear_endpoint has cleared tpg->vhost_scsi and it can't be freed while they are using. However, we currently set the tpg->vhost_scsi pointer while holding tv_tpg_mutex. So, we can just hold that while calling vhost_scsi_hotplug/hotunplug. We then don't need to hold the vhost_scsi_mutex while vhost_scsi_clear_endpoint is holding it and doing a flush which could cause the LUN map/unmap to have to wait on another device's flush. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-7-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Mike Christie authored
We currenly hold the vhost_scsi_mutex while clearing the endpoint and while performing vhost_scsi_do_plug, so tpg->vhost_scsi can't be freed from uder us, and to make sure anything queued is handled by the full call in vhost_scsi_clear_endpoint. This patch removes the need for the vhost_scsi_mutex for the latter case. In the next patches, we won't hold the vhost_scsi_mutex while flushing so this patch adds a check for the clearing of the virtqueue from vhost_scsi_clear_endpoint. We then know that once vhost_scsi_clear_endpoint has cleared the backend that no new events will be queued, and the flush after the vhost_vq_set_backend(vq, NULL) call will see everything that's been queued to that point. So the flush will then handle all events without the need for the vhost_scsi_mutex. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-6-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Mike Christie authored
We don't need the device mutex in vhost_scsi_do_plug because: 1. we have the vhost_scsi_mutex so the tpg->vhost_scsi pointer will not change on us and the vhost_scsi can't be freed from under us if it was set. 2. vhost_scsi_clear_endpoint will stop the virtqueues and flush them while holding the vhost_scsi_mutex so we know once vhost_scsi_clear_endpoint has completed that vhost_scsi_do_plug can't send new events and any queued ones have completed. So this patch drops the device mutex use in vhost_scsi_do_plug. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-5-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Mike Christie authored
We currently hold the vhost_scsi_mutex the entire time we are running vhost_scsi_clear_endpoint. One of the reasons for this is that it prevents userspace from being able to free the se_tpg from under us after we have called target_undepend_item. However, it forces management operations for for other devices to have to wait on a flakey device's: vhost_scsi_clear_endpoint -> vhost_scsi_flush() call which can which can take a long time. This moves the target_undepend_item call and the tpg unsetup code to after we have stopped new IO from starting up and after we have waited on running IO. We can then release our refcount on the tpg and session knowing our device is no longer accessing them. We can then drop the vhost_scsi_mutex use during thee flush call in later patches in this set, when we have removed other reasons for holding it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Feng Liu authored
Add const to make the read-only pointer parameters clear, similar to many existing functions. To implement this change, the commit also introduces the use of `container_of_const` to implement `to_vvq`, which ensures the const-ness of read-only parameters and avoids accidental modification of their members. Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Message-Id: <20230310053428.3376-4-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Feng Liu authored
According to kernel coding style [1], defining inline functions is not necessary and beneficial for simple functions. Hence clean up the code by removing the inline keyword. It is verified with GCC 12.2.0, the generated code with/without inline is same. Additionally tested with pktgen and iperf, and verified the result, the pps test results are the same in the cases of with/without inline. Iperf and pps of pktgen for virtio-net didn't change before and after the change. [1] https://www.kernel.org/doc/html/v6.2-rc3/process/coding-style.html#the-inline-diseaseSigned-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20230310053428.3376-3-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Rong Tao authored
When we get help information, we should return directly, and we should not execute test cases. Move the exit() directly into the help() function and remove it from case '?'. Signed-off-by: Rong Tao <rongtao@cestc.cn> Message-Id: <tencent_822CEBEB925205EA1573541CD1C2604F4805@qq.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Rong Tao authored
Replace eight spaces with Tab. Signed-off-by: Rong Tao <rtoax@foxmail.com> Message-Id: <tencent_89579C514BC4020324A1A4ACA44B5B95BB07@qq.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Christophe JAILLET authored
Group some variables based on their sizes to reduce hole and avoid padding. On x86_64, this shrinks the size of 'struct virtqueue' from 72 to 68 bytes. It saves a few bytes of memory. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <8f3d2e49270a2158717e15008e7ed7228196ba02.1676707807.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Peter Lafreniere <peter@n8pjl.ca> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
-
Jacob Keller authored
The vhost_get_avail_size and vhost_get_used_size functions compute the size of structures with flexible array members with an additional 2 bytes if the VIRTIO_RING_F_EVENT_IDX feature flag is set. Convert these functions to use struct_size() and size_add() instead of coding the calculation by hand. This ensures that the calculations will saturate at SIZE_MAX rather than overflowing. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Message-Id: <20230227214127.3678392-1-jacob.e.keller@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Eli Cohen authored
Current code ignores link state updates if VIRTIO_NET_F_STATUS was not negotiated. However, link state updates could be received before feature negotiation was completed , therefore causing link state events to be lost, possibly leaving the link state down. Modify the code so link state notifier is registered after DRIVER_OK was negotiated and carry the registration only if VIRTIO_NET_F_STATUS was negotiated. Unregister the notifier when the device is reset. Fixes: 033779a7 ("vdpa/mlx5: make MTU/STATUS presence conditional on feature bits") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230417110343.138319-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- 16 Apr, 2023 5 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fix from Borislav Petkov: - Do not pull tasks to the local scheduling group if its average load is higher than the average system load * tag 'sched_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix imbalance overflow
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fix from Borislav Petkov: - Drop __init annotation from two rtc functions which get called after boot is done, in order to prevent a crash * tag 'x86_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/rtc: Remove __init for runtime functions
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fix from Michael Ellerman: - A fix for NUMA distance handling in the pseries SCM (pmem) driver. Thanks to Aneesh Kumar K.V. * tag 'powerpc-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Update the NUMA distance table for the target node
-
Linus Torvalds authored
Merge tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Drop debug info from purgatory objects again - Document that kernel.org provides prebuilt LLVM toolchains - Give up handling untracked files for source package builds - Avoid creating corrupted cpio when KBUILD_BUILD_TIMESTAMP is given with a pre-epoch data. - Change panic_show_mem() to a macro to handle variable-length argument - Compress tarballs on-the-fly again * tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: do not create intermediate *.tar for tar packages kbuild: do not create intermediate *.tar for source tarballs kbuild: merge cmd_archive_linux and cmd_archive_perf init/initramfs: Fix argument forwarding to panic() in panic_show_mem() initramfs: Check negative timestamp to prevent broken cpio archive kbuild: give up untracked files for source package builds Documentation/llvm: Add a note about prebuilt kernel.org toolchains purgatory: fix disabling debug info
-