Commit 99262a3d authored by Linus Torvalds

Merge tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus

Pull virtio updates from Rusty Russell.

* tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  virtio: fix typo in comment
  virtio-mmio: Devices parameter parsing
  virtio_blk: Drop unused request tracking list
  virtio-blk: Fix hot-unplug race in remove method
  virtio: Use ida to allocate virtio index
  virtio: balloon: separate out common code between remove and freeze functions
  virtio: balloon: drop restore_common()
  9p: disconnect channel when PCI device is removed
  virtio: update documentation to v0.9.5 of spec
parents bf67f3a5 c6190804
@@ -110,6 +110,7 @@ parameter is applicable:
 USB     USB support is enabled.
 USBHID  USB Human Interface Device support is enabled.
 V4L     Video For Linux support is enabled.
+VMMIO   Driver for memory mapped virtio devices is enabled.
 VGA     The VGA console has been enabled.
 VT      Virtual terminal support is enabled.
 WDT     Watchdog support is enabled.
@@ -2932,6 +2933,22 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 video=  [FB] Frame buffer configuration
         See Documentation/fb/modedb.txt.
+virtio_mmio.device=
+        [VMMIO] Memory mapped virtio (platform) device.
+            <size>@<baseaddr>:<irq>[:<id>]
+        where:
+            <size>     := size (can use standard suffixes
+                          like K, M and G)
+            <baseaddr> := physical base address
+            <irq>      := interrupt number (as passed to
+                          request_irq())
+            <id>       := (optional) platform device id
+        example:
+            virtio_mmio.device=1K@0x100b0000:48:7
+        Can be used multiple times for multiple devices.
 vga=    [BOOT,X86-32] Select a particular video mode
         See Documentation/x86/boot.txt and
         Documentation/svga.txt.
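The `virtio_mmio.device=` syntax documented in this hunk, `<size>@<baseaddr>:<irq>[:<id>]`, can be illustrated with a small user-space parser. This is only a sketch: the struct and function names are hypothetical, and it is not the parser the kernel driver uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for one parsed device description. */
struct mmio_dev_param {
    uint64_t size;     /* region size in bytes */
    uint64_t base;     /* physical base address */
    unsigned int irq;  /* interrupt number */
    int id;            /* platform device id, -1 if omitted */
};

/* Parse "<size>@<baseaddr>:<irq>[:<id>]"; returns 0 or -EINVAL. */
static int parse_mmio_dev_param(const char *s, struct mmio_dev_param *out)
{
    char *end;

    out->size = strtoull(s, &end, 0);
    if (end == s)
        return -EINVAL;
    /* Optional K, M or G suffix scales the size by powers of 1024. */
    switch (*end) {
    case 'G': case 'g':
        out->size <<= 10; /* fall through */
    case 'M': case 'm':
        out->size <<= 10; /* fall through */
    case 'K': case 'k':
        out->size <<= 10;
        end++;
        break;
    default:
        break;
    }
    if (*end != '@')
        return -EINVAL;

    s = end + 1;
    out->base = strtoull(s, &end, 0);
    if (end == s || *end != ':')
        return -EINVAL;

    s = end + 1;
    out->irq = (unsigned int)strtoul(s, &end, 0);
    if (end == s)
        return -EINVAL;

    out->id = -1; /* the id part is optional */
    if (*end == ':') {
        s = end + 1;
        out->id = (int)strtol(s, &end, 0);
        if (end == s)
            return -EINVAL;
    }
    return *end == '\0' ? 0 : -EINVAL;
}
```

Passing the documented example string `1K@0x100b0000:48:7` yields a 1024-byte region at 0x100b0000 with interrupt 48 and platform device id 7.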
......
[Generated file: see http://ozlabs.org/~rusty/virtio-spec/] [Generated file: see http://ozlabs.org/~rusty/virtio-spec/]
Virtio PCI Card Specification Virtio PCI Card Specification
v0.9.1 DRAFT v0.9.5 DRAFT
- -
Rusty Russell <rusty@rustcorp.com.au>IBM Corporation (Editor) Rusty Russell <rusty@rustcorp.com.au> IBM Corporation (Editor)
2011 August 1. 2012 May 7.
Purpose and Description Purpose and Description
@@ -68,11 +68,11 @@ and consists of three parts:
 +-------------------+-----------------------------------+-----------+
-When the driver wants to send buffers to the device, it puts them
-in one or more slots in the descriptor table, and writes the
-descriptor indices into the available ring. It then notifies the
-device. When the device has finished with the buffers, it writes
-the descriptors into the used ring, and sends an interrupt.
+When the driver wants to send a buffer to the device, it fills in
+a slot in the descriptor table (or chains several together), and
+writes the descriptor index into the available ring. It then
+notifies the device. When the device has finished a buffer, it
+writes the descriptor into the used ring, and sends an interrupt.
 Specification
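The buffer flow described in this hunk (fill a descriptor, publish its index in the available ring, notify; the device then reports the descriptor in the used ring and interrupts) can be modelled with a toy single-threaded queue. This is a sketch under simplifying assumptions: the `toy_vq` layout, the fixed queue size and the counters standing in for notifications and interrupts are all hypothetical, descriptor chaining is omitted, and a real implementation needs memory barriers and shared memory.

```c
#include <assert.h>
#include <stdint.h>

#define QSZ 8 /* toy queue size (hypothetical) */

struct toy_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct toy_used_elem { uint32_t id; uint32_t len; };

struct toy_vq {
    struct toy_desc desc[QSZ];        /* descriptor table */
    uint16_t avail_idx;               /* next free slot in the available ring */
    uint16_t avail_ring[QSZ];
    uint16_t used_idx;                /* next free slot in the used ring */
    struct toy_used_elem used_ring[QSZ];
    uint16_t last_seen_avail;         /* device-side cursor */
    unsigned int notifications;       /* stands in for "notify the device" */
    unsigned int interrupts;          /* stands in for "send an interrupt" */
};

/* Driver side: fill one descriptor, publish its index, notify. */
static void driver_add_buffer(struct toy_vq *vq, uint16_t slot,
                              uint64_t addr, uint32_t len)
{
    vq->desc[slot].addr = addr;
    vq->desc[slot].len = len;
    vq->desc[slot].flags = 0;
    vq->desc[slot].next = 0;
    vq->avail_ring[vq->avail_idx % QSZ] = slot;
    vq->avail_idx++;
    vq->notifications++;
}

/* Device side: consume one available buffer and mark it used.
 * Returns 1 if a buffer was processed, 0 if none was pending. */
static int device_process_one(struct toy_vq *vq)
{
    uint16_t head;

    if (vq->last_seen_avail == vq->avail_idx)
        return 0;
    head = vq->avail_ring[vq->last_seen_avail % QSZ];
    vq->last_seen_avail++;
    vq->used_ring[vq->used_idx % QSZ].id = head;
    /* Toy behaviour: pretend the whole buffer was consumed. */
    vq->used_ring[vq->used_idx % QSZ].len = vq->desc[head].len;
    vq->used_idx++;
    vq->interrupts++;
    return 1;
}
```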
...@@ -106,7 +106,13 @@ for informational purposes by the guest). ...@@ -106,7 +106,13 @@ for informational purposes by the guest).
+----------------------+--------------------+---------------+ +----------------------+--------------------+---------------+
| 6 | ioMemory | - | | 6 | ioMemory | - |
+----------------------+--------------------+---------------+ +----------------------+--------------------+---------------+
| 7 | rpmsg | Appendix H |
+----------------------+--------------------+---------------+
| 8 | SCSI host | Appendix I |
+----------------------+--------------------+---------------+
| 9 | 9P transport | - | | 9 | 9P transport | - |
+----------------------+--------------------+---------------+
| 10 | mac80211 wlan | - |
+----------------------+--------------------+---------------+ +----------------------+--------------------+---------------+
...@@ -127,7 +133,7 @@ Note that this is possible because while the virtio header is PCI ...@@ -127,7 +133,7 @@ Note that this is possible because while the virtio header is PCI
the native endian of the guest (where such distinction is the native endian of the guest (where such distinction is
applicable). applicable).
Device Initialization Sequence Device Initialization Sequence<sub:Device-Initialization-Sequence>
We start with an overview of device initialization, then expand We start with an overview of device initialization, then expand
on the details of the device and how each step is performed. on the details of the device and how each step is performed.
...@@ -177,7 +183,10 @@ The virtio header looks as follows: ...@@ -177,7 +183,10 @@ The virtio header looks as follows:
If MSI-X is enabled for the device, two additional fields If MSI-X is enabled for the device, two additional fields
immediately follow this header: immediately follow this header:[footnote:
ie. once you enable MSI-X on the device, the other fields move.
If you turn it off again, they move back!
]
+------------++----------------+--------+ +------------++----------------+--------+
...@@ -191,20 +200,6 @@ immediately follow this header: ...@@ -191,20 +200,6 @@ immediately follow this header:
+------------++----------------+--------+ +------------++----------------+--------+
Finally, if feature bits (VIRTIO_F_FEATURES_HI) this is
immediately followed by two additional fields:
+------------++----------------------+----------------------
| Bits || 32 | 32
+------------++----------------------+----------------------
| Read/Write || R | R+W
+------------++----------------------+----------------------
| Purpose || Device | Guest
| || Features bits 32:63 | Features bits 32:63
+------------++----------------------+----------------------
Immediately following these general headers, there may be Immediately following these general headers, there may be
device-specific headers: device-specific headers:
@@ -238,31 +233,25 @@ at least one bit should be set:
 may be a significant (or infinite) delay before setting this
 bit.
-DRIVER_OK (3) Indicates that the driver is set up and ready to
+DRIVER_OK (4) Indicates that the driver is set up and ready to
 drive the device.
-FAILED (8) Indicates that something went wrong in the guest,
+FAILED (128) Indicates that something went wrong in the guest,
 and it has given up on the device. This could be an internal
 error, or the driver didn't like the device for some reason, or
 even a fatal error during device operation. The device must be
 reset before attempting to re-initialize.
-Feature Bits
+Feature Bits<sub:Feature-Bits>
-The least significant 31 bits of the first configuration field
-indicates the features that the device supports (the high bit is
-reserved, and will be used to indicate the presence of future
-feature bits elsewhere). If more than 31 feature bits are
-supported, the device indicates so by setting feature bit 31 (see
-[cha:Reserved-Feature-Bits]). The bits are allocated as follows:
+The first configuration field indicates the features that the
+device supports. The bits are allocated as follows:
 0 to 23 Feature bits for the specific device type
-24 to 40 Feature bits reserved for extensions to the queue and
+24 to 32 Feature bits reserved for extensions to the queue and
 feature negotiation mechanisms
-41 to 63 Feature bits reserved for future extensions
 For example, feature bit 0 for a network device (i.e. Subsystem
 Device ID 1) indicates that the device supports checksumming of
 packets.
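A driver typically tests individual bits of the device features field and accepts only the features it also understands. The helpers below are a minimal sketch of that idea; the function names, the mask constant and the `EXAMPLE_NET_F_CSUM_BIT` name are hypothetical, and only the bit ranges and the "bit 0 means checksumming for a network device" example come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of a network device's features: checksumming (per the example). */
#define EXAMPLE_NET_F_CSUM_BIT 0u

/* Bits 0 to 23 are specific to the device type. */
#define DEVICE_SPECIFIC_MASK 0x00ffffffu

/* Returns 1 if the given feature bit is set, else 0. */
static inline int has_feature(uint32_t features, unsigned int bit)
{
    return (int)((features >> bit) & 1u);
}

/* The driver accepts only features that both sides support. */
static inline uint32_t negotiate_features(uint32_t device_features,
                                          uint32_t driver_supported)
{
    return device_features & driver_supported;
}
```

A feature the driver does not know about is simply never acknowledged, which is what lets the device fall back to a backwards-compatible mode.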
...@@ -286,10 +275,6 @@ will not see that feature bit in the Device Features field and ...@@ -286,10 +275,6 @@ will not see that feature bit in the Device Features field and
can go into backwards compatibility mode (or, for poor can go into backwards compatibility mode (or, for poor
implementations, set the FAILED Device Status bit). implementations, set the FAILED Device Status bit).
Access to feature bits 32 to 63 is enabled by Guest by setting
feature bit 31. If this bit is unset, Device must assume that all
feature bits > 31 are unset.
Configuration/Queue Vectors Configuration/Queue Vectors
When MSI-X capability is present and enabled in the device When MSI-X capability is present and enabled in the device
...@@ -324,7 +309,7 @@ success, the previously written value is returned, and on ...@@ -324,7 +309,7 @@ success, the previously written value is returned, and on
failure, NO_VECTOR is returned. If a mapping failure is detected, failure, NO_VECTOR is returned. If a mapping failure is detected,
the driver can retry mapping with fewer vectors, or disable MSI-X. the driver can retry mapping with fewer vectors, or disable MSI-X.
Virtqueue Configuration Virtqueue Configuration<sec:Virtqueue-Configuration>
As a device can have zero or more virtqueues for bulk data As a device can have zero or more virtqueues for bulk data
transport (for example, the network driver has two), the driver transport (for example, the network driver has two), the driver
...@@ -587,7 +572,7 @@ and Red Hat under the (3-clause) BSD license so that it can be ...@@ -587,7 +572,7 @@ and Red Hat under the (3-clause) BSD license so that it can be
freely used by all other projects, and is reproduced (with slight freely used by all other projects, and is reproduced (with slight
variation to remove Linux assumptions) in Appendix A. variation to remove Linux assumptions) in Appendix A.
Device Operation Device Operation<sec:Device-Operation>
There are two parts to device operation: supplying new buffers to There are two parts to device operation: supplying new buffers to
the device, and processing used buffers from the device. As an the device, and processing used buffers from the device. As an
...@@ -813,7 +798,7 @@ vring.used->ring[vq->last_seen_used%vsz]; ...@@ -813,7 +798,7 @@ vring.used->ring[vq->last_seen_used%vsz];
} }
Dealing With Configuration Changes Dealing With Configuration Changes<sub:Dealing-With-Configuration>
Some virtio PCI devices can change the device configuration Some virtio PCI devices can change the device configuration
state, as reflected in the virtio header in the PCI configuration state, as reflected in the virtio header in the PCI configuration
...@@ -1260,18 +1245,6 @@ Currently there are five device-independent feature bits defined: ...@@ -1260,18 +1245,6 @@ Currently there are five device-independent feature bits defined:
driver should ignore the used_event field; the device should driver should ignore the used_event field; the device should
ignore the avail_event field; the flags field is used ignore the avail_event field; the flags field is used
VIRTIO_F_BAD_FEATURE(30) This feature should never be
negotiated by the guest; doing so is an indication that the
guest is faulty[footnote:
An experimental virtio PCI driver contained in Linux version
2.6.25 had this problem, and this feature bit can be used to
detect it.
]
VIRTIO_F_FEATURES_HIGH(31) This feature indicates that the
device supports feature bits 32:63. If unset, feature bits
32:63 are unset.
Appendix C: Network Device Appendix C: Network Device
The virtio network device is a virtual ethernet card, and is the The virtio network device is a virtual ethernet card, and is the
@@ -1335,11 +1308,17 @@ were required.
 VIRTIO_NET_F_CTRL_VLAN (19) Control channel VLAN filtering.
+VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous
+packets.
 Device configuration layout Two configuration fields are
 currently defined. The mac address field always exists (though
 is only valid if VIRTIO_NET_F_MAC is set), and the status field
-only exists if VIRTIO_NET_F_STATUS is set. Only one bit is
-currently defined for the status field: VIRTIO_NET_S_LINK_UP. #define VIRTIO_NET_S_LINK_UP 1
+only exists if VIRTIO_NET_F_STATUS is set. Two read-only bits
+are currently defined for the status field:
+VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE. #define VIRTIO_NET_S_LINK_UP 1
+#define VIRTIO_NET_S_ANNOUNCE 2
@@ -1377,12 +1356,19 @@ struct virtio_net_config {
 packets by negotiating the VIRTIO_NET_F_CSUM feature. This “
 checksum offload” is a common feature on modern network cards.
-If that feature is negotiated, a driver can use TCP or UDP
-segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4
-(IPv4 TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and
-VIRTIO_NET_F_HOST_UFO (UDP fragmentation) features. It should
-not send TCP packets requiring segmentation offload which have
-the Explicit Congestion Notification bit set, unless the
+If that feature is negotiated[footnote:
+ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
+dependent on VIRTIO_NET_F_CSUM; a device which offers the offload
+features must offer the checksum feature, and a driver which
+accepts the offload features must accept the checksum feature.
+Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
+depending on VIRTIO_NET_F_GUEST_CSUM.
+], a driver can use TCP or UDP segmentation offload by
+negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4 TCP),
+VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
+(UDP fragmentation) features. It should not send TCP packets
+requiring segmentation offload which have the Explicit
+Congestion Notification bit set, unless the
 VIRTIO_NET_F_HOST_ECN feature is negotiated.[footnote:
 This is a common restriction in real, older network cards.
 ]
...@@ -1403,7 +1389,7 @@ segmentation, if both guests are amenable. ...@@ -1403,7 +1389,7 @@ segmentation, if both guests are amenable.
Packets are transmitted by placing them in the transmitq, and Packets are transmitted by placing them in the transmitq, and
buffers for incoming packets are placed in the receiveq. In each buffers for incoming packets are placed in the receiveq. In each
case, the packet itself is preceded by a header: case, the packet itself is preceeded by a header:
struct virtio_net_hdr { struct virtio_net_hdr {
...@@ -1462,9 +1448,10 @@ It will have a 14 byte ethernet header and 20 byte IP header ...@@ -1462,9 +1448,10 @@ It will have a 14 byte ethernet header and 20 byte IP header
followed by the TCP header (with the TCP checksum field 16 bytes followed by the TCP header (with the TCP checksum field 16 bytes
into that header). csum_start will be 14+20 = 34 (the TCP into that header). csum_start will be 14+20 = 34 (the TCP
checksum includes the header), and csum_offset will be 16. The checksum includes the header), and csum_offset will be 16. The
value in the TCP checksum field will be the sum of the TCP pseudo value in the TCP checksum field should be initialized to the sum
header, so that replacing it by the ones' complement checksum of of the TCP pseudo header, so that replacing it by the ones'
the TCP header and body will give the correct result. complement checksum of the TCP header and body will give the
correct result.
] ]
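The arithmetic in the footnote above (csum_start = 14 + 20 = 34, csum_offset = 16, and a checksum field pre-seeded so that the ones' complement checksum over the rest of the packet gives the right answer) can be sketched as follows. The function and constant names are hypothetical; the code assumes big-endian 16-bit words as used by Internet checksums and shows what the receiving side of a partially checksummed packet might do, not any particular implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    EXAMPLE_ETH_HDR_LEN = 14,  /* ethernet header, from the example */
    EXAMPLE_IPV4_HDR_LEN = 20, /* IP header without options */
    EXAMPLE_TCP_CSUM_OFF = 16  /* checksum field offset in the TCP header */
};

/* Folded 16-bit ones' complement sum over big-endian words. */
static uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sum everything from csum_start to the end of the packet (including the
 * pre-seeded checksum field) and store the complement at
 * csum_start + csum_offset. */
static void complete_checksum(uint8_t *pkt, size_t len,
                              size_t csum_start, size_t csum_offset)
{
    uint16_t csum = (uint16_t)~ones_complement_sum(pkt + csum_start,
                                                   len - csum_start);

    pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
    pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xffu);
}
```

Because the driver seeds the field with the pseudo header sum, the device never needs to parse the IP header to finish the checksum.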
<enu:If-the-driver>If the driver negotiated <enu:If-the-driver>If the driver negotiated
...@@ -1483,8 +1470,8 @@ Due to various bugs in implementations, this field is not useful ...@@ -1483,8 +1470,8 @@ Due to various bugs in implementations, this field is not useful
as a guarantee of the transport header size. as a guarantee of the transport header size.
] ]
gso_size is the size of the packet beyond that header (ie. gso_size is the maximum size of each packet beyond that header
MSS). (ie. MSS).
If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the
VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as well, VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as well,
...@@ -1567,7 +1554,9 @@ Processing packet involves: ...@@ -1567,7 +1554,9 @@ Processing packet involves:
If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
negotiated, then the “gso_type” may be something other than negotiated, then the “gso_type” may be something other than
VIRTIO_NET_HDR_GSO_NONE, and the “gso_size” field indicates the VIRTIO_NET_HDR_GSO_NONE, and the “gso_size” field indicates the
desired MSS (see [enu:If-the-driver]).Control Virtqueue desired MSS (see [enu:If-the-driver]).
Control Virtqueue
The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is
negotiated) to send commands to manipulate various features of negotiated) to send commands to manipulate various features of
...@@ -1642,7 +1631,7 @@ struct virtio_net_ctrl_mac { ...@@ -1642,7 +1631,7 @@ struct virtio_net_ctrl_mac {
The device can filter incoming packets by any number of The device can filter incoming packets by any number of
destination MAC addresses.[footnote: destination MAC addresses.[footnote:
Since there are no guarantees, it can use a hash filter Since there are no guarentees, it can use a hash filter
or silently switch to allmulti or promiscuous mode if it is given or silently switch to allmulti or promiscuous mode if it is given
too many addresses. too many addresses.
] This table is set using the class VIRTIO_NET_CTRL_MAC and the ] This table is set using the class VIRTIO_NET_CTRL_MAC and the
...@@ -1665,6 +1654,38 @@ can control a VLAN filter table in the device. ...@@ -1665,6 +1654,38 @@ can control a VLAN filter table in the device.
Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL
command take a 16-bit VLAN id as the command-specific-data. command take a 16-bit VLAN id as the command-specific-data.
Gratuitous Packet Sending
If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE feature
(which depends on VIRTIO_NET_F_CTRL_VQ), the device can ask the
guest to send gratuitous packets; this is usually done after the
guest has been physically migrated, and needs to announce its
presence on the new network links. (As the hypervisor does not
know the guest network configuration (eg. tagged vlan), it is
simplest to prod the guest in this way.)
#define VIRTIO_NET_CTRL_ANNOUNCE 3
#define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
The guest needs to check the VIRTIO_NET_S_ANNOUNCE bit in the
status field when it notices a device configuration change. The
command VIRTIO_NET_CTRL_ANNOUNCE_ACK indicates that the driver
has received the notification; the device clears the
VIRTIO_NET_S_ANNOUNCE bit in the status field after it receives
this command.
Processing this notification involves:
Sending the gratuitous packets, or marking that gratuitous
packets are pending and letting a deferred routine send them.
Sending the VIRTIO_NET_CTRL_ANNOUNCE_ACK command through the
control vq.
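The two processing steps above can be sketched from the driver's point of view. The constants are the ones quoted in this section; the `example_ctrl_cmd` struct, the 16-bit width assumed for the status field and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_S_LINK_UP          1
#define VIRTIO_NET_S_ANNOUNCE         2
#define VIRTIO_NET_CTRL_ANNOUNCE      3
#define VIRTIO_NET_CTRL_ANNOUNCE_ACK  0

/* Hypothetical class/command pair placed at the start of a control
 * virtqueue request. */
struct example_ctrl_cmd {
    uint8_t cls;
    uint8_t cmd;
};

/* Step 0: on a configuration change, check whether the device asks
 * for an announcement. */
static int announce_requested(uint16_t status)
{
    return (status & VIRTIO_NET_S_ANNOUNCE) != 0;
}

/* Step 2: build the acknowledgement to send through the control vq
 * after the gratuitous packets were sent (or queued for later). */
static struct example_ctrl_cmd make_announce_ack(void)
{
    struct example_ctrl_cmd c;

    c.cls = VIRTIO_NET_CTRL_ANNOUNCE;
    c.cmd = VIRTIO_NET_CTRL_ANNOUNCE_ACK;
    return c;
}
```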
Appendix D: Block Device Appendix D: Block Device
The virtio block device is a simple virtual block device (ie. The virtio block device is a simple virtual block device (ie.
...@@ -1699,8 +1720,6 @@ device except where noted. ...@@ -1699,8 +1720,6 @@ device except where noted.
VIRTIO_BLK_F_FLUSH (9) Cache flush command support. VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
Device configuration layout The capacity of the device Device configuration layout The capacity of the device
(expressed in 512-byte sectors) is always present. The (expressed in 512-byte sectors) is always present. The
availability of the others all depend on various feature bits availability of the others all depend on various feature bits
...@@ -1743,8 +1762,6 @@ device except where noted. ...@@ -1743,8 +1762,6 @@ device except where noted.
If the VIRTIO_BLK_F_RO feature is set by the device, any write If the VIRTIO_BLK_F_RO feature is set by the device, any write
requests will fail. requests will fail.
Device Operation Device Operation
The driver queues requests to the virtqueue, and they are used by The driver queues requests to the virtqueue, and they are used by
...@@ -1805,7 +1822,7 @@ the FLUSH and FLUSH_OUT types are equivalent, the device does not ...@@ -1805,7 +1822,7 @@ the FLUSH and FLUSH_OUT types are equivalent, the device does not
distinguish between them distinguish between them
]). If the device has VIRTIO_BLK_F_BARRIER feature the high bit ]). If the device has VIRTIO_BLK_F_BARRIER feature the high bit
(VIRTIO_BLK_T_BARRIER) indicates that this request acts as a (VIRTIO_BLK_T_BARRIER) indicates that this request acts as a
barrier and that all preceding requests must be complete before barrier and that all preceeding requests must be complete before
this one, and all following requests must not be started until this one, and all following requests must not be started until
this is complete. Note that a barrier does not flush caches in this is complete. Note that a barrier does not flush caches in
the underlying backend device in host, and thus does not serve as the underlying backend device in host, and thus does not serve as
...@@ -2118,7 +2135,7 @@ This is historical, and independent of the guest page size ...@@ -2118,7 +2135,7 @@ This is historical, and independent of the guest page size
Otherwise, the guest may begin to re-use pages previously given Otherwise, the guest may begin to re-use pages previously given
to the balloon before the device has acknowledged their to the balloon before the device has acknowledged their
withdrawal. [footnote: withdrawl. [footnote:
In this case, deflation advice is merely a courtesy In this case, deflation advice is merely a courtesy
] ]
...@@ -2198,3 +2215,996 @@ as follows: ...@@ -2198,3 +2215,996 @@ as follows:
VIRTIO_BALLOON_S_MEMTOT The total amount of memory available VIRTIO_BALLOON_S_MEMTOT The total amount of memory available
(in bytes). (in bytes).
Appendix H: Rpmsg: Remote Processor Messaging
Virtio rpmsg devices represent remote processors on the system
which run in asymmetric multi-processing (AMP) configuration, and
which are usually used to offload cpu-intensive tasks from the
main application processor (a typical SoC methodology).
Virtio is being used to communicate with those remote processors;
empty buffers are placed in one virtqueue for receiving messages,
and non-empty buffers, containing outbound messages, are enqueued
in a second virtqueue for transmission.
Numerous communication channels can be multiplexed over those two
virtqueues, so different entities, running on the application and
remote processor, can directly communicate in a point-to-point
fashion.
Configuration
Subsystem Device ID 7
Virtqueues 0:receiveq. 1:transmitq.
Feature bits
VIRTIO_RPMSG_F_NS (0) Device sends (and is capable of receiving)
name service messages announcing the creation (or
destruction) of a channel:
/**
 * struct rpmsg_ns_msg - dynamic name service announcement message
 * @name: name of remote service that is published
 * @addr: address of remote service that is published
 * @flags: indicates whether service is created or destroyed
 *
 * This message is sent across to publish a new service (or announce
 * about its removal). When we receive these messages, an appropriate
 * rpmsg channel (i.e device) is created/destroyed.
 */
struct rpmsg_ns_msg {
	char name[RPMSG_NAME_SIZE];
	u32 addr;
	u32 flags;
} __packed;
/**
 * enum rpmsg_ns_flags - dynamic name service announcement flags
 *
 * @RPMSG_NS_CREATE: a new remote service was just created
 * @RPMSG_NS_DESTROY: a remote service was just destroyed
 */
enum rpmsg_ns_flags {
	RPMSG_NS_CREATE = 0,
	RPMSG_NS_DESTROY = 1,
};
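Filling in a name service announcement might look like the sketch below. It is a user-space illustration only: the value 32 for `RPMSG_NAME_SIZE` is an assumption (the text does not give it), the struct is named after its own kernel-doc comment, `example_ns_fill` is a hypothetical helper, and byte order is left native.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RPMSG_NAME_SIZE 32 /* assumption: not stated in the text */

struct rpmsg_ns_msg {
    char name[RPMSG_NAME_SIZE];
    uint32_t addr;
    uint32_t flags;
} __attribute__((packed));

enum rpmsg_ns_flags {
    RPMSG_NS_CREATE = 0,
    RPMSG_NS_DESTROY = 1,
};

/* Fill an announcement; returns 0, or -1 if the name does not fit
 * together with its terminating NUL byte. */
static int example_ns_fill(struct rpmsg_ns_msg *msg, const char *name,
                           uint32_t addr, enum rpmsg_ns_flags flags)
{
    size_t n = strlen(name);

    if (n >= RPMSG_NAME_SIZE)
        return -1;
    memset(msg, 0, sizeof(*msg));
    memcpy(msg->name, name, n);
    msg->addr = addr;
    msg->flags = (uint32_t)flags;
    return 0;
}
```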
Device configuration layout
At this point none are defined.
Device Initialization
The initialization routine should identify the receive and
transmission virtqueues.
The receive virtqueue should be filled with receive buffers.
Device Operation
Messages are transmitted by placing them in the transmitq, and
buffers for inbound messages are placed in the receiveq. In any
case, messages are always preceded by the following header:
/**
 * struct rpmsg_hdr - common header for all rpmsg messages
 * @src: source address
 * @dst: destination address
 * @reserved: reserved for future use
 * @len: length of payload (in bytes)
 * @flags: message flags
 * @data: @len bytes of message payload data
 *
 * Every message sent(/received) on the rpmsg bus begins with this header.
 */
struct rpmsg_hdr {
u32 src;
u32 dst;
u32 reserved;
u16 len;
u16 flags;
u8 data[0];
} __packed;
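As an illustration of the header layout, the sketch below packs a payload behind a `struct rpmsg_hdr` in a caller-supplied buffer, as a driver might do before placing the buffer in the transmitq. The helper name `example_rpmsg_pack` is hypothetical, fields are written in native byte order, and the 16-bit `len` field limits the payload size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rpmsg_hdr {
    uint32_t src;
    uint32_t dst;
    uint32_t reserved;
    uint16_t len;
    uint16_t flags;
    uint8_t data[];
} __attribute__((packed));

/* Write header plus payload into buf. Returns the total number of bytes
 * used, or -1 if the payload does not fit in buf or in the len field. */
static long example_rpmsg_pack(void *buf, size_t buf_len,
                               uint32_t src, uint32_t dst,
                               const void *payload, size_t payload_len)
{
    struct rpmsg_hdr *hdr = buf;

    if (payload_len > UINT16_MAX || buf_len < sizeof(*hdr) + payload_len)
        return -1;
    hdr->src = src;
    hdr->dst = dst;
    hdr->reserved = 0;
    hdr->len = (uint16_t)payload_len;
    hdr->flags = 0;
    memcpy(hdr->data, payload, payload_len);
    return (long)(sizeof(*hdr) + payload_len);
}
```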
Appendix I: SCSI Host Device
The virtio SCSI host device groups together one or more virtual
logical units (such as disks), and allows communicating to them
using the SCSI protocol. An instance of the device represents a
SCSI host to which many targets and LUNs are attached.
The virtio SCSI device services two kinds of requests:
command requests for a logical unit;
task management functions related to a logical unit, target or
command.
The device is also able to send out notifications about added and
removed logical units. Together, these capabilities provide a
SCSI transport protocol that uses virtqueues as the transfer
medium. In the transport protocol, the virtio driver acts as the
initiator, while the virtio SCSI host provides one or more
targets that receive and process the requests.
Configuration
Subsystem Device ID 8
Virtqueues 0:controlq; 1:eventq; 2..n:request queues.
Feature bits
VIRTIO_SCSI_F_INOUT (0) A single request can include both
read-only and write-only data buffers.
VIRTIO_SCSI_F_HOTPLUG (1) The host should enable
hot-plug/hot-unplug of new LUNs and targets on the SCSI bus.
Device configuration layout All fields of this configuration
are always available. sense_size and cdb_size are writable by
the guest.
struct virtio_scsi_config {
u32 num_queues;
u32 seg_max;
u32 max_sectors;
u32 cmd_per_lun;
u32 event_info_size;
u32 sense_size;
u32 cdb_size;
u16 max_channel;
u16 max_target;
u32 max_lun;
};
num_queues is the total number of request virtqueues exposed by
the device. The driver is free to use only one request queue,
or it can use more to achieve better performance.
seg_max is the maximum number of segments that can be in a
command. A bidirectional command can include seg_max input
segments and seg_max output segments.
max_sectors is a hint to the guest about the maximum transfer
size it should use.
cmd_per_lun is a hint to the guest about the maximum number of
linked commands it should send to one LUN. The actual value
to be used is the minimum of cmd_per_lun and the virtqueue
size.
event_info_size is the maximum size that the device will fill
for buffers that the driver places in the eventq. The driver
should always put buffers at least of this size. It is
written by the device depending on the set of negotiated
features.
sense_size is the maximum size of the sense data that the
device will write. The default value is written by the device
and will always be 96, but the driver can modify it. It is
restored to the default when the device is reset.
cdb_size is the maximum size of the CDB that the driver will
write. The default value is written by the device and will
always be 32, but the driver can likewise modify it. It is
restored to the default when the device is reset.
max_channel, max_target and max_lun can be used by the driver
as hints to constrain scanning the logical units on the
host.
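Two of the rules in the field descriptions above lend themselves to a short sketch: the effective per-LUN command limit is the minimum of cmd_per_lun and the virtqueue size, and sense_size and cdb_size return to their defaults (96 and 32) on reset. The struct below models only those three fields and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    EXAMPLE_DEFAULT_SENSE_SIZE = 96, /* default stated in the text */
    EXAMPLE_DEFAULT_CDB_SIZE = 32    /* default stated in the text */
};

/* Partial, hypothetical view of the configuration space. */
struct example_scsi_cfg {
    uint32_t cmd_per_lun;
    uint32_t sense_size;
    uint32_t cdb_size;
};

/* Actual limit: the minimum of cmd_per_lun and the virtqueue size. */
static uint32_t effective_cmd_per_lun(const struct example_scsi_cfg *cfg,
                                      uint32_t vq_size)
{
    return cfg->cmd_per_lun < vq_size ? cfg->cmd_per_lun : vq_size;
}

/* On device reset the two guest-writable fields return to defaults. */
static void example_on_reset(struct example_scsi_cfg *cfg)
{
    cfg->sense_size = EXAMPLE_DEFAULT_SENSE_SIZE;
    cfg->cdb_size = EXAMPLE_DEFAULT_CDB_SIZE;
}
```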
Device Initialization
The initialization routine should first of all discover the
device's virtqueues.
If the driver uses the eventq, it should then place at least one
buffer in the eventq.
The driver can immediately issue requests (for example, INQUIRY
or REPORT LUNS) or task management functions (for example, I_T
RESET).
Device Operation: request queues
The driver queues requests to an arbitrary request queue, and
they are used by the device on that same queue. It is the
responsibility of the driver to ensure strict request ordering
for commands placed on different queues, because they will be
consumed with no order constraints.
Requests have the following format:
struct virtio_scsi_req_cmd {
// Read-only
u8 lun[8];
u64 id;
u8 task_attr;
u8 prio;
u8 crn;
char cdb[cdb_size];
char dataout[];
// Write-only part
u32 sense_len;
u32 residual;
u16 status_qualifier;
u8 status;
u8 response;
u8 sense[sense_size];
char datain[];
};
/* command-specific response values */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_OVERRUN 1
#define VIRTIO_SCSI_S_ABORTED 2
#define VIRTIO_SCSI_S_BAD_TARGET 3
#define VIRTIO_SCSI_S_RESET 4
#define VIRTIO_SCSI_S_BUSY 5
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
#define VIRTIO_SCSI_S_TARGET_FAILURE 7
#define VIRTIO_SCSI_S_NEXUS_FAILURE 8
#define VIRTIO_SCSI_S_FAILURE 9
/* task_attr */
#define VIRTIO_SCSI_S_SIMPLE 0
#define VIRTIO_SCSI_S_ORDERED 1
#define VIRTIO_SCSI_S_HEAD 2
#define VIRTIO_SCSI_S_ACA 3
The lun field addresses a target and logical unit in the
virtio-scsi device's SCSI domain. The only supported format for
the LUN field is: first byte set to 1, second byte set to target,
third and fourth byte representing a single level LUN structure,
followed by four zero bytes. With this representation, a
virtio-scsi device can serve up to 256 targets and 16384 LUNs per
target.
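The 8-byte lun field can be built as in the sketch below. The byte positions come from the paragraph above; setting bit 0x40 in the third byte is an assumption based on the SAM flat-space addressing method for a single level LUN structure, which is consistent with the stated limit of 16384 (2^14) LUNs. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode target and LUN into the 8-byte lun field.
 * Returns 0 on success, -1 if a value is out of range. */
static int example_encode_lun(uint8_t out[8], unsigned int target,
                              unsigned int lun)
{
    if (target > 255 || lun > 16383)
        return -1;
    memset(out, 0, 8);                      /* last four bytes stay zero */
    out[0] = 1;                             /* first byte set to 1 */
    out[1] = (uint8_t)target;               /* second byte: target */
    out[2] = (uint8_t)((lun >> 8) | 0x40);  /* assumed flat addressing */
    out[3] = (uint8_t)(lun & 0xff);
    return 0;
}
```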
The id field is the command identifier (“tag”).
task_attr, prio and crn should be left to zero. task_attr defines
the task attribute as in the table above, but all task attributes
may be mapped to SIMPLE by the device; crn may also be provided
by clients, but is generally expected to be 0. The maximum CRN
value defined by the protocol is 255, since CRN is stored in an
8-bit integer.
All of these fields are defined in SAM. They are always
read-only, as are the cdb and dataout field. The cdb_size is
taken from the configuration space.
sense and subsequent fields are always write-only. The sense_len
field indicates the number of bytes actually written to the sense
buffer. The residual field indicates the residual size,
calculated as “data_length - number_of_transferred_bytes”, for
read or write operations. For bidirectional commands, the
number_of_transferred_bytes includes both read and written bytes.
A residual field that is less than the size of datain means that
the dataout field was processed entirely. A residual field that
exceeds the size of datain means that the dataout field was
processed partially and the datain field was not processed at
all.
The status byte is written by the device to be the status code as
defined in SAM.
The response byte is written by the device to be one of the
following:
VIRTIO_SCSI_S_OK when the request was completed and the status
byte is filled with a SCSI status code (not necessarily
"GOOD").
VIRTIO_SCSI_S_OVERRUN if the content of the CDB requires
transferring more data than is available in the data buffers.
VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an
ABORT TASK or ABORT TASK SET task management function.
VIRTIO_SCSI_S_BAD_TARGET if the request was never processed
because the target indicated by the lun field does not exist.
VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus
or device reset (including a task management function).
VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a
problem in the connection between the host and the target
(severed link).
VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a
failure and the guest should not retry on other paths.
VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure
but retrying on other paths might yield a different result.
VIRTIO_SCSI_S_BUSY if the request failed but retrying on the
same path should work.
VIRTIO_SCSI_S_FAILURE for other host or guest error. In
particular, if neither dataout nor datain is empty, and the
VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
request will be immediately returned with a response equal to
VIRTIO_SCSI_S_FAILURE.
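The retry guidance attached to the BUSY, NEXUS_FAILURE and TARGET_FAILURE responses can be condensed into a lookup like the following sketch. The enum and function names are hypothetical, the numeric values are the ones given for these responses in this document's #define listings, and treating every other response as a plain failure is a simplifying assumption rather than something the text prescribes.

```c
#include <stdint.h>

/* Response values as listed in this document. */
#define VIRTIO_SCSI_S_OK              0
#define VIRTIO_SCSI_S_BUSY            5
#define VIRTIO_SCSI_S_TARGET_FAILURE  7
#define VIRTIO_SCSI_S_NEXUS_FAILURE   8

/* Hypothetical driver-side classification of a response byte. */
enum vscsi_retry {
    VSCSI_COMPLETED,        /* request completed, inspect the status byte */
    VSCSI_RETRY_SAME_PATH,  /* retrying on the same path should work */
    VSCSI_RETRY_OTHER_PATH, /* another path might yield a different result */
    VSCSI_FAIL              /* do not retry */
};

static enum vscsi_retry vscsi_retry_hint(uint8_t response)
{
    switch (response) {
    case VIRTIO_SCSI_S_OK:
        return VSCSI_COMPLETED;
    case VIRTIO_SCSI_S_BUSY:
        return VSCSI_RETRY_SAME_PATH;
    case VIRTIO_SCSI_S_NEXUS_FAILURE:
        return VSCSI_RETRY_OTHER_PATH;
    case VIRTIO_SCSI_S_TARGET_FAILURE:
        return VSCSI_FAIL;
    default:
        /* Assumption of this sketch: anything else is a plain failure. */
        return VSCSI_FAIL;
    }
}
```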
Device Operation: controlq
The controlq is used for other SCSI transport operations.
Requests have the following format:
struct virtio_scsi_ctrl {
u32 type;
...
u8 response;
};
/* response values valid for all commands */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_BAD_TARGET 3
#define VIRTIO_SCSI_S_BUSY 5
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
#define VIRTIO_SCSI_S_TARGET_FAILURE 7
#define VIRTIO_SCSI_S_NEXUS_FAILURE 8
#define VIRTIO_SCSI_S_FAILURE 9
#define VIRTIO_SCSI_S_INCORRECT_LUN 12
The type identifies the remaining fields.
The following commands are defined:
Task management function
#define VIRTIO_SCSI_T_TMF 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
#define VIRTIO_SCSI_T_TMF_CLEAR_ACA 2
#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3
#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
#define VIRTIO_SCSI_T_TMF_QUERY_TASK 6
#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
struct virtio_scsi_ctrl_tmf
{
// Read-only part
u32 type;
u32 subtype;
u8 lun[8];
u64 id;
// Write-only part
u8 response;
}
/* command-specific response values */
#define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 10
#define VIRTIO_SCSI_S_FUNCTION_REJECTED 11
The type is VIRTIO_SCSI_T_TMF. All fields except response are
filled by the driver. The subtype field must always be
specified and identifies the requested task management
function.
Other fields may be irrelevant for the requested TMF; if so,
they are ignored but they should still be present. The lun
field is in the same format specified for request queues; the
single level LUN is ignored when the task management function
addresses a whole I_T nexus. When relevant, the value of the id
field is matched against the id values passed on the requestq.
The outcome of the task management function is written by the
device in the response field. The command-specific response
values map 1-to-1 with those defined in SAM.
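As an illustration of how a driver might fill in a task management request, the sketch below prepares an ABORT TASK. The struct mirrors the layout given above with fixed-width C types; the helper names are hypothetical, and byte order is assumed to be the guest's native one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_SCSI_T_TMF             0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK  0

struct virtio_scsi_ctrl_tmf {
    /* Read-only part, filled by the driver */
    uint32_t type;
    uint32_t subtype;
    uint8_t  lun[8];
    uint64_t id;
    /* Write-only part, filled by the device */
    uint8_t  response;
};

/* Hypothetical helper: prepare an ABORT TASK for the command tagged `tag`. */
static void tmf_prepare_abort_task(struct virtio_scsi_ctrl_tmf *req,
                                   const uint8_t lun[8], uint64_t tag)
{
    memset(req, 0, sizeof(*req));
    req->type = VIRTIO_SCSI_T_TMF;
    req->subtype = VIRTIO_SCSI_T_TMF_ABORT_TASK;
    memcpy(req->lun, lun, 8);
    req->id = tag;   /* matched against ids passed on the requestq */
}

/* Length of the read-only part that precedes the response byte. */
static size_t tmf_readonly_len(void)
{
    return offsetof(struct virtio_scsi_ctrl_tmf, response);
}
```

The read-only part is 24 bytes long; the device writes only the response byte that follows it.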
Asynchronous notification query
#define VIRTIO_SCSI_T_AN_QUERY 1
struct virtio_scsi_ctrl_an {
// Read-only part
u32 type;
u8 lun[8];
u32 event_requested;
// Write-only part
u32 event_actual;
u8 response;
}
#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE 2
#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT 4
#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST 8
#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16
#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST 32
#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY 64
By sending this command, the driver asks the device which
events the given LUN can report, as described in paragraphs 6.6
and A.6 of the SCSI MMC specification. The driver writes the
events it is interested in into the event_requested; the device
responds by writing the events that it supports into
event_actual.
The type is VIRTIO_SCSI_T_AN_QUERY. The lun and event_requested
fields are written by the driver. The event_actual and response
fields are written by the device.
No command-specific values are defined for the response byte.
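Since event_requested and event_actual are bit masks built from the VIRTIO_SCSI_EVT_ASYNC_* values, a driver can compare them with plain bit operations. The helper names in this sketch are hypothetical.

```c
#include <stdint.h>

#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE  2
#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT          4
#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST    8
#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST          32
#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY         64

/* Requested events that the device reported as supported. */
static uint32_t an_usable_events(uint32_t requested, uint32_t actual)
{
    return requested & actual;
}

/* Requested events that the device did not report as supported. */
static uint32_t an_unsupported_events(uint32_t requested, uint32_t actual)
{
    return requested & ~actual;
}
```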
Asynchronous notification subscription
#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
struct virtio_scsi_ctrl_an {
// Read-only part
u32 type;
u8 lun[8];
u32 event_requested;
// Write-only part
u32 event_actual;
u8 response;
}
By sending this command, the driver asks the specified LUN to
report events for its physical interface, again as described in
the SCSI MMC specification. The driver writes the events it is
interested in into the event_requested; the device responds by
writing the events that it supports into event_actual.
Event types are the same as for the asynchronous notification
query message.
The type is VIRTIO_SCSI_T_AN_SUBSCRIBE. The lun and
event_requested fields are written by the driver. The
event_actual and response fields are written by the device.
No command-specific values are defined for the response byte.
Device Operation: eventq
The eventq is used by the device to report information on logical
units that are attached to it. The driver should always leave a
few buffers ready in the eventq. In general, the device will not
queue events to cope with an empty eventq, and will end up
dropping events if it finds no buffer ready. However, when
reporting events for many LUNs (e.g. when a whole target
disappears), the device can throttle events to avoid dropping
them. For this reason, placing 10-15 buffers on the event queue
should be enough.
Buffers are placed in the eventq and filled by the device when
interesting events occur. The buffers should be strictly
write-only (device-filled) and the size of the buffers should be
at least the value given in the device's configuration
information.
Buffers returned by the device on the eventq will be referred to
as "events" in the rest of this section. Events have the
following format:
#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000
struct virtio_scsi_event {
// Write-only part
u32 event;
...
}
If bit 31 is set in the event field, the device failed to report
an event due to missing buffers. In this case, the driver should
poll the logical units for unit attention conditions, and/or do
whatever form of bus scan is appropriate for the guest operating
system.
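A driver can separate the "events missed" flag from the event type with two small helpers. The helper names are hypothetical; only the flag value comes from the text above.

```c
#include <stdint.h>

#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000u

/* Non-zero if bit 31 is set, i.e. the device dropped at least one event. */
static int vscsi_event_missed(uint32_t event)
{
    return (event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;
}

/* Event type with the "events missed" flag masked out. */
static uint32_t vscsi_event_type(uint32_t event)
{
    return event & ~VIRTIO_SCSI_T_EVENTS_MISSED;
}
```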
Other data that the device writes to the buffer depends on the
contents of the event field. The following events are defined:
No event
#define VIRTIO_SCSI_T_NO_EVENT 0
This event is fired in the following cases:
When the device detects in the eventq a buffer that is shorter
than what is indicated in the configuration field, it might
use it immediately and put this dummy value in the event
field. A well-written driver will never observe this
situation.
When events are dropped, the device may signal this event as
soon as the driver makes a buffer available, in order to
request action from the driver. In this case, of course, this
event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED
flag.
Transport reset
#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
struct virtio_scsi_event_reset {
// Write-only part
u32 event;
u8 lun[8];
u32 reason;
}
#define VIRTIO_SCSI_EVT_RESET_HARD 0
#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
By sending this event, the device signals that a logical unit
on a target has been reset, including the case of a new device
appearing or disappearing on the bus. The device fills in all
fields. The event field is set to
VIRTIO_SCSI_T_TRANSPORT_RESET. The lun field addresses a
logical unit in the SCSI host.
The reason value is one of the three #define values appearing
above:
VIRTIO_SCSI_EVT_RESET_REMOVED (“LUN/target removed”) is used if
the target or logical unit is no longer able to receive
commands.
VIRTIO_SCSI_EVT_RESET_HARD (“LUN hard reset”) is used if the
logical unit has been reset, but is still present.
VIRTIO_SCSI_EVT_RESET_RESCAN (“rescan LUN/target”) is used if a
target or logical unit has just appeared on the device.
The “removed” and “rescan” events, when sent for LUN 0, may
apply to the entire target. After receiving them the driver
should ask the initiator to rescan the target, in order to
detect the case when an entire target has appeared or
disappeared. These two events will never be reported unless the
VIRTIO_SCSI_F_HOTPLUG feature was negotiated between the host
and the guest.
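The rule about LUN 0 can be expressed as a predicate over a received transport reset event. This is a sketch under assumptions: the function name is hypothetical, and the LUN number is decoded from the third and fourth bytes assuming the SAM flat space addressing method (lower 14 bits).

```c
#include <stdint.h>

#define VIRTIO_SCSI_EVT_RESET_HARD     0
#define VIRTIO_SCSI_EVT_RESET_RESCAN   1
#define VIRTIO_SCSI_EVT_RESET_REMOVED  2

/*
 * Returns 1 when a "removed" or "rescan" event was sent for LUN 0 and
 * may therefore apply to the entire target, so the driver should ask
 * the initiator to rescan the target. Returns 0 otherwise.
 */
static int reset_needs_target_rescan(uint32_t reason, const uint8_t lun[8])
{
    /* Assumption: 14-bit LUN number in bytes 2-3 (flat space addressing). */
    unsigned int lun_nr = ((unsigned int)(lun[2] & 0x3f) << 8) | lun[3];

    if (reason != VIRTIO_SCSI_EVT_RESET_RESCAN &&
        reason != VIRTIO_SCSI_EVT_RESET_REMOVED)
        return 0;
    return lun_nr == 0;
}
```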
Events will also be reported via sense codes (this obviously
does not apply to newly appeared buses or targets, since the
application has never discovered them):
“LUN/target removed” maps to sense key ILLEGAL REQUEST, asc
0x25, ascq 0x00 (LOGICAL UNIT NOT SUPPORTED)
“LUN hard reset” maps to sense key UNIT ATTENTION, asc 0x29
(POWER ON, RESET OR BUS DEVICE RESET OCCURRED)
“rescan LUN/target” maps to sense key UNIT ATTENTION, asc 0x3f,
ascq 0x0e (REPORTED LUNS DATA HAS CHANGED)
The preferred way to detect transport reset is always to use
events, because sense codes are only seen by the driver when it
sends a SCSI command to the logical unit or target. However, in
case events are dropped, the initiator will still be able to
synchronize with the actual state of the controller if the
driver asks the initiator to rescan the SCSI bus. During the
rescan, the initiator will be able to observe the above sense
codes, and it will process them as if the driver had
received the equivalent event.
Asynchronous notification
#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
struct virtio_scsi_event_an {
// Write-only part
u32 event;
u8 lun[8];
u32 reason;
}
By sending this event, the device signals that an asynchronous
event was fired from a physical interface.
All fields are written by the device. The event field is set to
VIRTIO_SCSI_T_ASYNC_NOTIFY. The lun field addresses a logical
unit in the SCSI host. The reason field is a subset of the
events that the driver has subscribed to via the "Asynchronous
notification subscription" command.
When dropped events are reported, the driver should poll for
asynchronous events manually using SCSI commands.
Appendix X: virtio-mmio
Virtual environments without PCI support (a common situation in
embedded device models) might use a simple memory mapped device
(“virtio-mmio”) instead of the PCI device.
The memory mapped virtio device behaviour is based on the PCI
device specification. Therefore most operations, such as device
initialization, queue configuration and buffer transfers, are
nearly identical. Existing differences are described in the
following sections.
Device Initialization
Instead of using the PCI IO space for the virtio header, the
“virtio-mmio” device provides a set of memory mapped control
registers, all 32 bits wide, followed by device-specific
configuration space. The following list presents their layout:
Offset from the device base address | Direction | Name
Description
0x000 | R | MagicValue
“virt” string.
0x004 | R | Version
Device version number. Currently must be 1.
0x008 | R | DeviceID
Virtio Subsystem Device ID (e.g. 1 for network card).
0x00c | R | VendorID
Virtio Subsystem Vendor ID.
0x010 | R | HostFeatures
Flags representing features the device supports.
Reading from this register returns 32 consecutive flag bits,
first bit depending on the last value written to the
HostFeaturesSel register. Access to this register returns bits
HostFeaturesSel*32 to (HostFeaturesSel*32)+31, e.g. feature
bits 0 to 31 if HostFeaturesSel is set to 0 and feature bits
32 to 63 if HostFeaturesSel is set to 1. Also see
[sub:Feature-Bits]
0x014 | W | HostFeaturesSel
Device (Host) features word selection.
Writing to this register selects a set of 32 device feature bits
accessible by reading from HostFeatures register. Device driver
must write a value to the HostFeaturesSel register before
reading from the HostFeatures register.
0x020 | W | GuestFeatures
Flags representing device features understood and activated by
the driver.
Writing to this register sets 32 consecutive flag bits, first
bit depending on the last value written to the GuestFeaturesSel
register. Access to this register sets bits GuestFeaturesSel*32
to (GuestFeaturesSel*32)+31, e.g. feature bits 0 to 31 if
GuestFeaturesSel is set to 0 and feature bits 32 to 63 if
GuestFeaturesSel is set to 1. Also see [sub:Feature-Bits]
0x024 | W | GuestFeaturesSel
Activated (Guest) features word selection.
Writing to this register selects a set of 32 activated feature
bits accessible by writing to the GuestFeatures register.
Device driver must write a value to the GuestFeaturesSel
register before writing to the GuestFeatures register.
0x028 | W | GuestPageSize
Guest page size.
Device driver must write the guest page size in bytes to the
register during initialization, before any queues are used.
This value must be a power of 2 and is used by the Host to
calculate Guest address of the first queue page (see QueuePFN).
0x030 | W | QueueSel
Virtual queue index (first queue is 0).
Writing to this register selects the virtual queue that the
following operations on QueueNum, QueueAlign and QueuePFN apply
to.
0x034 | R | QueueNumMax
Maximum virtual queue size.
Reading from the register returns the maximum size of the queue
the Host is ready to process or zero (0x0) if the queue is not
available. This applies to the queue selected by writing to
QueueSel and is allowed only when QueuePFN is set to zero
(0x0), so when the queue is not actively used.
0x038 | W | QueueNum
Virtual queue size.
Queue size is a number of elements in the queue, therefore size
of the descriptor table and both available and used rings.
Writing to this register notifies the Host what size of the
queue the Guest will use. This applies to the queue selected by
writing to QueueSel.
0x03c | W | QueueAlign
Used Ring alignment in the virtual queue.
Writing to this register notifies the Host about alignment
boundary of the Used Ring in bytes. This value must be a power
of 2 and applies to the queue selected by writing to QueueSel.
0x040 | RW | QueuePFN
Guest physical page number of the virtual queue.
Writing to this register notifies the host about location of the
virtual queue in the Guest's physical address space. This value
is the index number of a page starting with the queue
Descriptor Table. Value zero (0x0) means physical address zero
(0x00000000) and is illegal. When the Guest stops using the
queue it must write zero (0x0) to this register.
Reading from this register returns the currently used page
number of the queue, therefore a value other than zero (0x0)
means that the queue is in use.
Both read and write accesses apply to the queue selected by
writing to QueueSel.
0x050 | W | QueueNotify
Queue notifier.
Writing a queue index to this register notifies the Host that
there are new buffers to process in the queue.
0x060 | R | InterruptStatus
Interrupt status.
Reading from this register returns a bit mask of interrupts
asserted by the device. An interrupt is asserted if the
corresponding bit is set, ie. equals one (1).
Bit 0 | Used Ring Update
This interrupt is asserted when the Host has updated the Used
Ring in at least one of the active virtual queues.
Bit 1 | Configuration change
This interrupt is asserted when configuration of the device has
changed.
0x064 | W | InterruptACK
Interrupt acknowledge.
Writing to this register notifies the Host that the Guest
finished handling interrupts. Set bits in the value clear the
corresponding bits of the InterruptStatus register.
0x070 | RW | Status
Device status.
Reading from this register returns the current device status
flags.
Writing non-zero values to this register sets the status flags,
indicating the Guest progress. Writing zero (0x0) to this
register triggers a device reset.
Also see [sub:Device-Initialization-Sequence]
0x100+ | RW | Config
Device-specific configuration space starts at offset 0x100
and is accessed with byte alignment. Its meaning and size
depend on the device and the driver.
Virtual queue size is a number of elements in the queue,
therefore size of the descriptor table and both available and
used rings.
The endianness of the registers follows the native endianness of
the Guest. Writing to registers described as “R” and reading from
registers described as “W” is not permitted and can cause
undefined behavior.
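The HostFeaturesSel/HostFeatures pair from the register list can be exercised as in the following sketch, which assembles a 64-bit feature set from two 32-bit reads. The mock_mmio structure and its accessors are hypothetical stand-ins for a real device and real MMIO accessors; only the register offsets come from the list above.

```c
#include <stdint.h>

#define VIRTIO_MMIO_HOST_FEATURES      0x010
#define VIRTIO_MMIO_HOST_FEATURES_SEL  0x014

/* Hypothetical device model: a real driver would access MMIO instead. */
struct mock_mmio {
    uint64_t host_features;      /* feature bits 0..63 offered by the device */
    uint32_t host_features_sel;  /* last value written to HostFeaturesSel */
};

static void mock_write32(struct mock_mmio *d, uint32_t off, uint32_t val)
{
    if (off == VIRTIO_MMIO_HOST_FEATURES_SEL)
        d->host_features_sel = val;
}

static uint32_t mock_read32(const struct mock_mmio *d, uint32_t off)
{
    /* Returns bits sel*32 .. sel*32+31; this mock knows only two words. */
    if (off == VIRTIO_MMIO_HOST_FEATURES && d->host_features_sel < 2)
        return (uint32_t)(d->host_features >> (32 * d->host_features_sel));
    return 0;
}

/* Select each 32-bit word before reading it, as the register list requires. */
static uint64_t read_host_features(struct mock_mmio *d)
{
    uint64_t lo, hi;

    mock_write32(d, VIRTIO_MMIO_HOST_FEATURES_SEL, 0);
    lo = mock_read32(d, VIRTIO_MMIO_HOST_FEATURES);
    mock_write32(d, VIRTIO_MMIO_HOST_FEATURES_SEL, 1);
    hi = mock_read32(d, VIRTIO_MMIO_HOST_FEATURES);
    return (hi << 32) | lo;
}
```

Writing the selector before every read means the result does not depend on whatever value HostFeaturesSel held beforehand.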
The device initialization is performed as described in [sub:Device-Initialization-Sequence]
with one exception: the Guest must notify the Host about its
page size, writing the size in bytes to GuestPageSize register
before the initialization is finished.
The memory mapped virtio devices generate a single interrupt only,
therefore no special configuration is required.
Virtqueue Configuration
The virtual queue configuration is performed in a similar way to
the one described in [sec:Virtqueue-Configuration] with a few
additional operations:
1. Select the queue by writing its index (first queue is 0) to
the QueueSel register.
2. Check that the queue is not already in use: read the QueuePFN
register; the returned value should be zero (0x0).
3. Read the maximum queue size (number of elements) from the
QueueNumMax register. If the returned value is zero (0x0), the
queue is not available.
4. Allocate and zero the queue pages in contiguous virtual memory,
aligning the Used Ring to an optimal boundary (usually page
size). The size of the allocated queue may be smaller than or
equal to the maximum size returned by the Host.
5. Notify the Host about the queue size by writing the size to
the QueueNum register.
6. Notify the Host about the used alignment by writing its value
in bytes to the QueueAlign register.
7. Write the physical number of the first page of the queue to the
QueuePFN register.
The queue and the device are ready to begin normal operations
now.
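The register accesses of the configuration sequence above can be sketched as follows. The register file here is a hypothetical in-memory mock with plain storage semantics (it models a single queue and does not switch banks on QueueSel), the function name is made up for the example, and memory allocation is left to the caller. The page number is assumed to be the queue's physical address divided by the page size previously written to GuestPageSize.

```c
#include <stdint.h>

#define VIRTIO_MMIO_QUEUE_SEL      0x030
#define VIRTIO_MMIO_QUEUE_NUM_MAX  0x034
#define VIRTIO_MMIO_QUEUE_NUM      0x038
#define VIRTIO_MMIO_QUEUE_ALIGN    0x03c
#define VIRTIO_MMIO_QUEUE_PFN      0x040

/* Hypothetical register file: plain storage indexed by offset / 4. */
struct mock_regs {
    uint32_t r[0x100 / 4];
};

static void reg_write(struct mock_regs *m, uint32_t off, uint32_t val)
{
    m->r[off / 4] = val;
}

static uint32_t reg_read(const struct mock_regs *m, uint32_t off)
{
    return m->r[off / 4];
}

/*
 * Configure one queue. queue_phys is the physical address of the already
 * allocated and zeroed queue memory; page_size must be non-zero.
 * Returns the queue size written to QueueNum, or a negative value.
 */
static int vm_setup_queue(struct mock_regs *m, uint32_t index, uint32_t wanted,
                          uint64_t queue_phys, uint32_t page_size)
{
    uint32_t max, num;

    if (wanted == 0)
        return -3;

    /* Select the queue. */
    reg_write(m, VIRTIO_MMIO_QUEUE_SEL, index);

    /* A non-zero QueuePFN means the queue is already in use. */
    if (reg_read(m, VIRTIO_MMIO_QUEUE_PFN) != 0)
        return -1;

    /* Zero means the queue is not available. */
    max = reg_read(m, VIRTIO_MMIO_QUEUE_NUM_MAX);
    if (max == 0)
        return -2;

    /* The queue may be smaller than or equal to the maximum. */
    num = wanted < max ? wanted : max;

    reg_write(m, VIRTIO_MMIO_QUEUE_NUM, num);
    reg_write(m, VIRTIO_MMIO_QUEUE_ALIGN, page_size);
    reg_write(m, VIRTIO_MMIO_QUEUE_PFN, (uint32_t)(queue_phys / page_size));
    return (int)num;
}
```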
Device Operation
The memory mapped virtio device behaves in the same way as
described in [sec:Device-Operation], with the following
exceptions:
The device is notified about new buffers available in a queue
by writing the queue index to the QueueNotify register instead
of the virtio header in PCI I/O space ([sub:Notifying-The-Device]).
The memory mapped virtio device uses a single, dedicated
interrupt signal, which is raised when at least one of the
interrupts described in the InterruptStatus register
description is asserted. After receiving an interrupt, the
driver must read the InterruptStatus register to check what
caused the interrupt (see the register description). After the
interrupt is handled, the driver must acknowledge it by writing
a bit mask corresponding to the serviced interrupt to the
InterruptACK register.
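The interrupt sequence above (read InterruptStatus, handle each asserted cause, then acknowledge) might look like the following sketch. The device model is a hypothetical mock in which acknowledging clears the corresponding status bits, and the bit names are chosen for this example; the counters stand in for real Used Ring and configuration handling.

```c
#include <stdint.h>

/* Bits of InterruptStatus (names chosen for this sketch). */
#define VM_INT_VRING   (1u << 0)  /* Used Ring Update */
#define VM_INT_CONFIG  (1u << 1)  /* Configuration change */

/* Hypothetical device model plus counters standing in for real handlers. */
struct mock_irq_dev {
    uint32_t status;          /* InterruptStatus as seen by the driver */
    unsigned vring_handled;
    unsigned config_handled;
};

/* Models a read of InterruptStatus. */
static uint32_t irq_read_status(const struct mock_irq_dev *d)
{
    return d->status;
}

/* Models a write to InterruptACK: set bits clear the matching status bits. */
static void irq_write_ack(struct mock_irq_dev *d, uint32_t mask)
{
    d->status &= ~mask;
}

/* Returns the status bits that were serviced. */
static uint32_t vm_handle_interrupt(struct mock_irq_dev *d)
{
    uint32_t status = irq_read_status(d);

    if (status & VM_INT_VRING)
        d->vring_handled++;     /* process used buffers here */
    if (status & VM_INT_CONFIG)
        d->config_handled++;    /* re-read configuration here */

    /* Acknowledge exactly the interrupts that were serviced. */
    irq_write_ack(d, status);
    return status;
}
```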
@@ -29,9 +29,6 @@ struct virtio_blk
 	/* The disk structure for the kernel. */
 	struct gendisk *disk;

-	/* Request tracking. */
-	struct list_head reqs;
-
 	mempool_t *pool;

 	/* Process context for config space updates */
@@ -55,7 +52,6 @@ struct virtio_blk

 struct virtblk_req
 {
-	struct list_head list;
 	struct request *req;
 	struct virtio_blk_outhdr out_hdr;
 	struct virtio_scsi_inhdr in_hdr;
@@ -99,7 +95,6 @@ static void blk_done(struct virtqueue *vq)
 		}

 		__blk_end_request_all(vbr->req, error);
-		list_del(&vbr->list);
 		mempool_free(vbr, vblk->pool);
 	}
 	/* In case queue is stopped waiting for more buffers. */
@@ -184,7 +179,6 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 		return false;
 	}

-	list_add_tail(&vbr->list, &vblk->reqs);
 	return true;
 }

@@ -437,7 +431,6 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 		goto out_free_index;
 	}

-	INIT_LIST_HEAD(&vblk->reqs);
 	spin_lock_init(&vblk->lock);
 	vblk->vdev = vdev;
 	vblk->sg_elems = sg_elems;
@@ -583,21 +576,29 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk = vdev->priv;
 	int index = vblk->index;
+	struct virtblk_req *vbr;
+	unsigned long flags;

 	/* Prevent config work handler from accessing the device. */
 	mutex_lock(&vblk->config_lock);
 	vblk->config_enable = false;
 	mutex_unlock(&vblk->config_lock);

-	/* Nothing should be pending. */
-	BUG_ON(!list_empty(&vblk->reqs));
-
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);

 	flush_work(&vblk->config_work);

 	del_gendisk(vblk->disk);
+
+	/* Abort requests dispatched to driver. */
+	spin_lock_irqsave(&vblk->lock, flags);
+	while ((vbr = virtqueue_detach_unused_buf(vblk->vq))) {
+		__blk_end_request_all(vbr->req, -EIO);
+		mempool_free(vbr, vblk->pool);
+	}
+	spin_unlock_irqrestore(&vblk->lock, flags);
+
 	blk_cleanup_queue(vblk->disk->queue);
 	put_disk(vblk->disk);
 	mempool_destroy(vblk->pool);
......
@@ -46,4 +46,15 @@ config VIRTIO_BALLOON

 	 If unsure, say N.

+config VIRTIO_MMIO_CMDLINE_DEVICES
+	bool "Memory mapped virtio devices parameter parsing"
+	depends on VIRTIO_MMIO
+	---help---
+	 Allow virtio-mmio devices instantiation via the kernel command line
+	 or module parameters. Be aware that using incorrect parameters (base
+	 address in particular) can crash your system - you have been warned.
+	 See Documentation/kernel-parameters.txt for details.
+
+	 If unsure, say 'N'.
+
 endmenu
@@ -2,9 +2,10 @@
 #include <linux/spinlock.h>
 #include <linux/virtio_config.h>
 #include <linux/module.h>
+#include <linux/idr.h>

 /* Unique numbering for virtio devices. */
-static unsigned int dev_index;
+static DEFINE_IDA(virtio_index_ida);

 static ssize_t device_show(struct device *_d,
 			   struct device_attribute *attr, char *buf)
@@ -193,7 +194,11 @@ int register_virtio_device(struct virtio_device *dev)
 	dev->dev.bus = &virtio_bus;

 	/* Assign a unique device index and hence name. */
-	dev->index = dev_index++;
+	err = ida_simple_get(&virtio_index_ida, 0, 0, GFP_KERNEL);
+	if (err < 0)
+		goto out;
+
+	dev->index = err;
 	dev_set_name(&dev->dev, "virtio%u", dev->index);

 	/* We always start by resetting the device, in case a previous
@@ -208,6 +213,7 @@ int register_virtio_device(struct virtio_device *dev)
 	/* device_register() causes the bus infrastructure to look for a
 	 * matching driver. */
 	err = device_register(&dev->dev);
+out:
 	if (err)
 		add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
@@ -217,6 +223,7 @@ EXPORT_SYMBOL_GPL(register_virtio_device);
 void unregister_virtio_device(struct virtio_device *dev)
 {
 	device_unregister(&dev->dev);
+	ida_simple_remove(&virtio_index_ida, dev->index);
 }
 EXPORT_SYMBOL_GPL(unregister_virtio_device);
......
@@ -381,21 +381,25 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	return err;
 }

-static void __devexit virtballoon_remove(struct virtio_device *vdev)
+static void remove_common(struct virtio_balloon *vb)
 {
-	struct virtio_balloon *vb = vdev->priv;
-
-	kthread_stop(vb->thread);
-
 	/* There might be pages left in the balloon: free them. */
 	while (vb->num_pages)
 		leak_balloon(vb, vb->num_pages);
 	update_balloon_size(vb);

 	/* Now we reset the device so we can clean up the queues. */
-	vdev->config->reset(vdev);
+	vb->vdev->config->reset(vb->vdev);

-	vdev->config->del_vqs(vdev);
+	vb->vdev->config->del_vqs(vb->vdev);
+}
+
+static void __devexit virtballoon_remove(struct virtio_device *vdev)
+{
+	struct virtio_balloon *vb = vdev->priv;
+
+	kthread_stop(vb->thread);
+	remove_common(vb);
 	kfree(vb);
 }

@@ -409,17 +413,11 @@ static int virtballoon_freeze(struct virtio_device *vdev)
 	 * function is called.
 	 */

-	while (vb->num_pages)
-		leak_balloon(vb, vb->num_pages);
-	update_balloon_size(vb);
-
-	/* Ensure we don't get any more requests from the host */
-	vdev->config->reset(vdev);
-	vdev->config->del_vqs(vdev);
+	remove_common(vb);
 	return 0;
 }

-static int restore_common(struct virtio_device *vdev)
+static int virtballoon_restore(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb = vdev->priv;
 	int ret;
@@ -432,11 +430,6 @@ static int restore_common(struct virtio_device *vdev)
 	update_balloon_size(vb);
 	return 0;
 }
-
-static int virtballoon_restore(struct virtio_device *vdev)
-{
-	return restore_common(vdev);
-}
 #endif

 static unsigned int features[] = {
......
@@ -6,6 +6,50 @@
  * This module allows virtio devices to be used over a virtual, memory mapped
  * platform device.
  *
+ * The guest device(s) may be instantiated in one of three equivalent ways:
+ *
+ * 1. Static platform device in board's code, eg.:
+ *
+ *	static struct platform_device v2m_virtio_device = {
+ *		.name = "virtio-mmio",
+ *		.id = -1,
+ *		.num_resources = 2,
+ *		.resource = (struct resource []) {
+ *			{
+ *				.start = 0x1001e000,
+ *				.end = 0x1001e0ff,
+ *				.flags = IORESOURCE_MEM,
+ *			}, {
+ *				.start = 42 + 32,
+ *				.end = 42 + 32,
+ *				.flags = IORESOURCE_IRQ,
+ *			},
+ *		}
+ *	};
+ *
+ * 2. Device Tree node, eg.:
+ *
+ *		virtio_block@1e000 {
+ *			compatible = "virtio,mmio";
+ *			reg = <0x1e000 0x100>;
+ *			interrupts = <42>;
+ *		}
+ *
+ * 3. Kernel module (or command line) parameter. Can be used more than once -
+ *    one device will be created for each one. Syntax:
+ *
+ *		[virtio_mmio.]device=<size>@<baseaddr>:<irq>[:<id>]
+ *    where:
+ *		<size>     := size (can use standard suffixes like K, M or G)
+ *		<baseaddr> := physical base address
+ *		<irq>      := interrupt number (as passed to request_irq())
+ *		<id>       := (optional) platform device id
+ *    eg.:
+ *		virtio_mmio.device=0x100@0x100b0000:48 \
+ *				virtio_mmio.device=1K@0x1001e000:74
+ *
+ *
+ *
  * Registers layout (all 32-bit wide):
  *
  * offset d. name description
@@ -42,6 +86,8 @@
  * See the COPYING file in the top-level directory.
  */

+#define pr_fmt(fmt) "virtio-mmio: " fmt
+
 #include <linux/highmem.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
@@ -449,6 +495,122 @@ static int __devexit virtio_mmio_remove(struct platform_device *pdev)
+/* Devices list parameter */
+
+#if defined(CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES)
+
+static struct device vm_cmdline_parent = {
+	.init_name = "virtio-mmio-cmdline",
+};
+
+static int vm_cmdline_parent_registered;
+static int vm_cmdline_id;
+
+static int vm_cmdline_set(const char *device,
+		const struct kernel_param *kp)
+{
+	int err;
+	struct resource resources[2] = {};
+	char *str;
+	long long int base;
+	int processed, consumed = 0;
+	struct platform_device *pdev;
+
+	resources[0].flags = IORESOURCE_MEM;
+	resources[1].flags = IORESOURCE_IRQ;
+
+	resources[0].end = memparse(device, &str) - 1;
+
+	processed = sscanf(str, "@%lli:%u%n:%d%n",
+			&base, &resources[1].start, &consumed,
+			&vm_cmdline_id, &consumed);
+
+	if (processed < 2 || processed > 3 || str[consumed])
+		return -EINVAL;
+
+	resources[0].start = base;
+	resources[0].end += base;
+	resources[1].end = resources[1].start;
+
+	if (!vm_cmdline_parent_registered) {
+		err = device_register(&vm_cmdline_parent);
+		if (err) {
+			pr_err("Failed to register parent device!\n");
+			return err;
+		}
+		vm_cmdline_parent_registered = 1;
+	}
+
+	pr_info("Registering device virtio-mmio.%d at 0x%llx-0x%llx, IRQ %d.\n",
+			vm_cmdline_id,
+			(unsigned long long)resources[0].start,
+			(unsigned long long)resources[0].end,
+			(int)resources[1].start);
+
+	pdev = platform_device_register_resndata(&vm_cmdline_parent,
+			"virtio-mmio", vm_cmdline_id++,
+			resources, ARRAY_SIZE(resources), NULL, 0);
+	if (IS_ERR(pdev))
+		return PTR_ERR(pdev);
+
+	return 0;
+}
+
+static int vm_cmdline_get_device(struct device *dev, void *data)
+{
+	char *buffer = data;
+	unsigned int len = strlen(buffer);
+	struct platform_device *pdev = to_platform_device(dev);
+
+	snprintf(buffer + len, PAGE_SIZE - len, "0x%llx@0x%llx:%llu:%d\n",
+			pdev->resource[0].end - pdev->resource[0].start + 1ULL,
+			(unsigned long long)pdev->resource[0].start,
+			(unsigned long long)pdev->resource[1].start,
+			pdev->id);
+	return 0;
+}
+
+static int vm_cmdline_get(char *buffer, const struct kernel_param *kp)
+{
+	buffer[0] = '\0';
+	device_for_each_child(&vm_cmdline_parent, buffer,
+			vm_cmdline_get_device);
+	return strlen(buffer) + 1;
+}
+
+static struct kernel_param_ops vm_cmdline_param_ops = {
+	.set = vm_cmdline_set,
+	.get = vm_cmdline_get,
+};
+
+device_param_cb(device, &vm_cmdline_param_ops, NULL, S_IRUSR);
+
+static int vm_unregister_cmdline_device(struct device *dev,
+		void *data)
+{
+	platform_device_unregister(to_platform_device(dev));
+
+	return 0;
+}
+
+static void vm_unregister_cmdline_devices(void)
+{
+	if (vm_cmdline_parent_registered) {
+		device_for_each_child(&vm_cmdline_parent, NULL,
+				vm_unregister_cmdline_device);
+		device_unregister(&vm_cmdline_parent);
+		vm_cmdline_parent_registered = 0;
+	}
+}
+
+#else
+
+static void vm_unregister_cmdline_devices(void)
+{
+}
+
+#endif
+
 /* Platform driver */

 static struct of_device_id virtio_mmio_match[] = {
@@ -475,6 +637,7 @@ static int __init virtio_mmio_init(void)
 static void __exit virtio_mmio_exit(void)
 {
 	platform_driver_unregister(&virtio_mmio_driver);
+	vm_unregister_cmdline_devices();
 }

 module_init(virtio_mmio_init);
......
@@ -74,15 +74,6 @@
  * @set_status: write the status byte
  *	vdev: the virtio_device
  *	status: the new status byte
- * @request_vqs: request the specified number of virtqueues
- *	vdev: the virtio_device
- *	max_vqs: the max number of virtqueues we want
- *	If supplied, must call before any virtqueues are instantiated.
- *	To modify the max number of virtqueues after request_vqs has been
- *	called, call free_vqs and then request_vqs with a new value.
- * @free_vqs: cleanup resources allocated by request_vqs
- *	vdev: the virtio_device
- *	If supplied, must call after all virtqueues have been deleted.
  * @reset: reset the device
  *	vdev: the virtio device
  *	After this, status and feature negotiation must be done again
@@ -156,7 +147,7 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
  * @vdev: the virtio device
  * @fbit: the feature bit
  * @offset: the type to search for.
- * @val: a pointer to the value to fill in.
+ * @v: a pointer to the value to fill in.
  *
  * The return value is -ENOENT if the feature doesn't exist. Otherwise
  * the config value is copied into whatever is pointed to by v. */
......
@@ -615,7 +615,8 @@ static void p9_virtio_remove(struct virtio_device *vdev)
 {
 	struct virtio_chan *chan = vdev->priv;

-	BUG_ON(chan->inuse);
+	if (chan->inuse)
+		p9_virtio_close(chan->client);
 	vdev->config->del_vqs(vdev);

 	mutex_lock(&virtio_9p_lock);
......