Commit 33950c6e, authored May 22, 2012 by Rusty Russell
virtio: update documentation to v0.9.5 of spec
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
parent 76e10d15
Showing 1 changed file with 1087 additions and 77 deletions
Documentation/virtual/virtio-spec.txt (+1087 -77)
[Generated file: see http://ozlabs.org/~rusty/virtio-spec/]
Virtio PCI Card Specification
v0.9.1 DRAFT
v0.9.5 DRAFT
-
Rusty Russell <rusty@rustcorp.com.au>IBM Corporation (Editor)
Rusty Russell <rusty@rustcorp.com.au> IBM Corporation (Editor)
2011 August 1.
2012 May 7.
Purpose and Description
...
...
@@ -68,11 +68,11 @@ and consists of three parts:
+-------------------+-----------------------------------+-----------+
When the driver wants to send buffers to the device, it puts them
in one or more slots in the descriptor table, and writes the
descriptor indices into the available ring. It then notifies the
device. When the device has finished with the buffers, it writes
the descriptors into the used ring, and sends an interrupt.
When the driver wants to send a buffer to the device, it fills in
a slot in the descriptor table (or chains several together), and
writes the descriptor index into the available ring. It then
notifies the device. When the device has finished a buffer, it
writes the descriptor into the used ring, and sends an interrupt.
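The driver side of this flow can be sketched in C. This is a minimal illustration under stated assumptions, not the reference implementation: the structure layouts follow the ring definitions given in the specification's virtio_ring.h appendix, while QUEUE_SIZE, vq_add_chain and the premise that the chosen descriptor slots are already free are hypothetical simplifications. A real driver would also issue a write barrier before publishing the index, and would then notify the device.

```c
#include <stdint.h>

#define QUEUE_SIZE        8   /* hypothetical queue size for this sketch */
#define VRING_DESC_F_NEXT 1   /* descriptor continues via the next field */

struct vring_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;    /* length of the buffer in bytes */
    uint16_t flags;
    uint16_t next;   /* index of the next descriptor in the chain */
};

struct vring_avail {
    uint16_t flags;
    uint16_t idx;                /* free-running count of published heads */
    uint16_t ring[QUEUE_SIZE];
};

/* Fill n consecutive descriptors starting at 'first' (assumed free),
 * chain them together, and publish the head in the available ring. */
uint16_t vq_add_chain(struct vring_desc *desc, struct vring_avail *avail,
                      uint16_t first, const uint64_t *addrs,
                      const uint32_t *lens, uint16_t n)
{
    for (uint16_t i = 0; i < n; i++) {
        struct vring_desc *d = &desc[first + i];
        int last = (i + 1 == n);

        d->addr = addrs[i];
        d->len = lens[i];
        d->flags = last ? 0 : VRING_DESC_F_NEXT;
        d->next = last ? 0 : (uint16_t)(first + i + 1);
    }
    avail->ring[avail->idx % QUEUE_SIZE] = first;
    /* A real driver needs a write memory barrier here. */
    avail->idx++;
    return first;
}
```

Only the head of the chain goes into the available ring; the device finds the remaining descriptors by following the next fields.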
Specification
...
...
@@ -106,7 +106,13 @@ for informational purposes by the guest).
+----------------------+--------------------+---------------+
| 6 | ioMemory | - |
+----------------------+--------------------+---------------+
| 7 | rpmsg | Appendix H |
+----------------------+--------------------+---------------+
| 8 | SCSI host | Appendix I |
+----------------------+--------------------+---------------+
| 9 | 9P transport | - |
+----------------------+--------------------+---------------+
| 10 | mac80211 wlan | - |
+----------------------+--------------------+---------------+
...
...
@@ -127,7 +133,7 @@ Note that this is possible because while the virtio header is PCI
the native endian of the guest (where such distinction is
applicable).
Device Initialization Sequence
Device Initialization Sequence<sub:Device-Initialization-Sequence>
We start with an overview of device initialization, then expand
on the details of the device and how each step is preformed.
...
...
@@ -177,7 +183,10 @@ The virtio header looks as follows:
If MSI-X is enabled for the device, two additional fields
immediately follow this header:
immediately follow this header:[footnote:
ie. once you enable MSI-X on the device, the other fields move.
If you turn it off again, they move back!
]
+------------++----------------+--------+
...
...
@@ -191,20 +200,6 @@ immediately follow this header:
+------------++----------------+--------+
Finally, if feature bits (VIRTIO_F_FEATURES_HI) this is
immediately followed by two additional fields:
+------------++----------------------+----------------------
| Bits || 32 | 32
+------------++----------------------+----------------------
| Read/Write || R | R+W
+------------++----------------------+----------------------
| Purpose || Device | Guest
| || Features bits 32:63 | Features bits 32:63
+------------++----------------------+----------------------
Immediately following these general headers, there may be
device-specific headers:
...
...
@@ -238,31 +233,25 @@ at least one bit should be set:
may be a significant (or infinite) delay before setting this
bit.
DRIVER_OK (3) Indicates that the driver is set up and ready to
DRIVER_OK (4) Indicates that the driver is set up and ready to
drive the device.
FAILED (8) Indicates that something went wrong in the guest,
FAILED (128) Indicates that something went wrong in the guest,
and it has given up on the device. This could be an internal
error, or the driver didn't like the device for some reason, or
even a fatal error during device operation. The device must be
reset before attempting to re-initialize.
Feature Bits
Feature Bits<sub:Feature-Bits>
The least significant 31 bits of the first configuration field
indicates the features that the device supports (the high bit is
reserved, and will be used to indicate the presence of future
feature bits elsewhere). If more than 31 feature bits are
supported, the device indicates so by setting feature bit 31 (see
[cha:Reserved-Feature-Bits]). The bits are allocated as follows:
The first configuration field indicates the features that the
device supports. The bits are allocated as follows:
0 to 23 Feature bits for the specific device type
24 to 40 Feature bits reserved for extensions to the queue and
24 to 32 Feature bits reserved for extensions to the queue and
feature negotiation mechanisms
41 to 63 Feature bits reserved for future extensions
For example, feature bit 0 for a network device (i.e. Subsystem
Device ID 1) indicates that the device supports checksumming of
packets.
...
...
@@ -286,10 +275,6 @@ will not see that feature bit in the Device Features field and
can go into backwards compatibility mode (or, for poor
implementations, set the FAILED Device Status bit).
Access to feature bits 32 to 63 is enabled by Guest by setting
feature bit 31. If this bit is unset, Device must assume that all
feature bits > 31 are unset.
Configuration/Queue Vectors
When MSI-X capability is present and enabled in the device
...
...
@@ -324,7 +309,7 @@ success, the previously written value is returned, and on
failure, NO_VECTOR is returned. If a mapping failure is detected,
the driver can retry mapping with fewer vectors, or disable MSI-X.
Virtqueue Configuration
Virtqueue Configuration<sec:Virtqueue-Configuration>
As a device can have zero or more virtqueues for bulk data
transport (for example, the network driver has two), the driver
...
...
@@ -587,7 +572,7 @@ and Red Hat under the (3-clause) BSD license so that it can be
freely used by all other projects, and is reproduced (with slight
variation to remove Linux assumptions) in Appendix A.
Device Operation
Device Operation<sec:Device-Operation>
There are two parts to device operation: supplying new buffers to
the device, and processing used buffers from the device. As an
...
...
@@ -813,7 +798,7 @@ vring.used->ring[vq->last_seen_used%vsz];
}
Dealing With Configuration Changes
Dealing With Configuration Changes<sub:Dealing-With-Configuration>
Some virtio PCI devices can change the device configuration
state, as reflected in the virtio header in the PCI configuration
...
...
@@ -1260,18 +1245,6 @@ Currently there are five device-independent feature bits defined:
driver should ignore the used_event field; the device should
ignore the avail_event field; the flags field is used
VIRTIO_F_BAD_FEATURE(30) This feature should never be
negotiated by the guest; doing so is an indication that the
guest is faulty[footnote:
An experimental virtio PCI driver contained in Linux version
2.6.25 had this problem, and this feature bit can be used to
detect it.
]
VIRTIO_F_FEATURES_HIGH(31) This feature indicates that the
device supports feature bits 32:63. If unset, feature bits
32:63 are unset.
Appendix C: Network Device
The virtio network device is a virtual ethernet card, and is the
...
...
@@ -1335,11 +1308,17 @@ were required.
VIRTIO_NET_F_CTRL_VLAN (19) Control channel VLAN filtering.
VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous
packets.
Device configuration layout Two configuration fields are
currently defined. The mac address field always exists (though
is only valid if VIRTIO_NET_F_MAC is set), and the status field
only exists if VIRTIO_NET_F_STATUS is set. Only one bit is
currently defined for the status field: VIRTIO_NET_S_LINK_UP.
#define VIRTIO_NET_S_LINK_UP 1
only exists if VIRTIO_NET_F_STATUS is set. Two read-only bits
are currently defined for the status field:
VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
#define VIRTIO_NET_S_LINK_UP 1
#define VIRTIO_NET_S_ANNOUNCE 2
...
...
@@ -1377,12 +1356,19 @@ struct virtio_net_config {
packets by negotating the VIRTIO_NET_F_CSUM feature. This “
checksum offload” is a common feature on modern network cards.
If that feature is negotiated, a driver can use TCP or UDP
segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4
(IPv4 TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and
VIRTIO_NET_F_HOST_UFO (UDP fragmentation) features. It should
not send TCP packets requiring segmentation offload which have
the Explicit Congestion Notification bit set, unless the
If that feature is negotiated[footnote:
ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
dependent on VIRTIO_NET_F_CSUM; a device which offers the offload
features must offer the checksum feature, and a driver which
accepts the offload features must accept the checksum feature.
Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
depending on VIRTIO_NET_F_GUEST_CSUM.
], a driver can use TCP or UDP segmentation offload by
negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4 TCP),
VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
(UDP fragmentation) features. It should not send TCP packets
requiring segmentation offload which have the Explicit
Congestion Notification bit set, unless the
VIRTIO_NET_F_HOST_ECN feature is negotiated.[footnote:
This is a common restriction in real, older network cards.
]
...
...
@@ -1403,7 +1389,7 @@ segmentation, if both guests are amenable.
Packets are transmitted by placing them in the transmitq, and
buffers for incoming packets are placed in the receiveq. In each
case, the packet itself is preceded by a header:
case, the packet itself is preceeded by a header:
struct virtio_net_hdr {
...
...
@@ -1462,9 +1448,10 @@ It will have a 14 byte ethernet header and 20 byte IP header
followed by the TCP header (with the TCP checksum field 16 bytes
into that header). csum_start will be 14+20 = 34 (the TCP
checksum includes the header), and csum_offset will be 16. The
value in the TCP checksum field will be the sum of the TCP pseudo
header, so that replacing it by the ones' complement checksum of
the TCP header and body will give the correct result.
value in the TCP checksum field should be initialized to the sum
of the TCP pseudo header, so that replacing it by the ones'
complement checksum of the TCP header and body will give the
correct result.
]
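The arithmetic in this example can be illustrated with a short C sketch of what the receiving side of the offload does. It assumes the whole packet sits in one contiguous buffer; the function names are hypothetical and not part of the specification. Because the checksum field already holds the pseudo-header sum, summing every 16-bit word from csum_start to the end of the packet and storing the ones' complement at csum_start + csum_offset yields the final TCP checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum of big-endian 16-bit words. */
static uint32_t csum_partial(const uint8_t *p, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;   /* pad odd trailing byte */
    return sum;
}

/* Fold the carries back in to get a 16-bit ones' complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Complete a partially checksummed packet in place. */
void complete_checksum(uint8_t *pkt, size_t len,
                       uint16_t csum_start, uint16_t csum_offset)
{
    uint32_t sum = csum_partial(pkt + csum_start, len - csum_start, 0);
    uint16_t csum = (uint16_t)~csum_fold(sum);

    pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
    pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}
```

For the packet described above, the call would be complete_checksum(pkt, len, 34, 16).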
<enu:If-the-driver>If the driver negotiated
...
...
@@ -1483,8 +1470,8 @@ Due to various bugs in implementations, this field is not useful
as a guarantee of the transport header size.
]
gso_size is the size of the packet beyond that header (ie.
MSS).
gso_size is the maximum size of each packet beyond that header
(ie. MSS).
If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the
VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as well,
...
...
@@ -1567,7 +1554,9 @@ Processing packet involves:
If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
negotiated, then the “gso_type” may be something other than
VIRTIO_NET_HDR_GSO_NONE, and the “gso_size” field indicates the
desired MSS (see [enu:If-the-driver]).Control Virtqueue
desired MSS (see [enu:If-the-driver]).
Control Virtqueue
The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is
negotiated) to send commands to manipulate various features of
...
...
@@ -1642,7 +1631,7 @@ struct virtio_net_ctrl_mac {
The device can filter incoming packets by any number of
destination MAC addresses.[footnote:
Since there are no guarantees, it can use a hash filter
Since there are no guarentees, it can use a hash filter
or silently switch to allmulti or promiscuous mode if it is given
too many addresses.
] This table is set using the class VIRTIO_NET_CTRL_MAC and the
...
...
@@ -1665,6 +1654,38 @@ can control a VLAN filter table in the device.
Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL
command take a 16-bit VLAN id as the command-specific-data.
Gratuitous Packet Sending
If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE (depends
on VIRTIO_NET_F_CTRL_VQ), the device can ask the guest to send
gratuitous packets; this is usually done after the guest has been
physically migrated, and needs to announce its presence on the
new network links. (As the hypervisor does not know the guest's
network configuration (eg. tagged vlan), it is simplest to prod
the guest in this way).
#define VIRTIO_NET_CTRL_ANNOUNCE 3
#define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
The guest needs to check the VIRTIO_NET_S_ANNOUNCE bit in the
status field when it notices a change of device configuration.
The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
the driver has received the notification; the device clears the
VIRTIO_NET_S_ANNOUNCE bit in the status field after it receives
this command.
Processing this notification involves:
Sending the gratuitous packets, or marking that there are pending
gratuitous packets to be sent and letting a deferred routine
send them.
Sending the VIRTIO_NET_CTRL_ANNOUNCE_ACK command through the
control vq.
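As a rough sketch of the driver side, the following C fragment checks the status field and prepares the acknowledgement. The struct ctrl_hdr type and both function names are hypothetical; the sketch only assumes that a control virtqueue request starts with a class byte followed by a command byte, as the control virtqueue section describes. Actually sending the gratuitous packets and queueing the request are left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_S_LINK_UP         1
#define VIRTIO_NET_S_ANNOUNCE        2
#define VIRTIO_NET_CTRL_ANNOUNCE     3
#define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0

/* Class/command pair at the start of a control request (hypothetical name). */
struct ctrl_hdr {
    uint8_t cls;
    uint8_t cmd;
};

/* True if the device is asking for gratuitous packets. */
bool announce_pending(uint16_t status)
{
    return (status & VIRTIO_NET_S_ANNOUNCE) != 0;
}

/* Called on a configuration change: if an announce is pending, fill in
 * the ACK request the caller should queue after sending (or scheduling)
 * the gratuitous packets. Returns false when nothing needs to be done. */
bool prepare_announce_ack(uint16_t status, struct ctrl_hdr *out)
{
    if (!announce_pending(status))
        return false;
    out->cls = VIRTIO_NET_CTRL_ANNOUNCE;
    out->cmd = VIRTIO_NET_CTRL_ANNOUNCE_ACK;
    return true;
}
```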
Appendix D: Block Device
The virtio block device is a simple virtual block device (ie.
...
...
@@ -1699,8 +1720,6 @@ device except where noted.
VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
Device configuration layout The capacity of the device
(expressed in 512-byte sectors) is always present. The
availability of the others all depend on various feature bits
...
...
@@ -1743,8 +1762,6 @@ device except where noted.
If the VIRTIO_BLK_F_RO feature is set by the device, any write
requests will fail.
Device Operation
The driver queues requests to the virtqueue, and they are used by
...
...
@@ -1805,7 +1822,7 @@ the FLUSH and FLUSH_OUT types are equivalent, the device does not
distinguish between them
]). If the device has VIRTIO_BLK_F_BARRIER feature the high bit
(VIRTIO_BLK_T_BARRIER) indicates that this request acts as a
barrier and that all preceding requests must be complete before
barrier and that all preceeding requests must be complete before
this one, and all following requests must not be started until
this is complete. Note that a barrier does not flush caches in
the underlying backend device in host, and thus does not serve as
...
...
@@ -2118,7 +2135,7 @@ This is historical, and independent of the guest page size
Otherwise, the guest may begin to re-use pages previously given
to the balloon before the device has acknowledged their
withdrawal. [footnote:
withdrawl. [footnote:
In this case, deflation advice is merely a courtesy
]
...
...
@@ -2198,3 +2215,996 @@ as follows:
VIRTIO_BALLOON_S_MEMTOT The total amount of memory available
(in bytes).
Appendix H: Rpmsg: Remote Processor Messaging
Virtio rpmsg devices represent remote processors on the system
which run in asymmetric multi-processing (AMP) configuration, and
which are usually used to offload cpu-intensive tasks from the
main application processor (a typical SoC methodology).
Virtio is being used to communicate with those remote processors;
empty buffers are placed in one virtqueue for receiving messages,
and non-empty buffers, containing outbound messages, are enqueued
in a second virtqueue for transmission.
Numerous communication channels can be multiplexed over those two
virtqueues, so different entities, running on the application and
remote processor, can directly communicate in a point-to-point
fashion.
Configuration
Subsystem Device ID 7
Virtqueues 0:receiveq. 1:transmitq.
Feature bits
VIRTIO_RPMSG_F_NS (0) Device sends (and is capable of receiving)
name service messages announcing the creation (or
destruction) of a channel:
/**
* struct rpmsg_ns_msg - dynamic name service announcement message
* @name: name of remote service that is published
* @addr: address of remote service that is published
* @flags: indicates whether service is created or destroyed
*
* This message is sent across to publish a new service (or
* announce its removal). When we receive these messages, an
* appropriate rpmsg channel (i.e. device) is created/destroyed.
*/
struct rpmsg_ns_msg {
char name[RPMSG_NAME_SIZE];
u32 addr;
u32 flags;
} __packed;
/**
* enum rpmsg_ns_flags - dynamic name service announcement flags
*
* @RPMSG_NS_CREATE: a new remote service was just created
* @RPMSG_NS_DESTROY: a remote service was just destroyed
*/
enum rpmsg_ns_flags {
RPMSG_NS_CREATE = 0,
RPMSG_NS_DESTROY = 1,
};
Device configuration layout
At this point none are currently defined.
Device Initialization
The initialization routine should identify the receive and
transmission virtqueues.
The receive virtqueue should be filled with receive buffers.
Device Operation
Messages are transmitted by placing them in the transmitq, and
buffers for inbound messages are placed in the receiveq. In any
case, messages are always preceded by the following header:
/**
* struct rpmsg_hdr - common header for all rpmsg messages
* @src: source address
* @dst: destination address
* @reserved: reserved for future use
* @len: length of payload (in bytes)
* @flags: message flags
* @data: @len bytes of message payload data
*
* Every message sent(/received) on the rpmsg bus begins with
* this header.
*/
struct rpmsg_hdr {
u32 src;
u32 dst;
u32 reserved;
u16 len;
u16 flags;
u8 data[0];
} __packed;
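To illustrate how this header frames a message, here is a minimal C sketch that builds an outbound message in a transmit buffer. The rpmsg_build helper and the addresses used in the usage note are hypothetical, the fields are written in native byte order as an assumption, and the packed attribute syntax assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rpmsg_hdr {
    uint32_t src;
    uint32_t dst;
    uint32_t reserved;
    uint16_t len;
    uint16_t flags;
    uint8_t data[];
} __attribute__((packed));

/* Write header plus payload into buf.
 * Returns the total message size, or 0 if buf is too small. */
size_t rpmsg_build(void *buf, size_t buf_len, uint32_t src, uint32_t dst,
                   const void *payload, uint16_t len)
{
    struct rpmsg_hdr *hdr = buf;
    size_t total = sizeof(*hdr) + len;

    if (buf_len < total)
        return 0;
    hdr->src = src;
    hdr->dst = dst;
    hdr->reserved = 0;
    hdr->len = len;
    hdr->flags = 0;
    memcpy(hdr->data, payload, len);
    return total;
}
```

A caller might build a message with rpmsg_build(buf, sizeof buf, 0x400, 0x35, "ping", 4) and then place the resulting buffer in the transmitq.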
Appendix I: SCSI Host Device
The virtio SCSI host device groups together one or more virtual
logical units (such as disks), and allows communicating to them
using the SCSI protocol. An instance of the device represents a
SCSI host to which many targets and LUNs are attached.
The virtio SCSI device services two kinds of requests:
command requests for a logical unit;
task management functions related to a logical unit, target or
command.
The device is also able to send out notifications about added and
removed logical units. Together, these capabilities provide a
SCSI transport protocol that uses virtqueues as the transfer
medium. In the transport protocol, the virtio driver acts as the
initiator, while the virtio SCSI host provides one or more
targets that receive and process the requests.
Configuration
Subsystem Device ID 8
Virtqueues 0:controlq; 1:eventq; 2..n:request queues.
Feature bits
VIRTIO_SCSI_F_INOUT (0) A single request can include both
read-only and write-only data buffers.
VIRTIO_SCSI_F_HOTPLUG (1) The host should enable
hot-plug/hot-unplug of new LUNs and targets on the SCSI bus.
Device configuration layout All fields of this configuration
are always available. sense_size and cdb_size are writable by
the guest.
struct virtio_scsi_config {
u32 num_queues;
u32 seg_max;
u32 max_sectors;
u32 cmd_per_lun;
u32 event_info_size;
u32 sense_size;
u32 cdb_size;
u16 max_channel;
u16 max_target;
u32 max_lun;
};
num_queues is the total number of request virtqueues exposed by
the device. The driver is free to use only one request queue,
or it can use more to achieve better performance.
seg_max is the maximum number of segments that can be in a
command. A bidirectional command can include seg_max input
segments and seg_max output segments.
max_sectors is a hint to the guest about the maximum transfer
size it should use.
cmd_per_lun is a hint to the guest about the maximum number of
linked commands it should send to one LUN. The actual value
to be used is the minimum of cmd_per_lun and the virtqueue
size.
event_info_size is the maximum size that the device will fill
for buffers that the driver places in the eventq. The driver
should always supply buffers of at least this size. It is
written by the device depending on the set of negotiated
features.
sense_size is the maximum size of the sense data that the
device will write. The default value is written by the device
and will always be 96, but the driver can modify it. It is
restored to the default when the device is reset.
cdb_size is the maximum size of the CDB that the driver will
write. The default value is written by the device and will
always be 32, but the driver can likewise modify it. It is
restored to the default when the device is reset.
max_channel, max_target and max_lun can be used by the driver
as hints to constrain scanning the logical units on the
host.
Device Initialization
The initialization routine should first of all discover the
device's virtqueues.
If the driver uses the eventq, it should then place at least one
buffer in the eventq.
The driver can immediately issue requests (for example, INQUIRY
or REPORT LUNS) or task management functions (for example, I_T
RESET).
Device Operation: request queues
The driver queues requests to an arbitrary request queue, and
they are used by the device on that same queue. It is the
responsibility of the driver to ensure strict request ordering
for commands placed on different queues, because they will be
consumed with no order constraints.
Requests have the following format:
struct virtio_scsi_req_cmd {
// Read-only
u8 lun[8];
u64 id;
u8 task_attr;
u8 prio;
u8 crn;
char cdb[cdb_size];
char dataout[];
// Write-only part
u32 sense_len;
u32 residual;
u16 status_qualifier;
u8 status;
u8 response;
u8 sense[sense_size];
char datain[];
};
/* command-specific response values */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_OVERRUN 1
#define VIRTIO_SCSI_S_ABORTED 2
#define VIRTIO_SCSI_S_BAD_TARGET 3
#define VIRTIO_SCSI_S_RESET 4
#define VIRTIO_SCSI_S_BUSY 5
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
#define VIRTIO_SCSI_S_TARGET_FAILURE 7
#define VIRTIO_SCSI_S_NEXUS_FAILURE 8
#define VIRTIO_SCSI_S_FAILURE 9
/* task_attr */
#define VIRTIO_SCSI_S_SIMPLE 0
#define VIRTIO_SCSI_S_ORDERED 1
#define VIRTIO_SCSI_S_HEAD 2
#define VIRTIO_SCSI_S_ACA 3
The lun field addresses a target and logical unit in the
virtio-scsi device's SCSI domain. The only supported format for
the LUN field is: first byte set to 1, second byte set to target,
third and fourth byte representing a single level LUN structure,
followed by four zero bytes. With this representation, a
virtio-scsi device can serve up to 256 targets and 16384 LUNs per
target.
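The LUN format can be sketched as a small C encoder. The function name is hypothetical. The text above does not spell out the bit layout of the third and fourth bytes; this sketch assumes the SAM flat-space addressing form (top two bits of the third byte set to 01, followed by a 14-bit LUN), which is consistent with the stated limit of 16384 LUNs per target.

```c
#include <stdint.h>
#include <string.h>

/* Encode (target, lun) into the 8-byte virtio-scsi lun field.
 * Returns 0 on success, -1 if either value is out of range. */
int virtio_scsi_encode_lun(uint8_t out[8], unsigned int target,
                           unsigned int lun)
{
    if (target > 255 || lun > 16383)
        return -1;
    memset(out, 0, 8);                      /* last four bytes stay zero */
    out[0] = 1;                             /* first byte is always 1 */
    out[1] = (uint8_t)target;               /* second byte: target */
    out[2] = (uint8_t)((lun >> 8) | 0x40);  /* assumed flat-space format */
    out[3] = (uint8_t)(lun & 0xff);
    return 0;
}
```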
The id field is the command identifier (“tag”).
task_attr, prio and crn should be left to zero. task_attr defines
the task attribute as in the table above, but all task attributes
may be mapped to SIMPLE by the device; crn may also be provided
by clients, but is generally expected to be 0. The maximum CRN
value defined by the protocol is 255, since CRN is stored in an
8-bit integer.
All of these fields are defined in SAM. They are always
read-only, as are the cdb and dataout field. The cdb_size is
taken from the configuration space.
sense and subsequent fields are always write-only. The sense_len
field indicates the number of bytes actually written to the sense
buffer. The residual field indicates the residual size,
calculated as “data_length - number_of_transferred_bytes”, for
read or write operations. For bidirectional commands, the
number_of_transferred_bytes includes both read and written bytes.
A residual field that is less than the size of datain means that
the dataout field was processed entirely. A residual field that
exceeds the size of datain means that the dataout field was
processed partially and the datain field was not processed at
all.
The status byte is written by the device to be the status code as
defined in SAM.
The response byte is written by the device to be one of the
following:
VIRTIO_SCSI_S_OK when the request was completed and the status
byte is filled with a SCSI status code (not necessarily
"GOOD").
VIRTIO_SCSI_S_OVERRUN if the content of the CDB requires
transferring more data than is available in the data buffers.
VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an
ABORT TASK or ABORT TASK SET task management function.
VIRTIO_SCSI_S_BAD_TARGET if the request was never processed
because the target indicated by the lun field does not exist.
VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus
or device reset (including a task management function).
VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a
problem in the connection between the host and the target
(severed link).
VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a
failure and the guest should not retry on other paths.
VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure
but retrying on other paths might yield a different result.
VIRTIO_SCSI_S_BUSY if the request failed but retrying on the
same path should work.
VIRTIO_SCSI_S_FAILURE for other host or guest error. In
particular, if neither dataout nor datain is empty, and the
VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
request will be immediately returned with a response equal to
VIRTIO_SCSI_S_FAILURE.
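One way a driver might act on these response values is sketched below in C. The next_step enum and classify_response function are hypothetical names. Only the three cases the text gives retry guidance for (OK, BUSY, NEXUS_FAILURE) get a specific action; treating every other value as a plain failure is a conservative choice made for this sketch, not something the specification mandates.

```c
#include <stdint.h>

#define VIRTIO_SCSI_S_OK                0
#define VIRTIO_SCSI_S_OVERRUN           1
#define VIRTIO_SCSI_S_ABORTED           2
#define VIRTIO_SCSI_S_BAD_TARGET        3
#define VIRTIO_SCSI_S_RESET             4
#define VIRTIO_SCSI_S_BUSY              5
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
#define VIRTIO_SCSI_S_TARGET_FAILURE    7
#define VIRTIO_SCSI_S_NEXUS_FAILURE     8
#define VIRTIO_SCSI_S_FAILURE           9

enum next_step {
    STEP_CHECK_SCSI_STATUS,  /* completed: inspect the status byte */
    STEP_RETRY_SAME_PATH,    /* retrying on the same path should work */
    STEP_RETRY_OTHER_PATH,   /* another path might give a different result */
    STEP_FAIL                /* report an error for this request */
};

enum next_step classify_response(uint8_t response)
{
    switch (response) {
    case VIRTIO_SCSI_S_OK:
        return STEP_CHECK_SCSI_STATUS;
    case VIRTIO_SCSI_S_BUSY:
        return STEP_RETRY_SAME_PATH;
    case VIRTIO_SCSI_S_NEXUS_FAILURE:
        return STEP_RETRY_OTHER_PATH;
    default:
        /* Includes TARGET_FAILURE, for which other paths should not be tried. */
        return STEP_FAIL;
    }
}
```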
Device Operation: controlq
The controlq is used for other SCSI transport operations.
Requests have the following format:
struct virtio_scsi_ctrl {
u32 type;
...
u8 response;
};
/* response values valid for all commands */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_BAD_TARGET 3
#define VIRTIO_SCSI_S_BUSY 5
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
#define VIRTIO_SCSI_S_TARGET_FAILURE 7
#define VIRTIO_SCSI_S_NEXUS_FAILURE 8
#define VIRTIO_SCSI_S_FAILURE 9
#define VIRTIO_SCSI_S_INCORRECT_LUN 12
The type identifies the remaining fields.
The following commands are defined:
Task management function
#define VIRTIO_SCSI_T_TMF 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
#define VIRTIO_SCSI_T_TMF_CLEAR_ACA 2
#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3
#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
#define VIRTIO_SCSI_T_TMF_QUERY_TASK 6
#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
struct virtio_scsi_ctrl_tmf
{
// Read-only part
u32 type;
u32 subtype;
u8 lun[8];
u64 id;
// Write-only part
u8 response;
}
/* command-specific response values */
#define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 10
#define VIRTIO_SCSI_S_FUNCTION_REJECTED 11
The type is VIRTIO_SCSI_T_TMF. All fields except response are
filled by the driver. The subtype field must always be specified
and identifies the requested task management function.
Other fields may be irrelevant for the requested TMF; if so,
they are ignored but they should still be present. The lun
field is in the same format specified for request queues; the
single level LUN is ignored when the task management function
addresses a whole I_T nexus. When relevant, the value of the id
field is matched against the id values passed on the requestq.
The outcome of the task management function is written by the
device in the response field. The command-specific response
values map 1-to-1 with those defined in SAM.
Asynchronous notification query
#define VIRTIO_SCSI_T_AN_QUERY 1
struct virtio_scsi_ctrl_an {
// Read-only part
u32 type;
u8 lun[8];
u32 event_requested;
// Write-only part
u32 event_actual;
u8 response;
}
#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE 2
#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT 4
#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST 8
#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16
#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST 32
#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY 64
By sending this command, the driver asks the device which
events the given LUN can report, as described in paragraphs 6.6
and A.6 of the SCSI MMC specification. The driver writes the
events it is interested in into the event_requested; the device
responds by writing the events that it supports into
event_actual.
The type is VIRTIO_SCSI_T_AN_QUERY. The lun and event_requested
fields are written by the driver. The event_actual and response
fields are written by the device.
No command-specific values are defined for the response byte.
Asynchronous notification subscription
#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
struct virtio_scsi_ctrl_an {
// Read-only part
u32 type;
u8 lun[8];
u32 event_requested;
// Write-only part
u32 event_actual;
u8 response;
}
By sending this command, the driver asks the specified LUN to
report events for its physical interface, again as described in
the SCSI MMC specification. The driver writes the events it is
interested in into the event_requested; the device responds by
writing the events that it supports into event_actual.
Event types are the same as for the asynchronous notification
query message.
The type is VIRTIO_SCSI_T_AN_SUBSCRIBE. The lun and
event_requested fields are written by the driver. The
event_actual and response fields are written by the device.
No command-specific values are defined for the response byte.
Device Operation: eventq
The eventq is used by the device to report information on logical
units that are attached to it. The driver should always leave a
few buffers ready in the eventq. In general, the device will not
queue events to cope with an empty eventq, and will end up
dropping events if it finds no buffer ready. However, when
reporting events for many LUNs (e.g. when a whole target
disappears), the device can throttle events to avoid dropping
them. For this reason, placing 10-15 buffers on the event queue
should be enough.
Buffers are placed in the eventq and filled by the device when
interesting events occur. The buffers should be strictly
write-only (device-filled) and the size of the buffers should be
at least the value given in the device's configuration
information.
Buffers returned by the device on the eventq will be referred to
as "events" in the rest of this section. Events have the
following format:
#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000
struct virtio_scsi_event {
// Write-only part
u32 event;
...
}
If bit 31 is set in the event field, the device failed to report
an event due to missing buffers. In this case, the driver should
poll the logical units for unit attention conditions, and/or do
whatever form of bus scan is appropriate for the guest operating
system.
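Decoding the event field can be sketched in C as follows. The two helper names are hypothetical; the sketch simply separates bit 31 from the event type carried in the remaining bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000u

/* True if the device dropped at least one event for lack of buffers. */
bool scsi_events_missed(uint32_t event)
{
    return (event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;
}

/* The event type with the "events missed" flag masked off. */
uint32_t scsi_event_type(uint32_t event)
{
    return event & ~VIRTIO_SCSI_T_EVENTS_MISSED;
}
```

A driver would typically test scsi_events_missed() first and schedule a rescan if it returns true, then dispatch on scsi_event_type().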
Other data that the device writes to the buffer depends on the
contents of the event field. The following events are defined:
No event
#define VIRTIO_SCSI_T_NO_EVENT 0
This event is fired in the following cases:
When the device detects in the eventq a buffer that is shorter
than what is indicated in the configuration field, it might
use it immediately and put this dummy value in the event
field. A well-written driver will never observe this
situation.
When events are dropped, the device may signal this event as
soon as the driver makes a buffer available, in order to
request action from the driver. In this case, of course, this
event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED
flag.
Transport reset
#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
struct virtio_scsi_event_reset {
// Write-only part
u32 event;
u8 lun[8];
u32 reason;
}
#define VIRTIO_SCSI_EVT_RESET_HARD 0
#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
By sending this event, the device signals that a logical unit
on a target has been reset, including the case of a new device
appearing or disappearing on the bus. The device fills in all
fields. The event field is set to
VIRTIO_SCSI_T_TRANSPORT_RESET. The lun field addresses a
logical unit in the SCSI host.
The reason value is one of the three #define values appearing
above:
VIRTIO_SCSI_EVT_RESET_REMOVED (“LUN/target removed”) is used if
the target or logical unit is no longer able to receive
commands.
VIRTIO_SCSI_EVT_RESET_HARD (“LUN hard reset”) is used if the
logical unit has been reset, but is still present.
VIRTIO_SCSI_EVT_RESET_RESCAN (“rescan LUN/target”) is used if a
target or logical unit has just appeared on the device.
The “removed” and “rescan” events, when sent for LUN 0, may
apply to the entire target. After receiving them the driver
should ask the initiator to rescan the target, in order to
detect the case when an entire target has appeared or
disappeared. These two events will never be reported unless the
VIRTIO_SCSI_F_HOTPLUG feature was negotiated between the host
and the guest.
Events will also be reported via sense codes (this obviously
does not apply to newly appeared buses or targets, since the
application has never discovered them):
“LUN/target removed” maps to sense key ILLEGAL REQUEST, asc
0x25, ascq 0x00 (LOGICAL UNIT NOT SUPPORTED)
“LUN hard reset” maps to sense key UNIT ATTENTION, asc 0x29
(POWER ON, RESET OR BUS DEVICE RESET OCCURRED)
“rescan LUN/target” maps to sense key UNIT ATTENTION, asc 0x3f,
ascq 0x0e (REPORTED LUNS DATA HAS CHANGED)
The preferred way to detect transport reset is always to use
events, because sense codes are only seen by the driver when it
sends a SCSI command to the logical unit or target. However, in
case events are dropped, the initiator will still be able to
synchronize with the actual state of the controller if the
driver asks the initiator to rescan the SCSI bus. During the
rescan, the initiator will be able to observe the above sense
codes, and it will process them as if the driver had
received the equivalent event.
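The mapping between sense codes and reset reasons listed above
could be kept in a small table, so that sense data seen during a
rescan is handled like the equivalent event. The sketch below is
only an illustration: struct sense_triple, reset_sense_map and
sense_to_reset_reason are hypothetical names, the sense key
numbers (0x05 ILLEGAL REQUEST, 0x06 UNIT ATTENTION) are the
standard SCSI values, and the ascq of 0x00 for the hard reset
case is assumed from the standard SCSI code for that condition,
since the text above gives only the asc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_SCSI_EVT_RESET_HARD    0
#define VIRTIO_SCSI_EVT_RESET_RESCAN  1
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2

/* Standard SCSI sense key values. */
#define SENSE_KEY_ILLEGAL_REQUEST 0x05
#define SENSE_KEY_UNIT_ATTENTION  0x06

/* Hypothetical container for the three values reported in sense data. */
struct sense_triple {
    uint8_t key;
    uint8_t asc;
    uint8_t ascq;
};

/* Sense data equivalent to each transport reset reason. */
static const struct {
    uint32_t reason;
    struct sense_triple sense;
} reset_sense_map[] = {
    { VIRTIO_SCSI_EVT_RESET_REMOVED, { SENSE_KEY_ILLEGAL_REQUEST, 0x25, 0x00 } },
    /* ascq 0x00 is an assumption; the text above lists only asc 0x29. */
    { VIRTIO_SCSI_EVT_RESET_HARD,    { SENSE_KEY_UNIT_ATTENTION,  0x29, 0x00 } },
    { VIRTIO_SCSI_EVT_RESET_RESCAN,  { SENSE_KEY_UNIT_ATTENTION,  0x3f, 0x0e } },
};

/* Returns the equivalent reset reason, or -1 if the sense data is unrelated. */
static int sense_to_reset_reason(const struct sense_triple *s)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < sizeof(reset_sense_map) / sizeof(reset_sense_map[0]); i++) {
        const struct sense_triple *m = &reset_sense_map[i].sense;

        if (m->key == s->key && m->asc == s->asc && m->ascq == s->ascq)
            return (int)reset_sense_map[i].reason;
    }
    return -1;
}
```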
Asynchronous notification
#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
struct virtio_scsi_event_an {
// Write-only part
u32 event;
u8 lun[8];
u32 reason;
}
By sending this event, the device signals that an asynchronous
event was fired from a physical interface.
All fields are written by the device. The event field is set to
VIRTIO_SCSI_T_ASYNC_NOTIFY. The lun field addresses a logical
unit in the SCSI host. The reason field is a subset of the
events that the driver has subscribed to via the "Asynchronous
notification subscription" command.
When dropped events are reported, the driver should poll for
asynchronous events manually using SCSI commands.
Appendix X: virtio-mmio
Virtual environments without PCI support (a common situation in
embedded device models) might use a simple memory mapped device
(“virtio-mmio”) instead of the PCI device.
The memory mapped virtio device behaviour is based on the PCI
device specification. Therefore most operations, such as device
initialization, queue configuration and buffer transfers, are
nearly identical. Existing differences are described in the
following sections.
Device Initialization
Instead of using the PCI IO space for the virtio header, the
“virtio-mmio” device provides a set of memory mapped control
registers, all 32 bits wide, followed by device-specific
configuration space. The following list presents their layout:
Offset from the device base address | Direction | Name
Description
0x000 | R | MagicValue
“virt” string.
0x004 | R | Version
Device version number. Currently must be 1.
0x008 | R | DeviceID
Virtio Subsystem Device ID (e.g. 1 for network card).
0x00c | R | VendorID
Virtio Subsystem Vendor ID.
0x010 | R | HostFeatures
Flags representing features the device supports.
Reading from this register returns 32 consecutive flag bits,
first bit depending on the last value written to the
HostFeaturesSel register. Access to this register returns bits
HostFeaturesSel*32 to (HostFeaturesSel*32)+31, e.g. feature bits
0 to 31 if HostFeaturesSel is set to 0 and feature bits 32 to 63
if HostFeaturesSel is set to 1. Also see [sub:Feature-Bits]
0x014 | W | HostFeaturesSel
Device (Host) features word selection.
Writing to this register selects a set of 32 device feature bits
accessible by reading from HostFeatures register. Device driver
must write a value to the HostFeaturesSel register before
reading from the HostFeatures register.
0x020 | W | GuestFeatures
Flags representing device features understood and activated by
the driver.
Writing to this register sets 32 consecutive flag bits, first
bit depending on the last value written to the GuestFeaturesSel
register. Access to this register sets bits GuestFeaturesSel*32
to (GuestFeaturesSel*32)+31, e.g. feature bits 0 to 31 if
GuestFeaturesSel is set to 0 and feature bits 32 to 63 if
GuestFeaturesSel is set to 1. Also see [sub:Feature-Bits]
0x024 | W | GuestFeaturesSel
Activated (Guest) features word selection.
Writing to this register selects a set of 32 activated feature
bits accessible by writing to the GuestFeatures register.
Device driver must write a value to the GuestFeaturesSel
register before writing to the GuestFeatures register.
0x028 | W | GuestPageSize
Guest page size.
Device driver must write the guest page size in bytes to the
register during initialization, before any queues are used.
This value must be a power of 2 and is used by the Host to
calculate Guest address of the first queue page (see QueuePFN).
0x030 | W | QueueSel
Virtual queue index (first queue is 0).
Writing to this register selects the virtual queue that the
following operations on QueueNum, QueueAlign and QueuePFN apply
to.
0x034 | R | QueueNumMax
Maximum virtual queue size.
Reading from the register returns the maximum size of the queue
the Host is ready to process or zero (0x0) if the queue is not
available. This applies to the queue selected by writing to
QueueSel and is allowed only when QueuePFN is set to zero
(0x0), so when the queue is not actively used.
0x038 | W | QueueNum
Virtual queue size.
Queue size is the number of elements in the queue, and therefore
the size of the descriptor table and of both the available and
used rings.
Writing to this register notifies the Host what size of the
queue the Guest will use. This applies to the queue selected by
writing to QueueSel.
0x03c | W | QueueAlign
Used Ring alignment in the virtual queue.
Writing to this register notifies the Host about alignment
boundary of the Used Ring in bytes. This value must be a power
of 2 and applies to the queue selected by writing to QueueSel.
0x040 | RW | QueuePFN
Guest physical page number of the virtual queue.
Writing to this register notifies the host about location of the
virtual queue in the Guest's physical address space. This value
is the index number of a page starting with the queue
Descriptor Table. Value zero (0x0) means physical address zero
(0x00000000) and is illegal. When the Guest stops using the
queue it must write zero (0x0) to this register.
Reading from this register returns the currently used page
number of the queue, therefore a value other than zero (0x0)
means that the queue is in use.
Both read and write accesses apply to the queue selected by
writing to QueueSel.
0x050 | W | QueueNotify
Queue notifier.
Writing a queue index to this register notifies the Host that
there are new buffers to process in the queue.
0x060 | R | InterruptStatus
Interrupt status.
Reading from this register returns a bit mask of interrupts
asserted by the device. An interrupt is asserted if the
corresponding bit is set, ie. equals one (1).
Bit 0 | Used Ring Update
This interrupt is asserted when the Host has updated the Used
Ring in at least one of the active virtual queues.
Bit 1 | Configuration change
This interrupt is asserted when configuration of the device has
changed.
0x064 | W | InterruptACK
Interrupt acknowledge.
Writing to this register notifies the Host that the Guest
finished handling interrupts. Set bits in the value clear the
corresponding bits of the InterruptStatus register.
0x070 | RW | Status
Device status.
Reading from this register returns the current device status
flags.
Writing non-zero values to this register sets the status flags,
indicating the Guest progress. Writing zero (0x0) to this
register triggers a device reset.
Also see [sub:Device-Initialization-Sequence]
0x100+ | RW | Config
Device-specific configuration space starts at an offset 0x100
and is accessed with byte alignment. Its meaning and size
depends on the device and the driver.
Virtual queue size is the number of elements in the queue, and
therefore the size of the descriptor table and of both the
available and used rings.
The endianness of the registers follows the native endianness of
the Guest. Writing to registers described as “R” and reading from
registers described as “W” is not permitted and can cause
undefined behavior.
The device initialization is performed as described in [sub:Device-Initialization-Sequence]
with one exception: the Guest must notify the Host about its
page size, writing the size in bytes to GuestPageSize register
before the initialization is finished.
The memory mapped virtio devices generate a single interrupt
only, therefore no special configuration is required.
Virtqueue Configuration
The virtual queue configuration is performed in a similar way to
the one described in [sec:Virtqueue-Configuration] with a few
additional operations:
Select the queue by writing its index (first queue is 0) to the
QueueSel register.
Check that the queue is not already in use: read the QueuePFN
register; the returned value should be zero (0x0).
Read maximum queue size (number of elements) from the
QueueNumMax register. If the returned value is zero (0x0) the
queue is not available.
Allocate and zero the queue pages in contiguous virtual memory,
aligning the Used Ring to an optimal boundary (usually page
size). Size of the allocated queue may be smaller than or equal
to the maximum size returned by the Host.
Notify the Host about the queue size by writing the size to
QueueNum register.
Notify the Host about the used alignment by writing its value
in bytes to QueueAlign register.
Write the physical page number of the first page of the queue
to the QueuePFN register.
The queue and the device are ready to begin normal operations
now.
Device Operation
The memory mapped virtio device behaves in the same way as
described in [sec:Device-Operation], with the following
exceptions:
The device is notified about new buffers available in a queue
by writing the queue index to the QueueNotify register instead
of the virtio header in PCI I/O space ([sub:Notifying-The-Device]).
The memory mapped virtio device uses a single, dedicated
interrupt signal, which is raised when at least one of the
interrupts described in the InterruptStatus register
description is asserted. After receiving an interrupt, the
driver must read the InterruptStatus register to check what
caused the interrupt (see the register description). After the
interrupt is handled, the driver must acknowledge it by writing
a bit mask corresponding to the serviced interrupt to the
InterruptACK register.