aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2013-05-02 14:14:04 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2013-05-02 14:14:04 -0700
commit736a2dd2571ac56b11ed95a7814d838d5311be04 (patch)
treede10d107025970c6e51d5b6faeba799ed4b9caae
parent0b2e3b6bb4a415379f16e38fc92db42379be47a1 (diff)
parent01d779a14ef800b74684d9692add4944df052461 (diff)
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio & lguest updates from Rusty Russell: "Lots of virtio work which wasn't quite ready for last merge window. Plus I dived into lguest again, reworking the pagetable code so we can move the switcher page: our fixmaps sometimes take more than 2MB now..." Ugh. Annoying conflicts with the tcm_vhost -> vhost_scsi rename. Hopefully correctly resolved. * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits) caif_virtio: Remove bouncing email addresses lguest: improve code readability in lg_cpu_start. virtio-net: fill only rx queues which are being used lguest: map Switcher below fixmap. lguest: cache last cpu we ran on. lguest: map Switcher text whenever we allocate a new pagetable. lguest: don't share Switcher PTE pages between guests. lguest: expost switcher_pages array (as lg_switcher_pages). lguest: extract shadow PTE walking / allocating. lguest: make check_gpte et. al return bool. lguest: assume Switcher text is a single page. lguest: rename switcher_page to switcher_pages. lguest: remove RESERVE_MEM constant. lguest: check vaddr not pgd for Switcher protection. lguest: prepare to make SWITCHER_ADDR a variable. virtio: console: replace EMFILE with EBUSY for already-open port virtio-scsi: reset virtqueue affinity when doing cpu hotplug virtio-scsi: introduce multiqueue support virtio-scsi: push vq lock/unlock into virtscsi_vq_done virtio-scsi: pass struct virtio_scsi to virtqueue completion function ...
-rw-r--r--Documentation/virtual/00-INDEX3
-rw-r--r--Documentation/virtual/virtio-spec.txt3210
-rw-r--r--MAINTAINERS1
-rw-r--r--arch/x86/include/asm/lguest.h17
-rw-r--r--block/blk-integrity.c2
-rw-r--r--block/blk-merge.c2
-rw-r--r--drivers/Makefile2
-rw-r--r--drivers/block/virtio_blk.c148
-rw-r--r--drivers/char/hw_random/virtio-rng.c2
-rw-r--r--drivers/char/virtio_console.c14
-rw-r--r--drivers/lguest/Kconfig5
-rw-r--r--drivers/lguest/core.c67
-rw-r--r--drivers/lguest/lg.h6
-rw-r--r--drivers/lguest/lguest_user.c6
-rw-r--r--drivers/lguest/page_tables.c567
-rw-r--r--drivers/lguest/x86/core.c7
-rw-r--r--drivers/net/caif/Kconfig14
-rw-r--r--drivers/net/caif/Makefile3
-rw-r--r--drivers/net/caif/caif_virtio.c790
-rw-r--r--drivers/net/virtio_net.c77
-rw-r--r--drivers/rpmsg/virtio_rpmsg_bus.c8
-rw-r--r--drivers/scsi/virtio_scsi.c487
-rw-r--r--drivers/vhost/Kconfig8
-rw-r--r--drivers/vhost/Makefile2
-rw-r--r--drivers/vhost/test.c4
-rw-r--r--drivers/vhost/vringh.c1007
-rw-r--r--drivers/virtio/virtio_balloon.c6
-rw-r--r--drivers/virtio/virtio_ring.c297
-rw-r--r--include/linux/scatterlist.h16
-rw-r--r--include/linux/virtio.h20
-rw-r--r--include/linux/virtio_caif.h24
-rw-r--r--include/linux/virtio_ring.h57
-rw-r--r--include/linux/vringh.h225
-rw-r--r--include/uapi/linux/virtio_balloon.h4
-rw-r--r--include/uapi/linux/virtio_ids.h1
-rw-r--r--net/9p/trans_virtio.c48
-rw-r--r--tools/lguest/lguest.txt2
-rw-r--r--tools/virtio/Makefile10
-rw-r--r--tools/virtio/asm/barrier.h14
-rw-r--r--tools/virtio/linux/bug.h10
-rw-r--r--tools/virtio/linux/err.h26
-rw-r--r--tools/virtio/linux/export.h5
-rw-r--r--tools/virtio/linux/irqreturn.h1
-rw-r--r--tools/virtio/linux/kernel.h112
-rw-r--r--tools/virtio/linux/module.h1
-rw-r--r--tools/virtio/linux/printk.h4
-rw-r--r--tools/virtio/linux/ratelimit.h4
-rw-r--r--tools/virtio/linux/scatterlist.h189
-rw-r--r--tools/virtio/linux/types.h28
-rw-r--r--tools/virtio/linux/uaccess.h50
-rw-r--r--tools/virtio/linux/uio.h3
-rw-r--r--tools/virtio/linux/virtio.h171
-rw-r--r--tools/virtio/linux/virtio_config.h6
-rw-r--r--tools/virtio/linux/virtio_ring.h1
-rw-r--r--tools/virtio/linux/vringh.h1
-rw-r--r--tools/virtio/uapi/linux/uio.h1
-rw-r--r--tools/virtio/uapi/linux/virtio_config.h1
-rw-r--r--tools/virtio/uapi/linux/virtio_ring.h4
-rw-r--r--tools/virtio/virtio_test.c13
-rw-r--r--tools/virtio/vringh_test.c741
60 files changed, 4481 insertions, 4074 deletions
diff --git a/Documentation/virtual/00-INDEX b/Documentation/virtual/00-INDEX
index 924bd462675..e952d30bbf0 100644
--- a/Documentation/virtual/00-INDEX
+++ b/Documentation/virtual/00-INDEX
@@ -6,6 +6,3 @@ kvm/
- Kernel Virtual Machine. See also http://linux-kvm.org
uml/
- User Mode Linux, builds/runs Linux kernel as a userspace program.
-virtio.txt
- - Text version of draft virtio spec.
- See http://ozlabs.org/~rusty/virtio-spec
diff --git a/Documentation/virtual/virtio-spec.txt b/Documentation/virtual/virtio-spec.txt
deleted file mode 100644
index eb094039b50..00000000000
--- a/Documentation/virtual/virtio-spec.txt
+++ /dev/null
@@ -1,3210 +0,0 @@
-[Generated file: see http://ozlabs.org/~rusty/virtio-spec/]
-Virtio PCI Card Specification
-v0.9.5 DRAFT
--
-
-Rusty Russell <rusty@rustcorp.com.au> IBM Corporation (Editor)
-
-2012 May 7.
-
-Purpose and Description
-
-This document describes the specifications of the “virtio” family
-of PCI[LaTeX Command: nomenclature] devices. These are devices
-are found in virtual environments[LaTeX Command: nomenclature],
-yet by design they are not all that different from physical PCI
-devices, and this document treats them as such. This allows the
-guest to use standard PCI drivers and discovery mechanisms.
-
-The purpose of virtio and this specification is that virtual
-environments and guests should have a straightforward, efficient,
-standard and extensible mechanism for virtual devices, rather
-than boutique per-environment or per-OS mechanisms.
-
- Straightforward: Virtio PCI devices use normal PCI mechanisms
- of interrupts and DMA which should be familiar to any device
- driver author. There is no exotic page-flipping or COW
- mechanism: it's just a PCI device.[footnote:
-This lack of page-sharing implies that the implementation of the
-device (e.g. the hypervisor or host) needs full access to the
-guest memory. Communication with untrusted parties (i.e.
-inter-guest communication) requires copying.
-]
-
- Efficient: Virtio PCI devices consist of rings of descriptors
- for input and output, which are neatly separated to avoid cache
- effects from both guest and device writing to the same cache
- lines.
-
- Standard: Virtio PCI makes no assumptions about the environment
- in which it operates, beyond supporting PCI. In fact the virtio
- devices specified in the appendices do not require PCI at all:
- they have been implemented on non-PCI buses.[footnote:
-The Linux implementation further separates the PCI virtio code
-from the specific virtio drivers: these drivers are shared with
-the non-PCI implementations (currently lguest and S/390).
-]
-
- Extensible: Virtio PCI devices contain feature bits which are
- acknowledged by the guest operating system during device setup.
- This allows forwards and backwards compatibility: the device
- offers all the features it knows about, and the driver
- acknowledges those it understands and wishes to use.
-
- Virtqueues
-
-The mechanism for bulk data transport on virtio PCI devices is
-pretentiously called a virtqueue. Each device can have zero or
-more virtqueues: for example, the network device has one for
-transmit and one for receive.
-
-Each virtqueue occupies two or more physically-contiguous pages
-(defined, for the purposes of this specification, as 4096 bytes),
-and consists of three parts:
-
-
-+-------------------+-----------------------------------+-----------+
-| Descriptor Table | Available Ring (padding) | Used Ring |
-+-------------------+-----------------------------------+-----------+
-
-
-When the driver wants to send a buffer to the device, it fills in
-a slot in the descriptor table (or chains several together), and
-writes the descriptor index into the available ring. It then
-notifies the device. When the device has finished a buffer, it
-writes the descriptor into the used ring, and sends an interrupt.
-
-Specification
-
- PCI Discovery
-
-Any PCI device with Vendor ID 0x1AF4, and Device ID 0x1000
-through 0x103F inclusive is a virtio device[footnote:
-The actual value within this range is ignored
-]. The device must also have a Revision ID of 0 to match this
-specification.
-
-The Subsystem Device ID indicates which virtio device is
-supported by the device. The Subsystem Vendor ID should reflect
-the PCI Vendor ID of the environment (it's currently only used
-for informational purposes by the guest).
-
-
-+----------------------+--------------------+---------------+
-| Subsystem Device ID | Virtio Device | Specification |
-+----------------------+--------------------+---------------+
-+----------------------+--------------------+---------------+
-| 1 | network card | Appendix C |
-+----------------------+--------------------+---------------+
-| 2 | block device | Appendix D |
-+----------------------+--------------------+---------------+
-| 3 | console | Appendix E |
-+----------------------+--------------------+---------------+
-| 4 | entropy source | Appendix F |
-+----------------------+--------------------+---------------+
-| 5 | memory ballooning | Appendix G |
-+----------------------+--------------------+---------------+
-| 6 | ioMemory | - |
-+----------------------+--------------------+---------------+
-| 7 | rpmsg | Appendix H |
-+----------------------+--------------------+---------------+
-| 8 | SCSI host | Appendix I |
-+----------------------+--------------------+---------------+
-| 9 | 9P transport | - |
-+----------------------+--------------------+---------------+
-| 10 | mac80211 wlan | - |
-+----------------------+--------------------+---------------+
-
-
- Device Configuration
-
-To configure the device, we use the first I/O region of the PCI
-device. This contains a virtio header followed by a
-device-specific region.
-
-There may be different widths of accesses to the I/O region; the “
-natural” access method for each field in the virtio header must
-be used (i.e. 32-bit accesses for 32-bit fields, etc), but the
-device-specific region can be accessed using any width accesses,
-and should obtain the same results.
-
-Note that this is possible because while the virtio header is PCI
-(i.e. little) endian, the device-specific region is encoded in
-the native endian of the guest (where such distinction is
-applicable).
-
- Device Initialization Sequence<sub:Device-Initialization-Sequence>
-
-We start with an overview of device initialization, then expand
-on the details of the device and how each step is preformed.
-
- Reset the device. This is not required on initial start up.
-
- The ACKNOWLEDGE status bit is set: we have noticed the device.
-
- The DRIVER status bit is set: we know how to drive the device.
-
- Device-specific setup, including reading the Device Feature
- Bits, discovery of virtqueues for the device, optional MSI-X
- setup, and reading and possibly writing the virtio
- configuration space.
-
- The subset of Device Feature Bits understood by the driver is
- written to the device.
-
- The DRIVER_OK status bit is set.
-
- The device can now be used (ie. buffers added to the
- virtqueues)[footnote:
-Historically, drivers have used the device before steps 5 and 6.
-This is only allowed if the driver does not use any features
-which would alter this early use of the device.
-]
-
-If any of these steps go irrecoverably wrong, the guest should
-set the FAILED status bit to indicate that it has given up on the
-device (it can reset the device later to restart if desired).
-
-We now cover the fields required for general setup in detail.
-
- Virtio Header
-
-The virtio header looks as follows:
-
-
-+------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
-| Bits || 32 | 32 | 32 | 16 | 16 | 16 | 8 | 8 |
-+------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
-| Read/Write || R | R+W | R+W | R | R+W | R+W | R+W | R |
-+------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
-| Purpose || Device | Guest | Queue | Queue | Queue | Queue | Device | ISR |
-| || Features bits 0:31 | Features bits 0:31 | Address | Size | Select | Notify | Status | Status |
-+------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
-
-
-If MSI-X is enabled for the device, two additional fields
-immediately follow this header:[footnote:
-ie. once you enable MSI-X on the device, the other fields move.
-If you turn it off again, they move back!
-]
-
-
-+------------++----------------+--------+
-| Bits || 16 | 16 |
- +----------------+--------+
-+------------++----------------+--------+
-| Read/Write || R+W | R+W |
-+------------++----------------+--------+
-| Purpose || Configuration | Queue |
-| (MSI-X) || Vector | Vector |
-+------------++----------------+--------+
-
-
-Immediately following these general headers, there may be
-device-specific headers:
-
-
-+------------++--------------------+
-| Bits || Device Specific |
- +--------------------+
-+------------++--------------------+
-| Read/Write || Device Specific |
-+------------++--------------------+
-| Purpose || Device Specific... |
-| || |
-+------------++--------------------+
-
-
- Device Status
-
-The Device Status field is updated by the guest to indicate its
-progress. This provides a simple low-level diagnostic: it's most
-useful to imagine them hooked up to traffic lights on the console
-indicating the status of each device.
-
-The device can be reset by writing a 0 to this field, otherwise
-at least one bit should be set:
-
- ACKNOWLEDGE (1) Indicates that the guest OS has found the
- device and recognized it as a valid virtio device.
-
- DRIVER (2) Indicates that the guest OS knows how to drive the
- device. Under Linux, drivers can be loadable modules so there
- may be a significant (or infinite) delay before setting this
- bit.
-
- DRIVER_OK (4) Indicates that the driver is set up and ready to
- drive the device.
-
- FAILED (128) Indicates that something went wrong in the guest,
- and it has given up on the device. This could be an internal
- error, or the driver didn't like the device for some reason, or
- even a fatal error during device operation. The device must be
- reset before attempting to re-initialize.
-
- Feature Bits<sub:Feature-Bits>
-
-Thefirst configuration field indicates the features that the
-device supports. The bits are allocated as follows:
-
- 0 to 23 Feature bits for the specific device type
-
- 24 to 32 Feature bits reserved for extensions to the queue and
- feature negotiation mechanisms
-
-For example, feature bit 0 for a network device (i.e. Subsystem
-Device ID 1) indicates that the device supports checksumming of
-packets.
-
-The feature bits are negotiated: the device lists all the
-features it understands in the Device Features field, and the
-guest writes the subset that it understands into the Guest
-Features field. The only way to renegotiate is to reset the
-device.
-
-In particular, new fields in the device configuration header are
-indicated by offering a feature bit, so the guest can check
-before accessing that part of the configuration space.
-
-This allows for forwards and backwards compatibility: if the
-device is enhanced with a new feature bit, older guests will not
-write that feature bit back to the Guest Features field and it
-can go into backwards compatibility mode. Similarly, if a guest
-is enhanced with a feature that the device doesn't support, it
-will not see that feature bit in the Device Features field and
-can go into backwards compatibility mode (or, for poor
-implementations, set the FAILED Device Status bit).
-
- Configuration/Queue Vectors
-
-When MSI-X capability is present and enabled in the device
-(through standard PCI configuration space) 4 bytes at byte offset
-20 are used to map configuration change and queue interrupts to
-MSI-X vectors. In this case, the ISR Status field is unused, and
-device specific configuration starts at byte offset 24 in virtio
-header structure. When MSI-X capability is not enabled, device
-specific configuration starts at byte offset 20 in virtio header.
-
-Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of
-Configuration/Queue Vector registers, maps interrupts triggered
-by the configuration change/selected queue events respectively to
-the corresponding MSI-X vector. To disable interrupts for a
-specific event type, unmap it by writing a special NO_VECTOR
-value:
-
-/* Vector value used to disable MSI for queue */
-
-#define VIRTIO_MSI_NO_VECTOR 0xffff
-
-Reading these registers returns vector mapped to a given event,
-or NO_VECTOR if unmapped. All queue and configuration change
-events are unmapped by default.
-
-Note that mapping an event to vector might require allocating
-internal device resources, and might fail. Devices report such
-failures by returning the NO_VECTOR value when the relevant
-Vector field is read. After mapping an event to vector, the
-driver must verify success by reading the Vector field value: on
-success, the previously written value is returned, and on
-failure, NO_VECTOR is returned. If a mapping failure is detected,
-the driver can retry mapping with fewervectors, or disable MSI-X.
-
- Virtqueue Configuration<sec:Virtqueue-Configuration>
-
-As a device can have zero or more virtqueues for bulk data
-transport (for example, the network driver has two), the driver
-needs to configure them as part of the device-specific
-configuration.
-
-This is done as follows, for each virtqueue a device has:
-
- Write the virtqueue index (first queue is 0) to the Queue
- Select field.
-
- Read the virtqueue size from the Queue Size field, which is
- always a power of 2. This controls how big the virtqueue is
- (see below). If this field is 0, the virtqueue does not exist.
-
- Allocate and zero virtqueue in contiguous physical memory, on a
- 4096 byte alignment. Write the physical address, divided by
- 4096 to the Queue Address field.[footnote:
-The 4096 is based on the x86 page size, but it's also large
-enough to ensure that the separate parts of the virtqueue are on
-separate cache lines.
-]
-
- Optionally, if MSI-X capability is present and enabled on the
- device, select a vector to use to request interrupts triggered
- by virtqueue events. Write the MSI-X Table entry number
- corresponding to this vector in Queue Vector field. Read the
- Queue Vector field: on success, previously written value is
- returned; on failure, NO_VECTOR value is returned.
-
-The Queue Size field controls the total number of bytes required
-for the virtqueue according to the following formula:
-
-#define ALIGN(x) (((x) + 4095) & ~4095)
-
-static inline unsigned vring_size(unsigned int qsz)
-
-{
-
- return ALIGN(sizeof(struct vring_desc)*qsz + sizeof(u16)*(2
-+ qsz))
-
- + ALIGN(sizeof(struct vring_used_elem)*qsz);
-
-}
-
-This currently wastes some space with padding, but also allows
-future extensions. The virtqueue layout structure looks like this
-(qsz is the Queue Size field, which is a variable, so this code
-won't compile):
-
-struct vring {
-
- /* The actual descriptors (16 bytes each) */
-
- struct vring_desc desc[qsz];
-
-
-
- /* A ring of available descriptor heads with free-running
-index. */
-
- struct vring_avail avail;
-
-
-
- // Padding to the next 4096 boundary.
-
- char pad[];
-
-
-
- // A ring of used descriptor heads with free-running index.
-
- struct vring_used used;
-
-};
-
- A Note on Virtqueue Endianness
-
-Note that the endian of these fields and everything else in the
-virtqueue is the native endian of the guest, not little-endian as
-PCI normally is. This makes for simpler guest code, and it is
-assumed that the host already has to be deeply aware of the guest
-endian so such an “endian-aware” device is not a significant
-issue.
-
- Descriptor Table
-
-The descriptor table refers to the buffers the guest is using for
-the device. The addresses are physical addresses, and the buffers
-can be chained via the next field. Each descriptor describes a
-buffer which is read-only or write-only, but a chain of
-descriptors can contain both read-only and write-only buffers.
-
-No descriptor chain may be more than 2^32 bytes long in total.struct vring_desc {
-
- /* Address (guest-physical). */
-
- u64 addr;
-
- /* Length. */
-
- u32 len;
-
-/* This marks a buffer as continuing via the next field. */
-
-#define VRING_DESC_F_NEXT 1
-
-/* This marks a buffer as write-only (otherwise read-only). */
-
-#define VRING_DESC_F_WRITE 2
-
-/* This means the buffer contains a list of buffer descriptors.
-*/
-
-#define VRING_DESC_F_INDIRECT 4
-
- /* The flags as indicated above. */
-
- u16 flags;
-
- /* Next field if flags & NEXT */
-
- u16 next;
-
-};
-
-The number of descriptors in the table is specified by the Queue
-Size field for this virtqueue.
-
- <sub:Indirect-Descriptors>Indirect Descriptors
-
-Some devices benefit by concurrently dispatching a large number
-of large requests. The VIRTIO_RING_F_INDIRECT_DESC feature can be
-used to allow this (see [cha:Reserved-Feature-Bits]). To increase
-ring capacity it is possible to store a table of indirect
-descriptors anywhere in memory, and insert a descriptor in main
-virtqueue (with flags&INDIRECT on) that refers to memory buffer
-containing this indirect descriptor table; fields addr and len
-refer to the indirect table address and length in bytes,
-respectively. The indirect table layout structure looks like this
-(len is the length of the descriptor that refers to this table,
-which is a variable, so this code won't compile):
-
-struct indirect_descriptor_table {
-
- /* The actual descriptors (16 bytes each) */
-
- struct vring_desc desc[len / 16];
-
-};
-
-The first indirect descriptor is located at start of the indirect
-descriptor table (index 0), additional indirect descriptors are
-chained by next field. An indirect descriptor without next field
-(with flags&NEXT off) signals the end of the indirect descriptor
-table, and transfers control back to the main virtqueue. An
-indirect descriptor can not refer to another indirect descriptor
-table (flags&INDIRECT must be off). A single indirect descriptor
-table can include both read-only and write-only descriptors;
-write-only flag (flags&WRITE) in the descriptor that refers to it
-is ignored.
-
- Available Ring
-
-The available ring refers to what descriptors we are offering the
-device: it refers to the head of a descriptor chain. The “flags”
-field is currently 0 or 1: 1 indicating that we do not need an
-interrupt when the device consumes a descriptor from the
-available ring. Alternatively, the guest can ask the device to
-delay interrupts until an entry with an index specified by the “
-used_event” field is written in the used ring (equivalently,
-until the idx field in the used ring will reach the value
-used_event + 1). The method employed by the device is controlled
-by the VIRTIO_RING_F_EVENT_IDX feature bit (see [cha:Reserved-Feature-Bits]
-). This interrupt suppression is merely an optimization; it may
-not suppress interrupts entirely.
-
-The “idx” field indicates where we would put the next descriptor
-entry (modulo the ring size). This starts at 0, and increases.
-
-struct vring_avail {
-
-#define VRING_AVAIL_F_NO_INTERRUPT 1
-
- u16 flags;
-
- u16 idx;
-
- u16 ring[qsz]; /* qsz is the Queue Size field read from device
-*/
-
- u16 used_event;
-
-};
-
- Used Ring
-
-The used ring is where the device returns buffers once it is done
-with them. The flags field can be used by the device to hint that
-no notification is necessary when the guest adds to the available
-ring. Alternatively, the “avail_event” field can be used by the
-device to hint that no notification is necessary until an entry
-with an index specified by the “avail_event” is written in the
-available ring (equivalently, until the idx field in the
-available ring will reach th