b5964164c4
Refactor the Reader and Writer implementations for DescriptorChains.

This has several changes:

* Change the DescriptorChainConsumer to keep a VecDeque<VolatileSlice> instead of an iterator. This delegates the fiddly business of sub-slicing chunks of memory to the VolatileSlice implementation.
* Read in the entire DescriptorChain once when the Reader or Writer is first constructed. This allows us to validate the DescriptorChain in the beginning rather than having to deal with an invalid DescriptorChain in the middle of the device operating on it. Combined with the check that enforces the ordering of read/write descriptors in a previous change we can be sure that the entire descriptor chain that we have copied in is valid.
* Add a new `split_at` method so that we can split the Reader/Writer into multiple pieces, each responsible for reading/writing a separate part of the DescriptorChain. This is particularly useful for implementing zero-copy data transfer as we sometimes need to write the data first and then update an earlier part of the buffer with the number of bytes written.
* Stop caching the available bytes in the DescriptorChain. The previous implementation iterated over the remaining descriptors in the chain and then only updated the cached value. If a mis-behaving guest then changed one of the later descriptors, the cached value would no longer be valid.
* Check for integer overflow when calculating the number of bytes available in the chain. A guest could fill a chain with five 1GB descriptors and cause an integer overflow on a 32-bit machine. This would previously crash the device process since we compile with integer overflow checks enabled but it would be better to return an error instead.
* Clean up the Read/Write impls. Having 2 different functions called `read`, with different behavior is just confusing. Consolidate on the Read/Write traits from `std::io`.
* Change the `read_to` and `write_from` functions to be generic over types that implement `FileReadWriteVolatile` since we are not allowed to assume that it's safe to call read or write on something just because it implements `AsRawFd`. Also add `*at` variants that read or write to a particular offset rather than the kernel offset.
* Change the callback passed to the `consume` function of `DescriptorChainConsumer` to take a `&[VolatileSlice]` instead. This way we can use the `*vectored` versions of some methods to reduce the number of I/O syscalls we need to make.
* Change the `Result` types that are returned. Functions that perform I/O return an `io::Result`. Functions that only work on guest memory return a `guest_memory::Result`. This makes it easier to inter-operate with the functions from `std::io`.
* Change some u64/u32 parameters to usize to avoid having to convert back and forth between the two in various places.

BUG=b:136128319
TEST=unit tests
Change-Id: I15102f7b4035d66b5ce0891df42b656411e8279f
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1757240
Auto-Submit: Chirantan Ekbote <chirantan@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Commit-Queue: Daniel Verkamp <dverkamp@chromium.org>
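The overflow check described in the commit message above can be illustrated with a small sketch. This is not the code from the change itself: the function name `available_bytes` and the `GIB` constant are hypothetical, and the accumulator is a `u32` so that the 32-bit case can be reproduced on any host.

```rust
/// One gibibyte, used here to stand in for the "1GB descriptors" mentioned
/// in the commit message. (Hypothetical constant for this sketch.)
const GIB: u32 = 1 << 30;

/// Sums descriptor lengths and returns `None` if the total does not fit in
/// a `u32`, instead of panicking (debug builds) or wrapping (release builds).
/// A real implementation would return an error type rather than `Option`.
fn available_bytes(descriptor_lens: &[u32]) -> Option<u32> {
    descriptor_lens
        .iter()
        .try_fold(0u32, |total, &len| total.checked_add(len))
}
```

With this shape, a chain of three 1 GiB descriptors yields `Some(3 * GIB)`, while a chain of five yields `None`, which the caller can turn into an error for the guest instead of crashing the device process.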
crosvm - The Chrome OS Virtual Machine Monitor
This component, known as crosvm, runs untrusted operating systems along with virtualized devices. No actual hardware is emulated. It runs VMs only through Linux's KVM interface. What makes crosvm unique is a focus on safety within the programming language and a sandbox around the virtual devices to protect the kernel from attack in case of an exploit in the devices.
Building with Docker
See the README from the `docker` subdirectory to learn how to build crosvm in environments outside of the Chrome OS chroot.
Usage
To see the usage information for your version of crosvm, run `crosvm` or `crosvm run --help`.
Boot a Kernel
To run a very basic VM with just a kernel and default devices:
$ crosvm run "${KERNEL_PATH}"
The uncompressed kernel image, also known as vmlinux, can be found in your kernel build directory. In the case of x86, it is at `arch/x86/boot/compressed/vmlinux`.
Rootfs
In most cases, you will want to give the VM a virtual block device to use as a root file system:
$ crosvm run -r "${ROOT_IMAGE}" "${KERNEL_PATH}"
The root image must be a path to a disk image formatted in a way that the kernel can read. Typically this is a squashfs image made with `mksquashfs` or an ext4 image made with `mkfs.ext4`. When the `-r` argument is used, the kernel is automatically told to use that image as the root, so `-r` can only be given once. More disks can be given with `-d`, or with `--rwdisk` if a writable disk is desired.
To run crosvm with a writable rootfs:
WARNING: Writable disks are at risk of corruption by a malicious or malfunctioning guest OS.
crosvm run --rwdisk "${ROOT_IMAGE}" -p "root=/dev/vda" vmlinux
NOTE: If more disk arguments are added prior to the desired rootfs image, the `root=/dev/vda` parameter must be adjusted to the appropriate letter.
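As a rough illustration of the note above, the following sketch maps the position of the rootfs image among the disk arguments to a `root=` kernel parameter. The helper names are hypothetical, and the sketch assumes that the guest kernel names virtio block devices `vda`, `vdb`, and so on in the order the disks were passed; only single-letter names are handled.

```rust
/// Returns the guest device name for the disk at `index` (0 -> "vda",
/// 1 -> "vdb", ...). Assumes devices are named in command-line order and
/// only covers the 26 single-letter names. (Hypothetical helper.)
fn virtio_disk_name(index: usize) -> Option<String> {
    if index >= 26 {
        return None;
    }
    let letter = (b'a' + index as u8) as char;
    Some(format!("vd{}", letter))
}

/// Builds the `root=` kernel parameter for the disk at `index`.
fn root_param(index: usize) -> Option<String> {
    virtio_disk_name(index).map(|name| format!("root=/dev/{}", name))
}
```

For example, if two other disks are listed before the rootfs image, the rootfs is at index 2 and the parameter to pass with `-p` would be `root=/dev/vdc`.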
Control Socket
If the control socket was enabled with `-s`, the main process can be controlled while crosvm is running. To tell crosvm to stop and exit, for example:
NOTE: If the socket path given is for a directory, a socket name underneath that path will be generated based on crosvm's PID.
$ crosvm run -s /run/crosvm.sock ${USUAL_CROSVM_ARGS}
<in another shell>
$ crosvm stop /run/crosvm.sock
WARNING: The guest OS will not be notified or shut down gracefully.
This will cause the original crosvm process to exit in an orderly fashion, allowing it to clean up any OS resources that might have stuck around if crosvm were terminated early.
Multiprocess Mode
By default crosvm runs in multiprocess mode. Each device that supports running inside of a sandbox will run in a jailed child process of crosvm. The appropriate minijail seccomp policy files must be present either in `/usr/share/policy/crosvm` or in the path specified by the `--seccomp-policy-dir` argument. The sandbox can be disabled for testing with the `--disable-sandbox` option.
Virtio Wayland
Virtio Wayland support requires special support on the part of the guest and as such is unlikely to work out of the box unless you are using a Chrome OS kernel along with a `termina` rootfs.
To use it, ensure that the `XDG_RUNTIME_DIR` environment variable is set and that the path `$XDG_RUNTIME_DIR/wayland-0` points to the socket of the Wayland compositor you would like the guest to use.
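The path lookup described above can be sketched as follows. This is an illustrative helper, not crosvm's own code; the function name `wayland_socket_path` is hypothetical, and the value of `XDG_RUNTIME_DIR` is passed in as an argument so the logic can be exercised without touching the process environment.

```rust
use std::path::PathBuf;

/// Returns the expected Wayland socket path, `$XDG_RUNTIME_DIR/wayland-0`,
/// or `None` if the variable is unset or empty. (Hypothetical helper.)
fn wayland_socket_path(xdg_runtime_dir: Option<&str>) -> Option<PathBuf> {
    // Treat an empty value the same as an unset variable.
    let dir = xdg_runtime_dir.filter(|d| !d.is_empty())?;
    Some(PathBuf::from(dir).join("wayland-0"))
}
```

In a real program the argument would typically come from `std::env::var("XDG_RUNTIME_DIR").ok()`, and the caller would still need to check that the returned path actually refers to the compositor's socket.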
Defaults
The following are crosvm's default arguments and how to override them.
- 256MB of memory (set with `-m`)
- 1 virtual CPU (set with `-c`)
- no block devices (set with `-r`, `-d`, or `--rwdisk`)
- no network (set with `--host_ip`, `--netmask`, and `--mac`)
- virtio wayland support if the `XDG_RUNTIME_DIR` environment variable is set (disable with `--no-wl`)
- only the kernel arguments necessary to run with the supported devices (add more with `-p`)
- run in multiprocess mode (run in single process mode with `--disable-sandbox`)
- no control socket (set with `-s`)
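The defaults listed above can be pictured as a configuration value with a `Default` implementation. The struct and field names below are hypothetical and do not mirror crosvm's internal configuration type; the Wayland default is left out because it depends on the environment rather than on a fixed value.

```rust
/// Illustrative model of the documented defaults. (Hypothetical type; not
/// crosvm's actual configuration struct.)
#[derive(Debug, Clone, PartialEq)]
struct VmDefaults {
    memory_mib: u64,                  // overridden with -m
    vcpus: u32,                       // overridden with -c
    disks: Vec<String>,               // added with -r, -d, or --rwdisk
    network: bool,                    // enabled with --host_ip, --netmask, --mac
    extra_kernel_params: Vec<String>, // added with -p
    sandbox: bool,                    // turned off with --disable-sandbox
    control_socket: Option<String>,   // set with -s
}

impl Default for VmDefaults {
    fn default() -> Self {
        VmDefaults {
            memory_mib: 256,
            vcpus: 1,
            disks: Vec::new(),
            network: false,
            extra_kernel_params: Vec::new(),
            sandbox: true,
            control_socket: None,
        }
    }
}
```

Modeling the defaults this way keeps each override next to the flag that changes it, which makes it easy to see at a glance what a bare `crosvm run` invocation implies.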
System Requirements
A Linux kernel with KVM support (check for `/dev/kvm`) is required to run crosvm. In order to run certain devices, there are additional system requirements:
- `virtio-wayland` - The `memfd_create` syscall, introduced in Linux 3.17, and a Wayland compositor.
- `vsock` - Host Linux kernel with vhost-vsock support, introduced in Linux 4.8.
- `multiprocess` - Host Linux kernel with seccomp-bpf and Linux namespacing support.
- `virtio-net` - Host Linux kernel with TUN/TAP support (check for `/dev/net/tun`) and running with `CAP_NET_ADMIN` privileges.
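The two device-node checks mentioned above (`/dev/kvm` and `/dev/net/tun`) can be automated with a short sketch like the one below. The function and constant names are hypothetical. The check only tests whether the paths exist; it does not verify permissions, kernel versions, or `CAP_NET_ADMIN`.

```rust
use std::path::Path;

/// Device nodes named in the requirements above. (Hypothetical constant.)
const DEVICE_NODES: [&str; 2] = ["/dev/kvm", "/dev/net/tun"];

/// Returns the subset of `paths` that does not exist on this host.
/// Existence is a necessary condition only: access rights are not checked.
fn missing_paths<'a>(paths: &[&'a str]) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| !Path::new(p).exists())
        .collect()
}
```

Calling `missing_paths(&DEVICE_NODES)` returns an empty vector when both nodes are present; any path in the result points at a requirement that the host does not currently meet.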
Emulated Devices
Device | Description |
---|---|
CMOS/RTC | Used to get the current calendar time. |
i8042 | Used by the guest kernel to exit crosvm. |
serial | x86 I/O port driven serial devices that print to stdout and take input from stdin. |
virtio-block | Basic read/write block device. |
virtio-net | Device to interface the host and guest networks. |
virtio-rng | Entropy source used to seed the guest OS's entropy pool. |
virtio-vsock | Enables VSOCKs for the guests. |
virtio-wayland | Allows the guest to use the host Wayland socket. |
Contributing
Code Health
build_test
There are no automated tests run before code is committed to crosvm. In order to maintain sanity, please execute `build_test` before submitting code for review. All tests should be passing or ignored and there should be no compiler warnings or errors. All supported architectures are built, but only tests for x86_64 are run. In order to build everything without failures, sysroots must be supplied for each architecture. See `build_test -h` for more information.
rustfmt
All code should be formatted with `rustfmt`. We have a script that applies rustfmt to all Rust code in the crosvm repo: please run `bin/fmt` before checking in a change. This is different from `cargo fmt --all`, which formats multiple crates but only within a single workspace; crosvm consists of multiple workspaces.
Dependencies
With a few exceptions, external dependencies inside of the `Cargo.toml` files are not allowed. The reason is that community-made crates tend to explode the binary size by including dozens of transitive dependencies. All of these dependencies must also be reviewed to ensure their suitability for the crosvm project. Currently allowed crates are:
- `cc` - Build time dependency needed to build C source code used in crosvm.
- `libc` - Required to use the standard library, this crate is a simple wrapper around `libc`'s symbols.
Code Overview
The crosvm source code is written in Rust and C. To build, crosvm generally requires the most recent stable version of rustc.
Source code is organized into crates, each with their own unit tests. These crates are:
- `crosvm` - The top-level binary front-end for using crosvm.
- `devices` - Virtual devices exposed to the guest OS.
- `io_jail` - Creates jailed processes using `libminijail`.
- `kernel_loader` - Loads elf64 kernel files to a slice of memory.
- `kvm_sys` - Low-level (mostly) auto-generated structures and constants for using KVM.
- `kvm` - Unsafe, low-level wrapper code for using `kvm_sys`.
- `net_sys` - Low-level (mostly) auto-generated structures and constants for creating TUN/TAP devices.
- `net_util` - Wrapper for creating TUN/TAP devices.
- `sys_util` - Mostly safe wrappers for small system facilities such as `eventfd` or `syslog`.
- `syscall_defines` - Lists of syscall numbers in each architecture used to make syscalls not supported in `libc`.
- `vhost` - Wrappers for creating vhost based devices.
- `virtio_sys` - Low-level (mostly) auto-generated structures and constants for interfacing with kernel vhost support.
- `vm_control` - IPC for the VM.
- `x86_64` - Support code specific to 64-bit Intel machines.
The `seccomp` folder contains minijail seccomp policy files for each sandboxed device. Because some syscalls vary by architecture, the seccomp policies are split by architecture.