Commit graph

23 commits

Author SHA1 Message Date
Adrian Ratiu
cdedd7000a seccomp: allow newfstatat in more amd64/arm64 policies
newfstatat has been added to a few policies for the
two 64bit architectures, but some more require it to
avoid crashes, so add it to all which contain fstat
or statx.

BUG=b:187795909
TEST=CQ

Change-Id: I3cd0f5379b87102caa256503a888c5a1aa4103b6
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3198571
Commit-Queue: Manoj Gupta <manojgupta@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>
2021-10-01 17:09:16 +00:00
Jorge E. Moreira
c8cff01c36 Specify prctl's policy only once per device
The libminijail version in AOSP complains when there are multiple entries for
the same system call, which was the case for virtio-fs's policy.

BUG=b/185811304

Change-Id: I389c07c86e7d79f16e4f47a893abad598033352a
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2837307
Commit-Queue: Jorge Moreira Broche <jemoreira@google.com>
Tested-by: Jorge Moreira Broche <jemoreira@google.com>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
2021-04-20 22:50:20 +00:00
Chirantan Ekbote
a00991cd84 Replace dup with fcntl(F_DUPFD_CLOEXEC)
Fds created via dup don't share file descriptor flags with the original
fd, which means that they don't have the FD_CLOEXEC flag set.  Use
fcntl(F_DUPFD_CLOEXEC) so that this flag gets set for the duplicated fds
as well.

BUG=none
TEST=unit tests

Change-Id: Ib471cf40acac1eacf72969ba45247f50b349ed58
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2809687
Tested-by: kokoro <noreply+kokoro@google.com>
Commit-Queue: Chirantan Ekbote <chirantan@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-by: Zach Reizner <zachr@chromium.org>
2021-04-15 10:34:04 +00:00
Dylan Reid
503c5abef6 devices: Add an asynchronous block device
This enables the use of basic disk images with async IO. A new
block_async.rs is added which mostly mirrors block, except that all
IO operations are asynchronous allowing for multiple virt queues to be
used.

The old block remains unchanged and is still used for qcow, android
sparse, and composite disks. Those should be converted to as time
allows, but this dual approach will have to do for now so ARCVM disk
performance can be properly evaluated.

fio --ioengine=libaio --randrepeat=1 --direct=1 --gtod_reduce=1
--name=test --filename=test --bs=4k --iodepth=64 --size=4G
--readwrite=randrw --rwmixread=75

desktop with nvme:

before:
READ: bw=36.2MiB/s (37.9MB/s), 36.2MiB/s-36.2MiB/s (37.9MB/s-37.9MB/s),
io=3070MiB (3219MB), run=84871-84871msec
WRITE: bw=12.1MiB/s (12.7MB/s), 12.1MiB/s-12.1MiB/s (12.7MB/s-12.7MB/s),
io=1026MiB (1076MB), run=84871-84871msec
after:
READ: bw=257MiB/s (269MB/s), 257MiB/s-257MiB/s (269MB/s-269MB/s),
io=3070MiB (3219MB), run=11964-11964msec
WRITE: bw=85.8MiB/s (89.9MB/s), 85.8MiB/s-85.8MiB/s (89.9MB/s-89.9MB/s),
io=1026MiB (1076MB), run=11964-11964msec

samus with 5.6 kernel
before:
READ: bw=55.3MiB/s (57.9MB/s), 55.3MiB/s-55.3MiB/s (57.9MB/s-57.9MB/s),
io=768MiB (805MB), run=13890-13890msec
WRITE: bw=18.5MiB/s (19.4MB/s), 18.5MiB/s-18.5MiB/s (19.4MB/s-19.4MB/s),
io=256MiB (269MB), run=13890-13890msec
after:
READ: bw=71.2MiB/s (74.7MB/s), 71.2MiB/s-71.2MiB/s (74.7MB/s-74.7MB/s),
io=3070MiB (3219MB), run=43096-43096msec
WRITE: bw=23.8MiB/s (24.0MB/s), 23.8MiB/s-23.8MiB/s (24.0MB/s-24.0MB/s),
io=1026MiB (1076MB), run=43096-43096msec

kevin with 5.6 kernel
before:
READ: bw=12.9MiB/s (13.5MB/s), 12.9MiB/s-12.9MiB/s (13.5MB/s-13.5MB/s),
io=1534MiB (1609MB), run=118963-118963msec
WRITE: bw=4424KiB/s (4530kB/s), 4424KiB/s-4424KiB/s (4530kB/s-4530kB/s),
io=514MiB (539MB), run=118963-118963msec
after:
READ: bw=12.9MiB/s (13.5MB/s), 12.9MiB/s-12.9MiB/s (13.5MB/s-13.5MB/s),
io=1534MiB (1609MB), run=119364-119364msec
WRITE: bw=4409KiB/s (4515kB/s), 4409KiB/s-4409KiB/s (4515kB/s-4515kB/s),
io=514MiB (539MB), run=119364-119364msec

eve with nvme and 5.7 kernel
before:
READ: bw=49.4MiB/s (51.8MB/s), 49.4MiB/s-49.4MiB/s (51.8MB/s-51.8MB/s),
io=3070MiB
(3219MB), run=62195-62195msec
WRITE: bw=16.5MiB/s (17.3MB/s), 16.5MiB/s-16.5MiB/s (17.3MB/s-17.3MB/s),
io=1026MiB
 (1076MB), run=62195-62195msec
after
READ: bw=125MiB/s (131MB/s), 125MiB/s-125MiB/s (131MB/s-131MB/s),
io=3070MiB (3219MB), run=24593-24593msec
WRITE: bw=41.7MiB/s (43.7MB/s), 41.7MiB/s-41.7MiB/s
(43.7MB/s-43.7MB/s), io=1026MiB (1076MB), run=24593-24593msec

rammus with 5.10 kernel
before:
READ: bw=6927KiB/s (7093kB/s), 6927KiB/s-6927KiB/s (7093kB/s-7093kB/s),
io=3070MiB (3219MB), run=453822-453822msec
WRITE: bw=2315KiB/s (2371kB/s), 2315KiB/s-2315KiB/s (2371kB/s-2371kB/s),
io=1026MiB (1076MB), run=453822-453822msec
after:
Run status group 0 (all jobs):
READ: bw=10.0MiB/s (11.5MB/s), 10.0MiB/s-10.0MiB/s (11.5MB/s-11.5MB/s),
io=3070MiB (3219MB), run=279111-279111msec
WRITE: bw=3764KiB/s (3855kB/s), 3764KiB/s-3764KiB/s (3855kB/s-3855kB/s),
io=1026MiB (1076MB), run=279111-279111msec

BUG=chromium:901139
TEST=unitests
TEST=boot a test image and run fio tests from the guest to measure speed.
TEST=start ARCVM
TEST=tast run $DUT crostini.ResizeOk.dlc_stretch_stable

Change-Id: Idb63628871d0352bd18501a69d9c1c887c37607b
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2306786
Tested-by: Keiichi Watanabe <keiichiw@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Commit-Queue: Keiichi Watanabe <keiichiw@chromium.org>
2021-02-17 04:11:55 +00:00
Matt Delco
4389dab579 seccomp: remove redundant unconditional rules
Minijail's policy compiler complains when there's multiple
unconditional rules for a syscall.  In most cases the rules
are redundant to common_device.policy.  I don't know what
to do about the intentionally contradictory rules for open
and openat, other than to remove then from the common device
policy and add it to all the others.

BUG=None
TEST=Ran compile_seccomp_policy.py until it stopped
complaining.

Change-Id: I6813dd1e0b39e975415662bd7de74c25a1be9eb3
Signed-off-by: Matt Delco <delco@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1918607
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
2020-01-22 17:36:36 +00:00
Daniel Verkamp
5de0604f29 seccomp: allow statx syscall where stat/fstat was allowed
This is used in Rust 1.40.0's libstd in place of stat/fstat; update the
whitelists to allow the new syscall as well.

BUG=chromium:1042461
TEST=`crosvm disk resize` does not trigger seccomp failure

Change-Id: Ia3f0e49ee009547295c7af7412dfb5eb3ac1efcb
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2003685
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
Commit-Queue: Daniel Verkamp <dverkamp@chromium.org>
2020-01-17 23:04:03 +00:00
Daniel Verkamp
6eadef77a3 sys_util: add WriteZeroesAt trait
Add a variant of WriteZeroes that allows the caller to specify the
offset explicitly instead of using the file's cursor.  This gets rid of
one of the last bits of shared state between disk file users, which will
help in implementing multi-queue support.

Additionally, modify the WriteZeroes trait to use a generic
implementation based on WriteZeroesAt + Seek when possible.

BUG=chromium:858815
TEST=Boot Termina in crosvm

Change-Id: If710159771aeeb55f4f7746dd4354b6c042144e8
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1913519
2019-11-27 21:22:37 +00:00
Daniel Verkamp
ec5916daa0 devices: virtio: block: use FileReadWriteAtVolatile
Use the "at" variants of the read/write functions in the block device.
This reduces the number of syscalls on the host per I/O to one
(pread64/pwrite64) rather than two (lseek + read/write).

The CompositeDiskFile implementation is also updated in this commit,
since it's both a producer and consumer of DiskFile, and it isn't
trivial to update it in a separate commit without breaking compilation.

BUG=None
TEST=Start Crostini on kevin, banon, and nami

Change-Id: I031e7e87cd6c99504db8c56b1725ea51c1e27a53
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1845948
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
2019-10-29 22:06:22 +00:00
Zach Reizner
bae43dd4c9 seccomp: refactor policy into common_device.policy
CQ-DEPEND=CL:1449895
BUG=None
TEST=vmc start termina

Change-Id: Ia3edaafc1d2958bd40e6b1adc89dd5e29b679b06
Reviewed-on: https://chromium-review.googlesource.com/1448292
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
2019-02-07 03:02:12 -08:00
Daniel Verkamp
5656c124af devices: block: fix seccomp failures from free()
It looks like free() will sometimes try to open
/proc/sys/vm/overcommit_memory in order to decide whether to return
freed heap memory to the kernel; change the seccomp filter to fail the
open syscalls with an error code (ENOENT) rather than killing the
process.

Also allow madvise to free memory for the same free() codepath.

BUG=chromium:888212
TEST=Run fio loop test on kevin

Change-Id: I1c27b265b822771f76b7d9572d9759476770000e
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1305756
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
2018-10-31 12:42:43 -07:00
Daniel Verkamp
b1570f2672 qcow: track deallocated clusters as unreferenced
In deallocate_cluster(), we call set_cluster_refcount() to unref the
cluster that is being deallocated, but we never actually added the
deallocated cluster to the unref_clusters list.  Add clusters whose
refcounts reach 0 to the unref_clusters list as well.

Also add mremap() to the seccomp whitelist for the block device, since
this is being triggered by libc realloc() and other devices already
include it in the whitelist.

BUG=chromium:850998
TEST=cargo test -p qcow; test crosvm on nami and verify that qcow file
     size stays bounded when creating a 1 GB file and deleting it
     repeatedly

Change-Id: I1bdd96b2176dc13069417e0ac77f0768f9f26012
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1259404
Reviewed-by: Dylan Reid <dgreid@chromium.org>
2018-10-05 07:54:49 -07:00
Daniel Verkamp
616a093d91 devices: block: allow timerfd syscalls in seccomp
"devices: block: Flush a minute after a write" introduced new timerfd_
syscalls into the block device but did not add them to the seccomp
whitelist.

BUG=chromium:885238
TEST=Run crosvm in multiprocess mode and verify that it boots

Change-Id: I1568946c64d86ab7dba535a430a8cbe235f64454
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1231513
Commit-Ready: Dylan Reid <dgreid@chromium.org>
Tested-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
2018-09-19 15:40:06 -07:00
Daniel Verkamp
7621d910f5 devices: block: implement discard and write zeroes
Discard and Write Zeroes commands have been added to the virtio block
specification:
88c8553838

Implement both commands using the WriteZeroes trait.

BUG=chromium:850998
TEST=fstrim within termina on a writable qcow image

Change-Id: I33e54e303202328c10f7f2d6e69ab19f419f3998
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1188680
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
2018-09-10 13:33:46 -07:00
Dylan Reid
2494ddefb1 qcow: Call fsync(2) when we want to flush to disk
Signal to the OS that we want these writes committed all the way to
disk.  Replace an existing call to flush as that's not sufficient.

Change-Id: I9df9e55d2182e283e15eebc02a54c1ce08434f42
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1060696
Reviewed-by: Zach Reizner <zachr@chromium.org>
2018-05-18 20:08:31 -07:00
Zach Reizner
f96be03cad devices: block: use PollContext in block device
Switching to PollContext so that there is one less user of Poller, which
will be removed.

TEST=run any vm with a block device
BUG=chromium:816692

Change-Id: I2e1301ea9d66012262f1fcb69eaeee9f7464f3b3
Reviewed-on: https://chromium-review.googlesource.com/983036
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>
2018-04-04 22:53:22 -07:00
Zach Reizner
5d586b73a4 sys_util: use MADV_DONTDUMP for new mmaps
The mmaps made through the sys_util API are usually for guest memory or
other large shared memory chunks that will pollute the file system with
huge dumps on crash. By using MADV_DONTDUMP, we save the file system
from storing these useless data segments when crosvm crashes.

TEST=./build_test
BUG=None

Change-Id: I2041523648cd7c150bbdbfceef589f42d3f9c2b9
Reviewed-on: https://chromium-review.googlesource.com/890279
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
2018-03-30 21:53:32 -07:00
Zach Reizner
fc44d8059b sys_util: add ppoll to seccomp policies
This really should have been added along with the poll timeout support,
which changed the syscalls used in every jailed device.

TEST=run crosvm with sandboxing enabled
BUG=None

Change-Id: I6129fa589640bb2b85fb4274775192bdd49db672
Reviewed-on: https://chromium-review.googlesource.com/890379
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
2018-01-27 01:36:52 -08:00
Dylan Reid
88624f890e main: Allow qcow files to be used as disks
Using qcow to allow for growable disk. These will be used for user data.

Change-Id: Iefb54eb4255db2ea7693db0020c5f1429acd73fd
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/862629
Reviewed-by: Stephen Barber <smbarber@chromium.org>
2018-01-19 23:29:52 -08:00
Stephen Barber
8b0d12cb0a crosvm: don't die on suspend/resume
Suspend/resume can cause syscall restarts and will cause KVM_RUN ioctls
to return with EINTR. Handle these so the VM doesn't shut down.

BUG=none
TEST=vm survives suspend/resume

Change-Id: I1fab624cb8fe0949d341408f0c962c859a034205
Reviewed-on: https://chromium-review.googlesource.com/750054
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Zach Reizner <zachr@chromium.org>
2017-11-02 11:07:13 -07:00
Stephen Barber
ce374793bf crosvm/devices: set thread names
crosvm spawns a lot of processes/threads, and having these all use the same
name as the original process can be confusing. So at least in the instances
where Rust threads are spawned (vs. minijail_fork()), use a thread::Builder
to allow setting the thread name.

BUG=none
TEST=start crosvm, check thread names with top

Change-Id: I6e55ff5fd60f258880bda8e656ab7f9da82c656e
Reviewed-on: https://chromium-review.googlesource.com/742394
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
2017-10-30 23:21:37 -07:00
Dylan Reid
d37aa9fab5 Add ability to minijail_fork
Change-Id: I0c774816067449cbb838dcf29c6fa947ae5916e1
Reviewed-on: https://chromium-review.googlesource.com/719442
Commit-Ready: Dylan Reid <dgreid@chromium.org>
Tested-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Zach Reizner <zachr@chromium.org>
2017-10-25 05:52:42 -07:00
Zach Reizner
1f77a0daa6 sys_util: use libc's openlog to connect to syslog
By using libc's openlog, we can ensure that the internal state of the
libc syslogger is consistent with the syslog module. Minijail will be
able to print to stderr and the syslog in the same way the logging
macros in crosvm do. The FD the syslog module uses is shared with libc
and via `syslog::get_fds`, jailed processes can inherit the needed FDs
to continue logging.

Now that `sys_log::init()` must be called in single threaded process,
this moves its tests to the list of the serially run ones in
build_test.py.

TEST=./build_test
BUG=None

Change-Id: I8dbc8ebf9d97ef670185259eceac5f6d3d6824ea
Reviewed-on: https://chromium-review.googlesource.com/649951
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
2017-09-06 14:31:06 -07:00
Chirantan Ekbote
41d5b5b12a Put seccomp policy files in a common directory
We will almost certainly require different seccomp policy files for
different architectures.  Move all the existing secommp policy files
into a common directory grouped by architecture.

This will make it easier to install them via the ebuild later.

BUG=none
TEST=none

Change-Id: I0495789cd4143dc374ee6ebe083dc20ce724edbb
Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/630058
Reviewed-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
2017-08-25 19:54:16 -07:00
Renamed from block_device.policy (Browse further)