Add a pooling allocator mode based on copy-on-write mappings of memfds.

As Jan first suggested on Zulip [1], a cheap and effective way to
obtain copy-on-write semantics of a "backing image" for a Wasm memory
is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided
by the Linux kernel allows us to create anonymous, in-memory-only
files that we can use for this mapping, so we can construct the image
contents on the fly and then map them as the base of a CoW
overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)`
will discard the CoW overlay, returning the mapping to its original
state.
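
The CoW-and-reset behavior described above can be sketched as follows.
This is a minimal illustration, not the code in this commit: it
assumes 64-bit Linux, declares the libc functions by hand to avoid
crate dependencies, and `make_image` and `cow_reset_demo` are
hypothetical names.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::os::raw::{c_char, c_int, c_uint, c_void};
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::ptr;

// Hand-declared libc bindings so the sketch needs no external crates.
// Assumption: 64-bit Linux with a libc that exports `memfd_create`.
extern "C" {
    fn memfd_create(name: *const c_char, flags: c_uint) -> c_int;
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int,
            fd: c_int, off: i64) -> *mut c_void;
    fn madvise(addr: *mut c_void, len: usize, advice: c_int) -> c_int;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

// Linux values for the flags used below.
const PROT_READ: c_int = 1;
const PROT_WRITE: c_int = 2;
const MAP_PRIVATE: c_int = 2;
const MADV_DONTNEED: c_int = 4;

/// Build an anonymous in-memory file holding the image contents.
fn make_image(contents: &[u8]) -> io::Result<File> {
    let fd = unsafe { memfd_create(b"wasm-image\0".as_ptr() as *const c_char, 0) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    // `File` now owns the descriptor and closes it on drop.
    let mut file = unsafe { File::from_raw_fd(fd) };
    file.write_all(contents)?;
    Ok(file)
}

/// Returns the first byte (pristine, after a write, after the reset).
fn cow_reset_demo() -> io::Result<(u8, u8, u8)> {
    const LEN: usize = 4096;
    let image = make_image(&[0xAA; LEN])?;

    // MAP_PRIVATE: writes go to private copies, never to the memfd.
    let ptr = unsafe {
        mmap(ptr::null_mut(), LEN, PROT_READ | PROT_WRITE, MAP_PRIVATE,
             image.as_raw_fd(), 0)
    };
    if ptr as isize == -1 {
        return Err(io::Error::last_os_error());
    }
    let mem = ptr as *mut u8;

    let pristine = unsafe { mem.read_volatile() };
    unsafe { mem.write_volatile(0x55) }; // triggers a CoW page copy
    let dirty = unsafe { mem.read_volatile() };

    // Drop the private pages; the next access re-reads the image.
    if unsafe { madvise(ptr, LEN, MADV_DONTNEED) } != 0 {
        return Err(io::Error::last_os_error());
    }
    let reset = unsafe { mem.read_volatile() };

    unsafe { munmap(ptr, LEN) };
    Ok((pristine, dirty, reset))
}
```

Calling `cow_reset_demo()` from a `main` should show the byte going
from the image value to the written value and back, with no `mmap()`
call between the write and the reset.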

By itself this is almost enough for a very fast
instantiation-termination loop of the same image over and over,
without changing the address space mapping at all (which is
expensive). The only missing bit is how to implement
heap *growth*. But here memfds can help us again: if we create another
anonymous file and map it where the extended parts of the heap would
go, we can take advantage of the fact that a `mmap()` mapping can
be *larger than the file itself*, with accesses beyond the end
generating a `SIGBUS`, and the fact that we can cheaply resize the
file with `ftruncate`, even after a mapping exists. So we can map the
"heap extension" file once with the maximum memory-slot size and grow
the memfd itself as `memory.grow` operations occur.
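
The growth scheme above can be sketched like this. It is an
illustration under assumptions (64-bit Linux, hand-declared libc
bindings), and `GrowableHeap`, `grow_to`, and `growth_demo` are
hypothetical names rather than the types added by this commit. It
never touches bytes past the current file size, since that would
raise `SIGBUS`.

```rust
use std::fs::File;
use std::io;
use std::os::raw::{c_char, c_int, c_uint, c_void};
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::{ptr, slice};

// Hand-declared libc bindings; assumption: 64-bit Linux.
extern "C" {
    fn memfd_create(name: *const c_char, flags: c_uint) -> c_int;
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int,
            fd: c_int, off: i64) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

const PROT_READ: c_int = 1;
const PROT_WRITE: c_int = 2;
const MAP_PRIVATE: c_int = 2;

/// Hypothetical stand-in for one slot's "heap extension" file.
struct GrowableHeap {
    file: File,
    base: *mut u8,
    max_len: usize,
    len: usize,
}

impl GrowableHeap {
    fn new(max_len: usize) -> io::Result<Self> {
        let fd = unsafe { memfd_create(b"heap-extension\0".as_ptr() as *const c_char, 0) };
        if fd < 0 {
            return Err(io::Error::last_os_error());
        }
        let file = unsafe { File::from_raw_fd(fd) };
        // Map the whole slot once, while the file is still empty.
        let base = unsafe {
            mmap(ptr::null_mut(), max_len, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                 file.as_raw_fd(), 0)
        };
        if base as isize == -1 {
            return Err(io::Error::last_os_error());
        }
        Ok(GrowableHeap { file, base: base as *mut u8, max_len, len: 0 })
    }

    /// Make the first `new_len` bytes accessible by resizing the file.
    fn grow_to(&mut self, new_len: usize) -> io::Result<()> {
        if new_len > self.max_len {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "slot size exceeded"));
        }
        // `File::set_len` is implemented with `ftruncate` on Unix.
        self.file.set_len(new_len as u64)?;
        self.len = new_len;
        Ok(())
    }

    /// Only the part of the mapping currently backed by the file.
    fn as_mut_slice(&mut self) -> &mut [u8] {
        unsafe { slice::from_raw_parts_mut(self.base, self.len) }
    }
}

impl Drop for GrowableHeap {
    fn drop(&mut self) {
        unsafe { munmap(self.base as *mut c_void, self.max_len) };
    }
}

fn growth_demo() -> io::Result<(usize, u8, u8)> {
    const WASM_PAGE: usize = 65536;
    let mut heap = GrowableHeap::new(4 * WASM_PAGE)?;
    heap.grow_to(WASM_PAGE)?;
    heap.as_mut_slice()[WASM_PAGE - 1] = 7;
    heap.grow_to(2 * WASM_PAGE)?; // no new mmap() call needed
    let mem = heap.as_mut_slice();
    mem[2 * WASM_PAGE - 1] = 9;
    Ok((mem.len(), mem[WASM_PAGE - 1], mem[2 * WASM_PAGE - 1]))
}
```

The mapping is created once in `new`; every later growth step is a
single file resize, which is the property the fastpath relies on.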

Together, the CoW technique and the heap-growth technique above give
us a fast path that needs only `madvise()` and `ftruncate()` when we
re-instantiate the same module over and over, as long as we can reuse
the same slot. This fast path avoids all whole-process address-space
locks in the Linux kernel, which should make it highly scalable. It
also avoids the cost of copying data on read, which the `uffd` heap
backend incurs when servicing page faults; the kernel's own optimized
CoW logic (the same logic used by all file mmaps) is used instead.

[1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772
Chris Fallin
2022-01-18 16:42:24 -08:00
parent 90e7cef56c
commit b73ac83c37
26 changed files with 1070 additions and 135 deletions

@@ -2,6 +2,7 @@
//!
//! `RuntimeLinearMemory` is to WebAssembly linear memories what `Table` is to WebAssembly tables.
+use crate::instance::MemFdSlot;
use crate::mmap::Mmap;
use crate::vmcontext::VMMemoryDefinition;
use crate::Store;
@@ -208,7 +209,11 @@ pub enum Memory {
/// A callback which makes portions of `base` accessible for when memory
/// is grown. Otherwise it's expected that accesses to `base` will
/// fault.
-make_accessible: fn(*mut u8, usize) -> Result<()>,
+make_accessible: Option<fn(*mut u8, usize) -> Result<()>>,
+/// The MemFdSlot, if any, for this memory. Owned here and
+/// returned to the pooling allocator when termination occurs.
+memfd_slot: Option<MemFdSlot>,
/// Stores the pages in the linear memory that have faulted as guard pages when using the `uffd` feature.
/// These pages need their protection level reset before the memory can grow.
@@ -236,7 +241,8 @@ impl Memory {
pub fn new_static(
plan: &MemoryPlan,
base: &'static mut [u8],
-make_accessible: fn(*mut u8, usize) -> Result<()>,
+make_accessible: Option<fn(*mut u8, usize) -> Result<()>>,
+memfd_slot: Option<MemFdSlot>,
store: &mut dyn Store,
) -> Result<Self> {
let (minimum, maximum) = Self::limit_new(plan, store)?;
@@ -246,14 +252,17 @@ impl Memory {
_ => base,
};
-if minimum > 0 {
-make_accessible(base.as_mut_ptr(), minimum)?;
+if let Some(make_accessible) = make_accessible {
+if minimum > 0 {
+make_accessible(base.as_mut_ptr(), minimum)?;
+}
}
Ok(Memory::Static {
base,
size: minimum,
make_accessible,
+memfd_slot,
#[cfg(all(feature = "uffd", target_os = "linux"))]
guard_page_faults: Vec::new(),
})
@@ -373,6 +382,22 @@ impl Memory {
}
}
+/// Returns whether or not this memory is backed by a MemFD
+/// image. Note that this is testing whether there is actually an
+/// *image* mapped, not just whether the MemFD mechanism is being
+/// used. The distinction is important because if we are not using
+/// a prevalidated and prepared image, we need to fall back to
+/// ordinary initialization code.
+pub(crate) fn is_memfd_with_image(&self) -> bool {
+match self {
+Memory::Static {
+memfd_slot: Some(ref slot),
+..
+} => slot.has_image(),
+_ => false,
+}
+}
/// Grow memory by the specified amount of wasm pages.
///
/// Returns `None` if memory can't be grown by the specified amount
@@ -443,12 +468,33 @@ impl Memory {
}
match self {
Memory::Static {
base,
size,
+memfd_slot: Some(ref mut memfd_slot),
+..
+} => {
+// Never exceed static memory size
+if new_byte_size > base.len() {
+store.memory_grow_failed(&format_err!("static memory size exceeded"));
+return Ok(None);
+}
+if let Err(e) = memfd_slot.set_heap_limit(new_byte_size) {
+store.memory_grow_failed(&e);
+return Ok(None);
+}
+*size = new_byte_size;
+}
+Memory::Static {
+base,
+size,
make_accessible,
..
} => {
+let make_accessible = make_accessible
+.expect("make_accessible must be Some if this is not a MemFD memory");
// Never exceed static memory size
if new_byte_size > base.len() {
store.memory_grow_failed(&format_err!("static memory size exceeded"));
@@ -540,7 +586,8 @@ impl Default for Memory {
Memory::Static {
base: &mut [],
size: 0,
-make_accessible: |_, _| unreachable!(),
+make_accessible: Some(|_, _| unreachable!()),
+memfd_slot: None,
#[cfg(all(feature = "uffd", target_os = "linux"))]
guard_page_faults: Vec::new(),
}