Add a pooling allocator mode based on copy-on-write mappings of memfds.
As first suggested by Jan on the Zulip here [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on-the-fly then effectively create a CoW overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)` will discard the CoW overlay, returning the mapping to its original state. By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address space mapping at all (which is expensive). The only missing bit is how to implement heap *growth*. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a `mmap()` mapping can be *larger than the file itself*, with accesses beyond the end generating a `SIGBUS`, and the fact that we can cheaply resize the file with `ftruncate`, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as `memory.grow` operations occur. The above CoW technique and heap-growth technique together allow us a fastpath of `madvise()` and `ftruncate()` only when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fastpath avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the `uffd` heap backend does when servicing pagefaults; the kernel's own optimized CoW logic (same as used by all file mmaps) is used instead. [1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772
This commit is contained in:
@@ -2,6 +2,7 @@
|
||||
//!
|
||||
//! `RuntimeLinearMemory` is to WebAssembly linear memories what `Table` is to WebAssembly tables.
|
||||
|
||||
use crate::instance::MemFdSlot;
|
||||
use crate::mmap::Mmap;
|
||||
use crate::vmcontext::VMMemoryDefinition;
|
||||
use crate::Store;
|
||||
@@ -208,7 +209,11 @@ pub enum Memory {
|
||||
/// A callback which makes portions of `base` accessible for when memory
|
||||
/// is grown. Otherwise it's expected that accesses to `base` will
|
||||
/// fault.
|
||||
make_accessible: fn(*mut u8, usize) -> Result<()>,
|
||||
make_accessible: Option<fn(*mut u8, usize) -> Result<()>>,
|
||||
|
||||
/// The MemFdSlot, if any, for this memory. Owned here and
|
||||
/// returned to the pooling allocator when termination occurs.
|
||||
memfd_slot: Option<MemFdSlot>,
|
||||
|
||||
/// Stores the pages in the linear memory that have faulted as guard pages when using the `uffd` feature.
|
||||
/// These pages need their protection level reset before the memory can grow.
|
||||
@@ -236,7 +241,8 @@ impl Memory {
|
||||
pub fn new_static(
|
||||
plan: &MemoryPlan,
|
||||
base: &'static mut [u8],
|
||||
make_accessible: fn(*mut u8, usize) -> Result<()>,
|
||||
make_accessible: Option<fn(*mut u8, usize) -> Result<()>>,
|
||||
memfd_slot: Option<MemFdSlot>,
|
||||
store: &mut dyn Store,
|
||||
) -> Result<Self> {
|
||||
let (minimum, maximum) = Self::limit_new(plan, store)?;
|
||||
@@ -246,14 +252,17 @@ impl Memory {
|
||||
_ => base,
|
||||
};
|
||||
|
||||
if minimum > 0 {
|
||||
make_accessible(base.as_mut_ptr(), minimum)?;
|
||||
if let Some(make_accessible) = make_accessible {
|
||||
if minimum > 0 {
|
||||
make_accessible(base.as_mut_ptr(), minimum)?;
|
||||
}
|
||||
}
|
||||
|
||||
Ok(Memory::Static {
|
||||
base,
|
||||
size: minimum,
|
||||
make_accessible,
|
||||
memfd_slot,
|
||||
#[cfg(all(feature = "uffd", target_os = "linux"))]
|
||||
guard_page_faults: Vec::new(),
|
||||
})
|
||||
@@ -373,6 +382,22 @@ impl Memory {
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns whether or not this memory is backed by a MemFD
|
||||
/// image. Note that this is testing whether there is actually an
|
||||
/// *image* mapped, not just whether the MemFD mechanism is being
|
||||
/// used. The distinction is important because if we are not using
|
||||
/// a prevalidated and prepared image, we need to fall back to
|
||||
/// ordinary initialization code.
|
||||
pub(crate) fn is_memfd_with_image(&self) -> bool {
|
||||
match self {
|
||||
Memory::Static {
|
||||
memfd_slot: Some(ref slot),
|
||||
..
|
||||
} => slot.has_image(),
|
||||
_ => false,
|
||||
}
|
||||
}
|
||||
|
||||
/// Grow memory by the specified amount of wasm pages.
|
||||
///
|
||||
/// Returns `None` if memory can't be grown by the specified amount
|
||||
@@ -443,12 +468,33 @@ impl Memory {
|
||||
}
|
||||
|
||||
match self {
|
||||
Memory::Static {
|
||||
base,
|
||||
size,
|
||||
memfd_slot: Some(ref mut memfd_slot),
|
||||
..
|
||||
} => {
|
||||
// Never exceed static memory size
|
||||
if new_byte_size > base.len() {
|
||||
store.memory_grow_failed(&format_err!("static memory size exceeded"));
|
||||
return Ok(None);
|
||||
}
|
||||
|
||||
if let Err(e) = memfd_slot.set_heap_limit(new_byte_size) {
|
||||
store.memory_grow_failed(&e);
|
||||
return Ok(None);
|
||||
}
|
||||
*size = new_byte_size;
|
||||
}
|
||||
Memory::Static {
|
||||
base,
|
||||
size,
|
||||
make_accessible,
|
||||
..
|
||||
} => {
|
||||
let make_accessible = make_accessible
|
||||
.expect("make_accessible must be Some if this is not a MemFD memory");
|
||||
|
||||
// Never exceed static memory size
|
||||
if new_byte_size > base.len() {
|
||||
store.memory_grow_failed(&format_err!("static memory size exceeded"));
|
||||
@@ -540,7 +586,8 @@ impl Default for Memory {
|
||||
Memory::Static {
|
||||
base: &mut [],
|
||||
size: 0,
|
||||
make_accessible: |_, _| unreachable!(),
|
||||
make_accessible: Some(|_, _| unreachable!()),
|
||||
memfd_slot: None,
|
||||
#[cfg(all(feature = "uffd", target_os = "linux"))]
|
||||
guard_page_faults: Vec::new(),
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user