Use mmap'd *.cwasm as a source for memory initialization images (#3787)

* Skip memfd creation with precompiled modules

This commit updates the memfd support internally to not actually use a
memfd if a compiled module originally came from disk via the
`wasmtime::Module::deserialize_file` API. In this situation we already
have a file descriptor open and there's no need to copy a module's heap
image to a new file descriptor.

To facilitate this new source of `mmap`, the currently memfd-specific logic
for creating a heap image is generalized into a new form of
`MemoryInitialization` which is attempted for all modules at
module-compile-time. This means that the artifact serialized to disk will
have the memory image in its entirety waiting for us. Furthermore the memory
image is carefully padded and aligned to the target system's page size,
notably meaning that the data section in the final object file is
page-aligned and that the size of the data section is also page-aligned.
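
The padding uses the standard power-of-two rounding helpers; a minimal sketch
(the change itself defines equivalent closures over the target's page size):

    /// Round `x` down to the nearest multiple of `page_size` (a power of two).
    fn page_align(x: u64, page_size: u64) -> u64 {
        debug_assert!(page_size.is_power_of_two());
        x & !(page_size - 1)
    }

    /// Round `x` up to the nearest multiple of `page_size`.
    fn page_align_up(x: u64, page_size: u64) -> u64 {
        page_align(x + page_size - 1, page_size)
    }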

This means that when a precompiled module is mapped from disk we can
reuse the underlying `File` to mmap all initial memory images. As a result
the offset within the memory-mapped file can differ between the memfd and
non-memfd cases, but that's just another piece of state to track in the
memfd implementation.
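
Concretely, each heap image now records where its bytes live in the backing
file as well as where they land in linear memory; a simplified sketch of that
state (the struct name here is illustrative, the fields mirror the reworked
`MemoryMemFd` later in this diff):

    // Hypothetical illustration of the bookkeeping kept per heap image.
    struct HeapImageState {
        fd_offset: u64,              // page-aligned offset of the image within the backing file
        linear_memory_offset: usize, // page-aligned offset of the image within linear memory
        len: usize,                  // page-aligned length of the image, in bytes
    }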

In the limit this waters down the term "memfd" for this technique of
quickly initializing memory, because memfd is no longer used
unconditionally: it's only used when a backing file isn't available.
This does, however, open up an avenue to porting this support to other OSes
in the future, because while `memfd_create` is Linux-specific, both macOS and
Windows support mapping a file copy-on-write. This porting isn't done in this
PR and is left for a future refactoring.
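
The portability claim rests on ordinary copy-on-write file mappings; a hedged
sketch of the underlying mechanism, using `libc` directly rather than the
`rustix` calls used in the diff:

    use std::fs::File;
    use std::os::unix::io::AsRawFd;

    /// Map `len` bytes of `file` starting at `file_offset` as a private
    /// (copy-on-write) mapping: reads see the file's contents, writes stay
    /// local to this mapping and never touch the file.
    unsafe fn map_cow(file: &File, file_offset: i64, len: usize) -> *mut u8 {
        let ptr = libc::mmap(
            std::ptr::null_mut(),
            len,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE,
            file.as_raw_fd(),
            file_offset, // page-aligned, as is `len`, in this scheme
        );
        assert_ne!(ptr, libc::MAP_FAILED);
        ptr.cast()
    }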

Closes #3758

* Enable "memfd" support on all unix systems

Cordon off the Linux-specific bits and enable the memfd support to
compile and run on platforms like macOS which have a Linux-like `mmap`.
This only works if a module is mapped from a precompiled module file on
disk, but that's better than not supporting it at all!

* Fix linux compile

* Use `Arc<File>` instead of `MmapVecFileBacking`

* Use a named struct instead of mysterious tuples

* Comment about unsafety in `Module::deserialize_file`

* Fix tests

* Fix uffd compile

* Always align data segments

There's no need for conditional alignment since the segments' sizes are all
aligned anyway.

* Update comment in build.rs

* Use rustix, not `region`

* Fix some confusing logic/names around memory indexes

These functions all operate on general memory indexes (`MemoryIndex`), not
specifically on defined memory indexes (`DefinedMemoryIndex`).
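
For context, `wasmtime_environ` distinguishes an index over all of a module's
memories from an index over only the memories the module itself defines; the
conversion used throughout this change looks roughly like:

    use wasmtime_environ::{DefinedMemoryIndex, MemoryIndex, Module};

    // A `MemoryIndex` covers imported and defined memories alike; only
    // module-defined memories map to a `DefinedMemoryIndex`.
    fn to_defined(module: &Module, index: MemoryIndex) -> Option<DefinedMemoryIndex> {
        module.defined_memory_index(index)
    }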
Commit c0c368d151 (parent 1cb08d4e67), authored by Alex Crichton on
2022-02-10 15:40:40 -06:00 and committed via GitHub.
18 changed files with 629 additions and 280 deletions


@@ -353,6 +353,10 @@ impl wasmtime_environ::Compiler for Compiler {
self.isa.triple()
}
fn page_size_align(&self) -> u64 {
self.isa.code_section_alignment()
}
fn flags(&self) -> BTreeMap<String, FlagValue> {
self.isa
.flags()


@@ -221,6 +221,12 @@ pub trait Compiler: Send + Sync {
/// Returns the target triple that this compiler is compiling for.
fn triple(&self) -> &target_lexicon::Triple;
/// Returns the alignment necessary to align values to the page size of the
/// compilation target. Note that this may be an upper-bound where the
/// alignment is larger than necessary for some platforms since it may
/// depend on the platform's runtime configuration.
fn page_size_align(&self) -> u64;
/// Returns a list of configured settings for this compiler.
fn flags(&self) -> BTreeMap<String, FlagValue>;


@@ -114,6 +114,18 @@ pub struct MemoryInitializer {
pub data: Range<u32>,
}
/// Similar to the above `MemoryInitializer` but only used when memory
/// initializers are statically known to be valid.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct StaticMemoryInitializer {
/// The 64-bit offset, in bytes, of where this initializer starts.
pub offset: u64,
/// The range of data to write at `offset`, where these indices are indexes
/// into the compiled wasm module's data section.
pub data: Range<u32>,
}
/// The type of WebAssembly linear memory initialization to use for a module.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum MemoryInitialization {
@@ -159,7 +171,42 @@ pub enum MemoryInitialization {
/// indices, like those in `MemoryInitializer`, point within a data
/// segment that will come as an auxiliary descriptor with other data
/// such as the compiled code for the wasm module.
map: PrimaryMap<MemoryIndex, Vec<(u64, Range<u32>)>>,
map: PrimaryMap<MemoryIndex, Vec<StaticMemoryInitializer>>,
},
/// Memory initialization is statically known and involves a single `memcpy`
/// or otherwise simply making the defined data visible.
///
/// To be statically initialized the same requirements as `Paged` must be
/// met, namely that everything references a defined memory and all data
/// segments have a statically known in-bounds base (no globals).
///
/// This form of memory initialization is a more optimized version of
/// `Segmented` where memory can be initialized with one of a few methods:
///
/// * First it could be initialized with a single `memcpy` of data from the
/// module to the linear memory.
/// * Otherwise techniques like `mmap` are also possible to make this data,
/// which might reside in a compiled module on disk, available immediately
/// in a linear memory's address space.
///
/// To facilitate the latter of these techniques the `try_static_init`
/// function below, which creates this variant, takes a host page size
/// argument which can page-align everything to make mmap-ing possible.
Static {
/// The initialization contents for each linear memory.
///
/// This array has, for each module's own linear memory, the contents
/// necessary to initialize it. If the memory has a `None` value then no
/// initialization is necessary (it's zero-filled). Otherwise with
/// `Some` the first element of the tuple is the offset in memory to
/// start the initialization and the `Range` is the range within the
/// final data section of the compiled module of bytes to copy into the
/// memory.
///
/// The offset, range base, and range end are all guaranteed to be page
/// aligned to the page size passed in to `try_static_init`.
map: PrimaryMap<MemoryIndex, Option<StaticMemoryInitializer>>,
},
}
@@ -192,9 +239,9 @@ impl ModuleTranslation<'_> {
let mut data = self.data.iter();
let ok = self.module.memory_initialization.init_memory(
InitMemory::CompileTime(&self.module),
&mut |memory, offset, data_range| {
&mut |memory, init| {
let data = data.next().unwrap();
assert_eq!(data.len(), data_range.len());
assert_eq!(data.len(), init.data.len());
// If an initializer references an imported memory then
// everything will need to be processed in-order anyway to
// handle the dynamic limits of the memory specified.
@@ -203,8 +250,8 @@ impl ModuleTranslation<'_> {
};
let page_size = u64::from(WASM_PAGE_SIZE);
let contents = &mut page_contents[memory];
let mut page_index = offset / page_size;
let mut page_offset = (offset % page_size) as usize;
let mut page_index = init.offset / page_size;
let mut page_offset = (init.offset % page_size) as usize;
let mut data = &data[..];
while !data.is_empty() {
@@ -249,9 +296,12 @@ impl ModuleTranslation<'_> {
let mut offset = 0;
for (memory, pages) in page_contents {
let mut page_offsets = Vec::with_capacity(pages.len());
for (byte_offset, page) in pages {
for (page_index, page) in pages {
let end = offset + (page.len() as u32);
page_offsets.push((byte_offset, offset..end));
page_offsets.push(StaticMemoryInitializer {
offset: page_index * u64::from(WASM_PAGE_SIZE),
data: offset..end,
});
offset = end;
self.data.push(page.into());
}
@@ -261,6 +311,155 @@ impl ModuleTranslation<'_> {
self.module.memory_initialization = MemoryInitialization::Paged { map };
}
/// Similar to the `try_paged_init` method, but attempts to use the
/// `MemoryInitialization::Static` variant.
///
/// Note that the constraints for `Paged` are the same as those for
/// `Static`.
pub fn try_static_init(&mut self, page_size: u64) {
const MAX_IMAGE_SIZE: usize = 1024 * 1024 * 1024; // limit to 1GiB.
// This method only attempts to transform a `Segmented` memory init
// into a `Static` one, no other state.
if !self.module.memory_initialization.is_segmented() {
return;
}
let page_align = |x: u64| x & !(page_size - 1);
let page_align_up = |x: u64| page_align(x + page_size - 1);
// First build up an in-memory image for each memory. This in-memory
// representation is discarded if the memory initializers aren't "of
// the right shape" where the desired shape is:
//
// * Only initializers for defined memories.
// * Only initializers with static offsets (no globals).
// * Only in-bound initializers.
//
// The `init_memory` method of `MemoryInitialization` is used here to
// do most of the validation for us, and otherwise the data chunks are
// collected into the `images` array here.
let mut images: PrimaryMap<MemoryIndex, Vec<u8>> =
PrimaryMap::with_capacity(self.module.memory_plans.len());
for _ in 0..self.module.memory_plans.len() {
images.push(Vec::new());
}
let mut data = self.data.iter();
let ok = self.module.memory_initialization.init_memory(
InitMemory::CompileTime(&self.module),
&mut |memory, init| {
let data = data.next().unwrap();
assert_eq!(data.len(), init.data.len());
// Static initialization with only one memcpy is only possible
// for defined memories which have a known-starting-as-zero
// state to account for holes between data segments. This means
// that if this is an imported memory then static memory
// initialization isn't possible.
if self.module.defined_memory_index(memory).is_none() {
return false;
}
// Splat the `data_range` into the `image` for this memory,
// updating it as necessary with 0s for holes and such.
let image = &mut images[memory];
let offset = usize::try_from(init.offset).unwrap();
let new_image_len = offset + data.len();
if image.len() < new_image_len {
if new_image_len > MAX_IMAGE_SIZE {
return false;
}
image.resize(new_image_len, 0);
}
image[offset..][..data.len()].copy_from_slice(data);
true
},
);
// If any initializer wasn't applicable then we skip static init
// entirely.
if !ok {
return;
}
// At this point all memories in this module are initialized with
// in-bounds initializers which are all statically known to be valid.
// This means that the memory `images` built up so far are valid for an
// instance of this linear memory. These images are trimmed of their
// leading and trailing zeros and then `self.data` is re-populated with
// new data.
self.data.clear();
assert!(self.data_align.is_none());
self.data_align = Some(page_size);
let mut map = PrimaryMap::with_capacity(images.len());
let mut offset = 0u32;
for (memory, mut image) in images {
// Find the first nonzero byte, and if all the bytes are zero then
// we can skip initialization of this memory entirely since memories
// otherwise start with all zero bytes.
let nonzero_start = match image.iter().position(|b| *b != 0) {
Some(i) => i as u64,
None => {
map.push(None);
continue;
}
};
// Find the last nonzero byte, which must exist at this point since
// we found one going forward. Add one to get the exclusive end of the
// nonzero range, which may also equal the length of the image.
let nonzero_end = image.iter().rposition(|b| *b != 0).unwrap() as u64 + 1;
// The offset and length of this image are now page-aligned. This
// isn't strictly required for a runtime-compiled module which is
// never persisted to disk. The purpose of doing this, however, is to
// make it possible to mmap the in-memory image, if it is persisted to
// disk, into an address space at a future date for efficient
// initialization of memory. Mapping a file into the address space
// requires a page-aligned offset in the file as well as a
// page-aligned length, so the offset/length are aligned here.
//
// Note that in the future if we can distinguish between modules
// only ever used at runtime and those persisted to disk and used
// later then it's possible to pass a page size parameter to this
// function as "1" which means we won't pad the data with extra
// zeros or anything like that.
let image_offset = page_align(nonzero_start);
let image_len = page_align_up(nonzero_end - image_offset);
assert_eq!(image_offset % page_size, 0);
assert_eq!(image_len % page_size, 0);
// Drop the leading zero bytes on the image, and then also set the
// length of the image to the specified length. Note that this may
// truncate trailing zeros from the image or it may extend the image
// to be page-aligned with zeros.
image.drain(..image_offset as usize);
image.resize(image_len as usize, 0);
self.data.push(image.into());
// Record how this image is initialized, where `image_offset` is the
// offset from the start of linear memory and the length is added to
// the `offset` variable outside of this loop which keeps track of
// the current offset in the data section. This is used to build the
// range `offset..end` which is the range, in the concatenation of
// all memory images of `self.data`, of what memory is located at
// `image_offset`.
let end = offset
.checked_add(u32::try_from(image_len).unwrap())
.unwrap();
let idx = map.push(Some(StaticMemoryInitializer {
offset: image_offset,
data: offset..end,
}));
assert_eq!(idx, memory);
offset = end;
assert_eq!(offset % (page_size as u32), 0);
}
self.module.memory_initialization = MemoryInitialization::Static { map };
}
/// Attempts to convert the module's table initializers to
/// FuncTable form where possible. This enables lazy table
/// initialization later by providing a one-to-one map of initial
@@ -399,7 +598,7 @@ impl MemoryInitialization {
pub fn init_memory(
&self,
state: InitMemory<'_>,
write: &mut dyn FnMut(MemoryIndex, u64, &Range<u32>) -> bool,
write: &mut dyn FnMut(MemoryIndex, &StaticMemoryInitializer) -> bool,
) -> bool {
let initializers = match self {
// Fall through below to the segmented memory one-by-one
@@ -413,9 +612,23 @@ impl MemoryInitialization {
// indices are in-bounds.
MemoryInitialization::Paged { map } => {
for (index, pages) in map {
for (page_index, page) in pages {
debug_assert_eq!(page.end - page.start, WASM_PAGE_SIZE);
let result = write(index, *page_index * u64::from(WASM_PAGE_SIZE), page);
for init in pages {
debug_assert_eq!(init.data.end - init.data.start, WASM_PAGE_SIZE);
let result = write(index, init);
if !result {
return result;
}
}
}
return true;
}
// Like `Paged` above everything's already been validated so this
// can simply forward through the data.
MemoryInitialization::Static { map } => {
for (index, init) in map {
if let Some(init) = init {
let result = write(index, init);
if !result {
return result;
}
@@ -484,7 +697,11 @@ impl MemoryInitialization {
// The limits of the data segment have been validated at this point
// so the `write` callback is called with the range of data being
// written. Any erroneous result is propagated upwards.
let result = write(memory_index, start, data);
let init = StaticMemoryInitializer {
offset: start,
data: data.clone(),
};
let result = write(memory_index, &init);
if !result {
return result;
}


@@ -92,6 +92,14 @@ pub struct ModuleTranslation<'data> {
/// `MemoryInitializer` type.
pub data: Vec<Cow<'data, [u8]>>,
/// The desired alignment of `data` in the final data section of the object
/// file that we'll emit.
///
/// Note that this is 1 by default but `MemoryInitialization::Static` might
/// switch this to a higher alignment to facilitate mmap-ing data from
/// an object file into a linear memory.
pub data_align: Option<u64>,
/// Total size of all data pushed onto `data` so far.
total_data: u32,


@@ -1,10 +1,10 @@
//! Memory management for executable code.
use crate::unwind::UnwindRegistration;
use crate::MmapVec;
use anyhow::{bail, Context, Result};
use object::read::{File, Object, ObjectSection};
use std::mem::ManuallyDrop;
use wasmtime_runtime::MmapVec;
/// Management of executable memory within a `MmapVec`
///


@@ -5,9 +5,9 @@
use crate::code_memory::CodeMemory;
use crate::debug::create_gdbjit_image;
use crate::{MmapVec, ProfilingAgent};
use anyhow::{anyhow, bail, Context, Result};
use object::write::{Object, StandardSegment};
use crate::ProfilingAgent;
use anyhow::{anyhow, bail, Context, Error, Result};
use object::write::{Object, StandardSegment, WritableBuffer};
use object::{File, Object as _, ObjectSection, SectionKind};
use serde::{Deserialize, Serialize};
use std::convert::TryFrom;
@@ -23,7 +23,7 @@ use wasmtime_environ::{
};
use wasmtime_runtime::{
CompiledModuleId, CompiledModuleIdAllocator, GdbJitImageRegistration, InstantiationError,
VMFunctionBody, VMTrampoline,
MmapVec, VMFunctionBody, VMTrampoline,
};
/// This is the name of the section in the final ELF image which contains
@@ -167,6 +167,7 @@ pub fn finish_compile(
debuginfo,
has_unparsed_debuginfo,
data,
data_align,
passive_data,
..
} = translation;
@@ -179,8 +180,13 @@ pub fn finish_compile(
SectionKind::ReadOnlyData,
);
let mut total_data_len = 0;
for data in data.iter() {
obj.append_section_data(data_id, data, 1);
for data in data {
let offset = obj.append_section_data(data_id, &data, data_align.unwrap_or(1));
// All data segments are expected to be adjacent to one another, and
// with a higher alignment each data segment needs to be individually
// aligned for that to hold, so assert that the offset this segment was
// placed at immediately follows the previous segment.
assert_eq!(offset as usize, total_data_len);
total_data_len += data.len();
}
for data in passive_data.iter() {
@@ -266,7 +272,7 @@ pub fn finish_compile(
bincode::serialize_into(&mut bytes, &info)?;
obj.append_section_data(info_id, &bytes, 1);
return Ok((MmapVec::from_obj(obj)?, info));
return Ok((mmap_vec_from_obj(obj)?, info));
fn push_debug<'a, T>(obj: &mut Object, section: &T)
where
@@ -285,6 +291,74 @@ pub fn finish_compile(
}
}
/// Creates a new `MmapVec` from serializing the specified `obj`.
///
/// The returned `MmapVec` will contain the serialized version of `obj` and
/// is sized appropriately to the exact size of the object serialized.
pub fn mmap_vec_from_obj(obj: Object) -> Result<MmapVec> {
let mut result = ObjectMmap::default();
return match obj.emit(&mut result) {
Ok(()) => {
assert!(result.mmap.is_some(), "no reserve");
let mmap = result.mmap.expect("reserve not called");
assert_eq!(mmap.len(), result.len);
Ok(mmap)
}
Err(e) => match result.err.take() {
Some(original) => Err(original.context(e)),
None => Err(e.into()),
},
};
/// Helper struct to implement the `WritableBuffer` trait from the `object`
/// crate.
///
/// This enables writing an object directly into an mmap'd memory so it's
/// immediately usable for execution after compilation. This implementation
/// relies on a call to `reserve` happening once up front with all the needed
/// data, and the mmap internally does not attempt to grow afterwards.
#[derive(Default)]
struct ObjectMmap {
mmap: Option<MmapVec>,
len: usize,
err: Option<Error>,
}
impl WritableBuffer for ObjectMmap {
fn len(&self) -> usize {
self.len
}
fn reserve(&mut self, additional: usize) -> Result<(), ()> {
assert!(self.mmap.is_none(), "cannot reserve twice");
self.mmap = match MmapVec::with_capacity(additional) {
Ok(mmap) => Some(mmap),
Err(e) => {
self.err = Some(e);
return Err(());
}
};
Ok(())
}
fn resize(&mut self, new_len: usize) {
// Resizing always appends 0 bytes and since new mmaps start out as 0
// bytes we don't actually need to do anything as part of this other
// than update our own length.
if new_len <= self.len {
return;
}
self.len = new_len;
}
fn write_bytes(&mut self, val: &[u8]) {
let mmap = self.mmap.as_mut().expect("write before reserve");
mmap[self.len..][..val.len()].copy_from_slice(val);
self.len += val.len();
}
}
}
/// This is intended to mirror the type tables in `wasmtime_environ`, except that
/// it doesn't store the native signatures which are no longer needed past compilation.
#[derive(Serialize, Deserialize)]


@@ -25,16 +25,14 @@ mod debug;
mod demangling;
mod instantiate;
mod link;
mod mmap_vec;
mod profiling;
mod unwind;
pub use crate::code_memory::CodeMemory;
pub use crate::instantiate::{
finish_compile, subslice_range, CompiledModule, CompiledModuleInfo, SetupError,
SymbolizeContext, TypeTables,
finish_compile, mmap_vec_from_obj, subslice_range, CompiledModule, CompiledModuleInfo,
SetupError, SymbolizeContext, TypeTables,
};
pub use crate::mmap_vec::MmapVec;
pub use demangling::*;
pub use profiling::*;


@@ -11,13 +11,17 @@ fn main() {
.file("src/helpers.c")
.compile("wasmtime-helpers");
// Check to see if we are on Linux and the `memfd` feature is
// Check to see if we are on Unix and the `memfd` feature is
// active. If so, enable the `memfd` rustc cfg so `#[cfg(memfd)]`
// will work.
let os = env::var("CARGO_CFG_TARGET_OS").unwrap();
//
// Note that while this is called memfd it only actually uses the `memfd`
// crate on Linux; on other Unix platforms it instead tries to reuse mmap'd
// `*.cwasm` files.
let family = env::var("CARGO_CFG_TARGET_FAMILY").unwrap();
let is_memfd = env::var("CARGO_FEATURE_MEMFD").is_ok();
let is_uffd = env::var("CARGO_FEATURE_UFFD").is_ok();
if &os == "linux" && is_memfd && !is_uffd {
if &family == "unix" && is_memfd && !is_uffd {
println!("cargo:rustc-cfg=memfd");
}
}


@@ -366,7 +366,7 @@ fn initialize_memories(instance: &mut Instance, module: &Module) -> Result<(), I
memory_size_in_pages,
get_global_as_u64,
},
&mut |memory_index, offset, data| {
&mut |memory_index, init| {
// If this initializer applies to a defined memory but that memory
// doesn't need initialization, due to something like uffd or memfd
// pre-initializing it via mmap magic, then this initializer can be
@@ -379,8 +379,8 @@ fn initialize_memories(instance: &mut Instance, module: &Module) -> Result<(), I
let memory = instance.get_memory(memory_index);
let dst_slice =
unsafe { slice::from_raw_parts_mut(memory.base, memory.current_length) };
let dst = &mut dst_slice[usize::try_from(offset).unwrap()..][..data.len()];
dst.copy_from_slice(instance.wasm_data(data.clone()));
let dst = &mut dst_slice[usize::try_from(init.offset).unwrap()..][..init.data.len()];
dst.copy_from_slice(instance.wasm_data(init.data.clone()));
true
},
);
@@ -401,7 +401,7 @@ fn check_init_bounds(instance: &mut Instance, module: &Module) -> Result<(), Ins
check_memory_init_bounds(instance, initializers)?;
}
// Statically validated already to have everything in-bounds.
MemoryInitialization::Paged { .. } => {}
MemoryInitialization::Paged { .. } | MemoryInitialization::Static { .. } => {}
}
Ok(())


@@ -268,9 +268,9 @@ unsafe fn initialize_wasm_page(
let memory_index = instance.module().memory_index(memory_index);
let pages = &map[memory_index];
let pos = pages.binary_search_by_key(&(page_index as u64), |k| k.0);
let pos = pages.binary_search_by_key(&((page_index * WASM_PAGE_SIZE) as u64), |k| k.offset);
if let Ok(i) = pos {
let data = instance.wasm_data(pages[i].1.clone());
let data = instance.wasm_data(pages[i].data.clone());
debug_assert_eq!(data.len(), WASM_PAGE_SIZE);
log::trace!(


@@ -37,6 +37,7 @@ mod instance;
mod jit_int;
mod memory;
mod mmap;
mod mmap_vec;
mod table;
mod traphandlers;
mod vmcontext;
@@ -58,6 +59,7 @@ pub use crate::instance::{
pub use crate::jit_int::GdbJitImageRegistration;
pub use crate::memory::{DefaultMemoryCreator, Memory, RuntimeLinearMemory, RuntimeMemoryCreator};
pub use crate::mmap::Mmap;
pub use crate::mmap_vec::MmapVec;
pub use crate::table::{Table, TableElement};
pub use crate::traphandlers::{
catch_traps, init_traps, raise_lib_trap, raise_user_trap, resume_panic, tls_eager_initialize,


@@ -2,14 +2,14 @@
//! to support mapping these backing images into memory.
use crate::InstantiationError;
use crate::MmapVec;
use anyhow::Result;
use libc::c_void;
use memfd::{Memfd, MemfdOptions};
use rustix::fd::AsRawFd;
use std::io::Write;
use std::fs::File;
use std::sync::Arc;
use std::{convert::TryFrom, ops::Range};
use wasmtime_environ::{DefinedMemoryIndex, InitMemory, Module, PrimaryMap};
use wasmtime_environ::{DefinedMemoryIndex, MemoryInitialization, Module, PrimaryMap};
/// MemFDs containing backing images for certain memories in a module.
///
@@ -19,8 +19,6 @@ pub struct ModuleMemFds {
memories: PrimaryMap<DefinedMemoryIndex, Option<Arc<MemoryMemFd>>>,
}
const MAX_MEMFD_IMAGE_SIZE: usize = 1024 * 1024 * 1024; // limit to 1GiB.
impl ModuleMemFds {
/// Get the MemoryMemFd for a given memory.
pub fn get_memory_image(&self, defined_index: DefinedMemoryIndex) -> Option<&Arc<MemoryMemFd>> {
@@ -31,30 +29,160 @@ impl ModuleMemFds {
/// One backing image for one memory.
#[derive(Debug)]
pub struct MemoryMemFd {
/// The actual memfd image: an anonymous file in memory which we
/// use as the backing content for a copy-on-write (CoW) mapping
/// in the memory region.
pub fd: Memfd,
/// Length of image. Note that initial memory size may be larger;
/// leading and trailing zeroes are truncated (handled by
/// anonymous backing memfd).
/// The file descriptor source of this image.
///
/// This might be an mmaped `*.cwasm` file or on Linux it could also be a
/// `Memfd` as an anonymous file in memory. In either case this is used as
/// the backing-source for the CoW image.
fd: MemFdSource,
/// Length of image, in bytes.
///
/// Note that initial memory size may be larger; leading and trailing zeroes
/// are truncated (handled by backing fd).
///
/// Must be a multiple of the system page size.
pub len: usize,
/// Image starts this many bytes into heap space. Note that the
/// memfd's offsets are always equal to the heap offsets, so we
/// map at an offset into the fd as well. (This simplifies
/// construction.)
len: usize,
/// Image starts this many bytes into `fd` source.
///
/// This is 0 for anonymous-backed memfd files and is the offset of the data
/// section in a `*.cwasm` file for `*.cwasm`-backed images.
///
/// Must be a multiple of the system page size.
pub offset: usize,
fd_offset: u64,
/// Image starts this many bytes into heap space.
///
/// Must be a multiple of the system page size.
linear_memory_offset: usize,
}
fn create_memfd() -> Result<Memfd> {
#[derive(Debug)]
enum MemFdSource {
Mmap(Arc<File>),
#[cfg(target_os = "linux")]
Memfd(memfd::Memfd),
}
impl MemFdSource {
fn as_file(&self) -> &File {
match self {
MemFdSource::Mmap(file) => file,
#[cfg(target_os = "linux")]
MemFdSource::Memfd(memfd) => memfd.as_file(),
}
}
}
impl MemoryMemFd {
fn new(
page_size: u32,
offset: u64,
data: &[u8],
mmap: Option<&MmapVec>,
) -> Result<Option<MemoryMemFd>> {
// Sanity-check that various parameters are page-aligned.
let len = data.len();
let offset = u32::try_from(offset).unwrap();
assert_eq!(offset % page_size, 0);
assert_eq!((len as u32) % page_size, 0);
let linear_memory_offset = usize::try_from(offset).unwrap();
// If a backing `mmap` is present then `data` should be a sub-slice of
// the `mmap`. The sanity-checks here double-check that. Additionally
// compilation should have ensured that the `data` section is
// page-aligned within `mmap`, so that's also all double-checked here.
//
// Finally if the `mmap` itself comes from a backing file on disk, such
// as a `*.cwasm` file, then that's a valid source of data for the
// memory image so we simply return referencing that.
//
// Note that this path is platform-agnostic in the sense that all
// platforms we support can memory-map copy-on-write data from
// files, but for now this is still a Linux-specific region of Wasmtime.
// Some work will be needed to get this file compiling for macOS and
// Windows.
if let Some(mmap) = mmap {
let start = mmap.as_ptr() as usize;
let end = start + mmap.len();
let data_start = data.as_ptr() as usize;
let data_end = data_start + data.len();
assert!(start <= data_start && data_end <= end);
assert_eq!((start as u32) % page_size, 0);
assert_eq!((data_start as u32) % page_size, 0);
assert_eq!((data_end as u32) % page_size, 0);
assert_eq!((mmap.original_offset() as u32) % page_size, 0);
if let Some(file) = mmap.original_file() {
return Ok(Some(MemoryMemFd {
fd: MemFdSource::Mmap(file.clone()),
fd_offset: u64::try_from(mmap.original_offset() + (data_start - start))
.unwrap(),
linear_memory_offset,
len,
}));
}
}
// If `mmap` doesn't come from a file then platform-specific mechanisms
// may be used to place the data in a form that's amenable to an mmap.
cfg_if::cfg_if! {
if #[cfg(target_os = "linux")] {
// On Linux `memfd_create` is used to create an anonymous
// in-memory file to represent the heap image. This anonymous
// file is then used as the basis for further mmaps.
use std::io::Write;
let memfd = create_memfd()?;
memfd.as_file().write_all(data)?;
// Seal the memfd's data and length.
//
// This is a defense-in-depth security mitigation. The
// memfd will serve as the starting point for the heap of
// every instance of this module. If anything were to
// write to this, it could affect every execution. The
// memfd object itself is owned by the machinery here and
// not exposed elsewhere, but it is still an ambient open
// file descriptor at the syscall level, so some other
// vulnerability that allowed writes to arbitrary fds
// could modify it. Or we could have some issue with the
// way that we map it into each instance. To be
// extra-super-sure that it never changes, and because
// this costs very little, we use the kernel's "seal" API
// to make the memfd image permanently read-only.
memfd.add_seal(memfd::FileSeal::SealGrow)?;
memfd.add_seal(memfd::FileSeal::SealShrink)?;
memfd.add_seal(memfd::FileSeal::SealWrite)?;
memfd.add_seal(memfd::FileSeal::SealSeal)?;
Ok(Some(MemoryMemFd {
fd: MemFdSource::Memfd(memfd),
fd_offset: 0,
linear_memory_offset,
len,
}))
} else {
// Other platforms don't have an easily available way of
// representing the heap image as an mmap-source right now. We
// could theoretically create a file and immediately unlink it,
// but that means the data would likely be persisted to disk,
// which isn't what we want here.
Ok(None)
}
}
}
}
#[cfg(target_os = "linux")]
fn create_memfd() -> Result<memfd::Memfd> {
// Create the memfd. It needs a name, but the
// documentation for `memfd_create()` says that names can
// be duplicated with no issues.
MemfdOptions::new()
memfd::MemfdOptions::new()
.allow_sealing(true)
.create("wasm-memory-image")
.map_err(|e| e.into())
@@ -64,135 +192,47 @@ impl ModuleMemFds {
/// Create a new `ModuleMemFds` for the given module. This can be
/// passed in as part of a `InstanceAllocationRequest` to speed up
/// instantiation and execution by using memfd-backed memories.
pub fn new(module: &Module, wasm_data: &[u8]) -> Result<Option<ModuleMemFds>> {
let page_size = region::page::size() as u64;
let page_align = |x: u64| x & !(page_size - 1);
let page_align_up = |x: u64| page_align(x + page_size - 1);
pub fn new(
module: &Module,
wasm_data: &[u8],
mmap: Option<&MmapVec>,
) -> Result<Option<ModuleMemFds>> {
let map = match &module.memory_initialization {
MemoryInitialization::Static { map } => map,
_ => return Ok(None),
};
let mut memories = PrimaryMap::with_capacity(map.len());
let page_size = region::page::size() as u32;
for (memory_index, init) in map {
// mmap-based-initialization only works for defined memories with a
// known starting point of all zeros, so bail out if the memory is
// imported.
let defined_memory = match module.defined_memory_index(memory_index) {
Some(idx) => idx,
None => return Ok(None),
};
// First build up an in-memory image for each memory. This in-memory
// representation is discarded if the memory initializers aren't "of
// the right shape" where the desired shape is:
//
// * Only initializers for defined memories.
// * Only initializers with static offsets (no globals).
// * Only in-bound initializers.
//
// The `init_memory` method of `MemoryInitialization` is used here to
// do most of the validation for us, and otherwise the data chunks are
// collected into the `images` array here.
let mut images: PrimaryMap<DefinedMemoryIndex, Vec<u8>> = PrimaryMap::default();
let num_defined_memories = module.memory_plans.len() - module.num_imported_memories;
for _ in 0..num_defined_memories {
images.push(Vec::new());
}
let ok = module.memory_initialization.init_memory(
InitMemory::CompileTime(module),
&mut |memory, offset, data_range| {
// Memfd-based initialization of an imported memory isn't
// implemented right now, although might perhaps be
// theoretically possible for statically-known-in-bounds
// segments with page-aligned portions.
let memory = match module.defined_memory_index(memory) {
Some(index) => index,
None => return false,
};
// Splat the `data_range` into the `image` for this memory,
// updating it as necessary with 0s for holes and such.
let image = &mut images[memory];
let data = &wasm_data[data_range.start as usize..data_range.end as usize];
let offset = offset as usize;
let new_image_len = offset + data.len();
if image.len() < new_image_len {
if new_image_len > MAX_MEMFD_IMAGE_SIZE {
return false;
}
image.resize(new_image_len, 0);
}
image[offset..][..data.len()].copy_from_slice(data);
true
},
);
// If any initializer wasn't applicable then we skip memfds entirely.
if !ok {
return Ok(None);
}
// With an in-memory representation of all memory images a `memfd` is
// now created and the data is pushed into the memfd. Note that the
// memfd representation will trim leading and trailing pages of zeros
// to store as little data as possible in the memfd. This is not only a
// performance improvement in the sense of "copy less data to the
// kernel" but it's also more performant to fault in zeros from
// anonymous-backed pages instead of memfd-backed pages-of-zeros (as
// the kernel knows anonymous mappings are always zero and has a cache
// of zero'd pages).
let mut memories = PrimaryMap::default();
for (defined_memory, image) in images {
// Find the first nonzero byte, and if all the bytes are zero then
// we can skip the memfd for this memory since there's no
// meaningful initialization.
let nonzero_start = match image.iter().position(|b| *b != 0) {
Some(i) => i as u64,
// If there's no initialization for this memory known then we don't
// need an image for the memory so push `None` and move on.
let init = match init {
Some(init) => init,
None => {
memories.push(None);
continue;
}
};
// Find the last nonzero byte, which must exist at this point since
// we found one going forward. Add one to find the index of the
// last zero, which may also be the length of the image.
let nonzero_end = image.iter().rposition(|b| *b != 0).unwrap() as u64 + 1;
// Get the image for this wasm module as a subslice of `wasm_data`,
// and then use that to try to create the `MemoryMemFd`. If this
// creation fails then we fail creating `ModuleMemFds` since this
// memory couldn't be represented.
let data = &wasm_data[init.data.start as usize..init.data.end as usize];
let memfd = match MemoryMemFd::new(page_size, init.offset, data, mmap)? {
Some(memfd) => memfd,
None => return Ok(None),
};
// The offset of this image must be OS-page-aligned since we'll be
// starting the mmap at an aligned address. Align down the start
// index to the first index that's page aligned.
let offset = page_align(nonzero_start);
// The length of the image must also be page aligned and may reach
// beyond the end of the `image` array we have already. Take the
// length of the nonzero portion and then align it up to the page size.
let len = page_align_up(nonzero_end - offset);
// Write the nonzero data to the memfd and then use `set_len` to
// ensure that the length of the memfd is page-aligned where the gap
// at the end, if any, is filled with zeros.
let memfd = create_memfd()?;
memfd
.as_file()
.write_all(&image[offset as usize..nonzero_end as usize])?;
memfd.as_file().set_len(len)?;
// Seal the memfd's data and length.
//
// This is a defense-in-depth security mitigation. The
// memfd will serve as the starting point for the heap of
// every instance of this module. If anything were to
// write to this, it could affect every execution. The
// memfd object itself is owned by the machinery here and
// not exposed elsewhere, but it is still an ambient open
// file descriptor at the syscall level, so some other
// vulnerability that allowed writes to arbitrary fds
// could modify it. Or we could have some issue with the
// way that we map it into each instance. To be
// extra-super-sure that it never changes, and because
// this costs very little, we use the kernel's "seal" API
// to make the memfd image permanently read-only.
memfd.add_seal(memfd::FileSeal::SealGrow)?;
memfd.add_seal(memfd::FileSeal::SealShrink)?;
memfd.add_seal(memfd::FileSeal::SealWrite)?;
memfd.add_seal(memfd::FileSeal::SealSeal)?;
assert_eq!(offset % page_size, 0);
assert_eq!(len % page_size, 0);
let idx = memories.push(Some(Arc::new(MemoryMemFd {
fd: memfd,
offset: usize::try_from(offset).unwrap(),
len: usize::try_from(len).unwrap(),
})));
let idx = memories.push(Some(Arc::new(memfd)));
assert_eq!(idx, defined_memory);
}
@@ -398,19 +438,21 @@ impl MemFdSlot {
// The initial memory image, if given. If not, we just get a
// memory filled with zeroes.
if let Some(image) = maybe_image.as_ref() {
assert!(image.offset.checked_add(image.len).unwrap() <= initial_size_bytes);
assert!(
image.linear_memory_offset.checked_add(image.len).unwrap() <= initial_size_bytes
);
if image.len > 0 {
unsafe {
let ptr = rustix::io::mmap(
(self.base + image.offset) as *mut c_void,
(self.base + image.linear_memory_offset) as *mut c_void,
image.len,
rustix::io::ProtFlags::READ | rustix::io::ProtFlags::WRITE,
rustix::io::MapFlags::PRIVATE | rustix::io::MapFlags::FIXED,
image.fd.as_file(),
0,
image.fd_offset,
)
.map_err(|e| InstantiationError::Resource(e.into()))?;
assert_eq!(ptr as usize, self.base + image.offset);
assert_eq!(ptr as usize, self.base + image.linear_memory_offset);
}
}
}
@@ -424,15 +466,33 @@ impl MemFdSlot {
#[allow(dead_code)] // ignore warnings as this is only used in some cfgs
pub(crate) fn clear_and_remain_ready(&mut self) -> Result<()> {
assert!(self.dirty);
// madvise the image range. This will throw away dirty pages,
// which are CoW-private pages on top of the initial heap
// image memfd.
unsafe {
rustix::io::madvise(
self.base as *mut c_void,
self.cur_size,
rustix::io::Advice::LinuxDontNeed,
)?;
cfg_if::cfg_if! {
if #[cfg(target_os = "linux")] {
// On Linux we can use `madvise` to reset the virtual memory
// back to its original state. This means back to all zeros for
// anonymous-backed pages and back to the original contents for
// CoW memory (the initial heap image). This has the precise
// semantics we want for reuse between instances, so it's all we
// need to do.
unsafe {
rustix::io::madvise(
self.base as *mut c_void,
self.cur_size,
rustix::io::Advice::LinuxDontNeed,
)?;
}
} else {
// If we're not on Linux, however, then there's no generic
// platform way to reset memory back to its original state, so
// instead this is "feigned" by resetting memory back to
// entirely zeros with an anonymous backing.
//
// Additionally the previous image, if any, is dropped here
// since it's no longer applicable to this mapping.
self.reset_with_anon_memory()?;
self.image = None;
}
}
// mprotect the initial heap region beyond the initial heap size back to PROT_NONE.
@@ -521,13 +581,11 @@ impl Drop for MemFdSlot {
}
}
#[cfg(test)]
#[cfg(all(test, target_os = "linux"))]
mod test {
use std::sync::Arc;
use super::create_memfd;
use super::MemFdSlot;
use super::MemoryMemFd;
use super::{create_memfd, MemFdSlot, MemFdSource, MemoryMemFd};
use crate::mmap::Mmap;
use anyhow::Result;
use std::io::Write;
@@ -544,9 +602,10 @@ mod test {
memfd.as_file().set_len(image_len as u64)?;
Ok(MemoryMemFd {
fd: memfd,
fd: MemFdSource::Memfd(memfd),
len: image_len,
offset,
fd_offset: 0,
linear_memory_offset: offset,
})
}


@@ -2,7 +2,7 @@
//! included. Enables unconditional use of the type and its methods
//! throughout higher-level code.
use crate::InstantiationError;
use crate::{InstantiationError, MmapVec};
use anyhow::Result;
use std::sync::Arc;
use wasmtime_environ::{DefinedMemoryIndex, Module};
@@ -19,7 +19,7 @@ impl ModuleMemFds {
/// Construct a new set of memfd images. This variant is used
/// when memfd support is not included; it always returns no
/// images.
pub fn new(_: &Module, _: &[u8]) -> Result<Option<ModuleMemFds>> {
pub fn new(_: &Module, _: &[u8], _: Option<&MmapVec>) -> Result<Option<ModuleMemFds>> {
Ok(None)
}


@@ -10,6 +10,7 @@ use std::ops::Range;
use std::path::Path;
use std::ptr;
use std::slice;
use std::sync::Arc;
/// A simple struct consisting of a page-aligned pointer to page-aligned
/// and initially-zeroed memory and a length.
@@ -21,7 +22,7 @@ pub struct Mmap {
// the coordination all happens at the OS layer.
ptr: usize,
len: usize,
file: Option<File>,
file: Option<Arc<File>>,
}
impl Mmap {
@@ -77,7 +78,7 @@ impl Mmap {
Ok(Self {
ptr: ptr as usize,
len,
file: Some(file),
file: Some(Arc::new(file)),
})
}
@@ -135,7 +136,7 @@ impl Mmap {
let ret = Self {
ptr: ptr as usize,
len,
file: Some(file),
file: Some(Arc::new(file)),
};
// Protect the entire file as PAGE_READONLY to start (i.e.
@@ -418,6 +419,11 @@ impl Mmap {
)?;
Ok(())
}
/// Returns the underlying file that this mmap is mapping, if present.
pub fn original_file(&self) -> Option<&Arc<File>> {
self.file.as_ref()
}
}
impl Drop for Mmap {


@@ -1,9 +1,9 @@
use anyhow::{Context, Error, Result};
use object::write::{Object, WritableBuffer};
use crate::Mmap;
use anyhow::{Context, Result};
use std::fs::File;
use std::ops::{Deref, DerefMut, Range, RangeTo};
use std::path::Path;
use std::sync::Arc;
use wasmtime_runtime::Mmap;
/// A type akin to `Vec<u8>`, but backed by `mmap` and able to be split.
///
@@ -54,26 +54,6 @@ impl MmapVec {
Ok(result)
}
/// Creates a new `MmapVec` from serializing the specified `obj`.
///
/// The returned `MmapVec` will contain the serialized version of `obj` and
/// is sized appropriately to the exact size of the object serialized.
pub fn from_obj(obj: Object) -> Result<MmapVec> {
let mut result = ObjectMmap::default();
match obj.emit(&mut result) {
Ok(()) => {
assert!(result.mmap.is_some(), "no reserve");
let mmap = result.mmap.expect("reserve not called");
assert_eq!(mmap.len(), result.len);
Ok(mmap)
}
Err(e) => match result.err.take() {
Some(original) => Err(original.context(e)),
None => Err(e.into()),
},
}
}
/// Creates a new `MmapVec` which is the `path` specified mmap'd into
/// memory.
///
@@ -137,6 +117,17 @@ impl MmapVec {
self.mmap
.make_executable(range.start + self.range.start..range.end + self.range.start)
}
/// Returns the underlying file that this mmap is mapping, if present.
pub fn original_file(&self) -> Option<&Arc<File>> {
self.mmap.original_file()
}
/// Returns the offset within the original mmap that this `MmapVec` is
/// created from.
pub fn original_offset(&self) -> usize {
self.range.start
}
}
impl Deref for MmapVec {
@@ -164,54 +155,6 @@ impl DerefMut for MmapVec {
}
}
/// Helper struct to implement the `WritableBuffer` trait from the `object`
/// crate.
///
/// This enables writing an object directly into an mmap'd memory so it's
/// immediately usable for execution after compilation. This implementation
/// relies on a call to `reserve` happening once up front with all the needed
/// data, and the mmap internally does not attempt to grow afterwards.
#[derive(Default)]
struct ObjectMmap {
mmap: Option<MmapVec>,
len: usize,
err: Option<Error>,
}
impl WritableBuffer for ObjectMmap {
fn len(&self) -> usize {
self.len
}
fn reserve(&mut self, additional: usize) -> Result<(), ()> {
assert!(self.mmap.is_none(), "cannot reserve twice");
self.mmap = match MmapVec::with_capacity(additional) {
Ok(mmap) => Some(mmap),
Err(e) => {
self.err = Some(e);
return Err(());
}
};
Ok(())
}
fn resize(&mut self, new_len: usize) {
// Resizing always appends 0 bytes and since new mmaps start out as 0
// bytes we don't actually need to do anything as part of this other
// than update our own length.
if new_len <= self.len {
return;
}
self.len = new_len;
}
fn write_bytes(&mut self, val: &[u8]) {
let mmap = self.mmap.as_mut().expect("write before reserve");
mmap[self.len..][..val.len()].copy_from_slice(val);
self.len += val.len();
}
}
#[cfg(test)]
mod tests {
use super::MmapVec;


@@ -14,8 +14,10 @@ use wasmtime_environ::{
DefinedFuncIndex, DefinedMemoryIndex, FunctionInfo, ModuleEnvironment, ModuleIndex, PrimaryMap,
SignatureIndex,
};
use wasmtime_jit::{CompiledModule, CompiledModuleInfo, MmapVec, TypeTables};
use wasmtime_runtime::{CompiledModuleId, MemoryMemFd, ModuleMemFds, VMSharedSignatureIndex};
use wasmtime_jit::{CompiledModule, CompiledModuleInfo, TypeTables};
use wasmtime_runtime::{
CompiledModuleId, MemoryMemFd, MmapVec, ModuleMemFds, VMSharedSignatureIndex,
};
mod registry;
mod serialization;
@@ -424,6 +426,15 @@ impl Module {
translation.try_paged_init();
}
// If configured, attempt to use static memory initialization, which
// at runtime can either be implemented as a single memcpy to
// initialize memory or otherwise enable virtual-memory tricks
// such as mmap'ing from a file to get copy-on-write.
if engine.config().memfd {
let align = engine.compiler().page_size_align();
translation.try_static_init(align);
}
// Attempt to convert table initializer segments to
// FuncTable representation where possible, to enable
// table lazy init.
@@ -495,14 +506,26 @@ impl Module {
/// Same as [`deserialize`], except that the contents of `path` are read to
/// deserialize into a [`Module`].
///
/// For more information see the documentation of the [`deserialize`]
/// method for why this function is `unsafe`.
///
/// This method is provided because it can be faster than [`deserialize`]
/// since the data doesn't need to be copied around, but rather the module
/// can be used directly from an mmap'd view of the file provided.
///
/// [`deserialize`]: Module::deserialize
///
/// # Unsafety
///
/// All of the reasons that [`deserialize`] is `unsafe` apply to this
/// function as well. Arbitrary data loaded from a file may trick Wasmtime
/// into arbitrary code execution since the contents of the file are not
/// validated to be a valid precompiled module.
///
/// Additionally, this function is also `unsafe` because the file
/// referenced must remain unchanged and a valid precompiled module for the
/// entire lifetime of the [`Module`] returned. Any changes to the file on
/// disk may cause future instantiations of the module to be incorrect.
/// This is because the file is mapped into memory and lazily loaded pages
/// reflect the current state of the file, not necessarily the original
/// state of the file.
pub unsafe fn deserialize_file(engine: &Engine, path: impl AsRef<Path>) -> Result<Module> {
let module = SerializedModule::from_file(path.as_ref(), &engine.config().module_version)?;
module.into_module(engine)
@@ -1013,9 +1036,13 @@ impl wasmtime_runtime::ModuleRuntimeInfo for ModuleInner {
return Ok(None);
}
let memfds = self
.memfds
.get_or_try_init(|| ModuleMemFds::new(self.module.module(), self.module.wasm_data()))?;
let memfds = self.memfds.get_or_try_init(|| {
ModuleMemFds::new(
self.module.module(),
self.module.wasm_data(),
Some(self.module.mmap()),
)
})?;
Ok(memfds
.as_ref()
.and_then(|memfds| memfds.get_memory_image(memory)))


@@ -59,7 +59,8 @@ use std::path::Path;
use std::str::FromStr;
use std::sync::Arc;
use wasmtime_environ::{Compiler, FlagValue, Tunables};
use wasmtime_jit::{subslice_range, CompiledModule, CompiledModuleInfo, MmapVec, TypeTables};
use wasmtime_jit::{subslice_range, CompiledModule, CompiledModuleInfo, TypeTables};
use wasmtime_runtime::MmapVec;
const HEADER: &[u8] = b"\0wasmtime-aot";


@@ -7,7 +7,7 @@ use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::Arc;
use wasmtime_environ::{EntityIndex, FunctionInfo, Module, ModuleType, SignatureIndex};
use wasmtime_jit::{CodeMemory, MmapVec, ProfilingAgent};
use wasmtime_jit::{CodeMemory, ProfilingAgent};
use wasmtime_runtime::{
Imports, InstanceAllocationRequest, InstanceAllocator, InstanceHandle,
OnDemandInstanceAllocator, StorePtr, VMContext, VMFunctionBody, VMSharedSignatureIndex,
@@ -115,7 +115,7 @@ where
stub_fn::<F> as usize,
&mut obj,
)?;
let obj = MmapVec::from_obj(obj)?;
let obj = wasmtime_jit::mmap_vec_from_obj(obj)?;
// Copy the results of JIT compilation into executable memory, and this will
// also take care of unwind table registration.