When new wasm instances are created repeatedly in high-concurrency environments, one of the largest bottlenecks is contention on kernel-level locks for virtual memory management. Usage in this environment is expected to leverage the pooling instance allocator with the `memory-init-cow` feature enabled, which means the kernel-level VM lock is acquired in operations such as:

1. Growing a heap with `mprotect` (write lock)
2. Faulting in memory during usage (read lock)
3. Resetting a heap's contents with `madvise` (read lock)
4. Shrinking a heap with `mprotect` when reusing a slot (write lock)

Rapid use of these operations can be detrimental to performance, especially on otherwise heavily loaded systems, and gets worse the more frequently the operations occur. This commit is aimed at case (2) above: reducing the number of page faults that must be fulfilled by the kernel. Currently these page faults happen for three reasons:

* When memory is first accessed after the heap is grown.
* When the initial linear memory image is accessed for the first time.
* When the initial zeroed heap contents, not part of the linear memory image, are accessed.

This PR addresses the last of these cases, and to a lesser extent the first case as well. Specifically, it provides the ability to partially reset a pooled linear memory with `memset` rather than `madvise`. This has the same effect of resetting contents to zero, but a different effect on paging: the pages stay resident in memory rather than being returned to the kernel. As a result, reuse of a linear memory slot on a page that was previously `memset` will not trigger a page fault, since everything remains paged into the process. The end result is that any access to linear memory which has been touched by `memset` will no longer page fault on reuse.

On more recent kernels (6.0+), this also means pages which were zeroed by `memset`, made inaccessible with `PROT_NONE`, and then made accessible again with `PROT_READ | PROT_WRITE` will not page fault. This can be common when a wasm instance grows its heap slightly and uses that memory, but the memory is then shrunk when the slot is reused for the next instance. Note that this kernel optimization requires a 6.0+ kernel.

The same optimization is additionally applied to async stacks with the pooling allocator as well as to table elements.

The defaults of Wasmtime are not changing with this PR; instead, knobs are being exposed for embedders to turn if they so desire. This is currently being experimented with at Fastly, and I may come back and alter the defaults of Wasmtime if it seems suitable after our measurements.
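For illustration, here is a rough sketch of how an embedder might turn these knobs. The method names used below (`linear_memory_keep_resident`, `table_keep_resident`) reflect the kind of options exposed on `PoolingAllocationConfig` in recent Wasmtime releases and are an assumption about the exact API surface, not a confirmed signature for this PR:

use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn main() -> anyhow::Result<()> {
    // Configure the pooling instance allocator. The `*_keep_resident` knobs
    // below are the kind of option this PR exposes: the first N bytes of a
    // slot are reset with `memset` (staying resident in the process) instead
    // of being returned to the kernel with `madvise`.
    let mut pool = PoolingAllocationConfig::default();
    pool.linear_memory_keep_resident(1 << 20); // keep up to 1 MiB of each memory resident
    pool.table_keep_resident(1 << 16);         // likewise for table elements

    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pool));
    config.memory_init_cow(true); // copy-on-write linear memory images

    let _engine = Engine::new(&config)?;
    // ... create `Store`s and instantiate modules as usual ...
    Ok(())
}

The tradeoff is higher resident memory per slot in exchange for fewer page faults and less contention on the kernel VM lock when slots are reused.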
About
This crate is the Rust embedding API for the Wasmtime project: a cross-platform engine for running WebAssembly programs. Notable features of Wasmtime are:
- Fast. Wasmtime is built on the optimizing Cranelift code generator to quickly generate high-quality machine code either at runtime or ahead-of-time. Wasmtime's runtime is also optimized for cases such as efficient instantiation, low-overhead transitions between the embedder and wasm, and scalability of concurrent instances.
- Secure. Wasmtime's development is strongly focused on the correctness of its implementation, with 24/7 fuzzing donated by Google's OSS Fuzz, leveraging Rust's API and runtime safety guarantees, careful design of features and APIs through an RFC process, a security policy in place for when things go wrong, and a release policy for patching older versions as well. We follow best practices for defense-in-depth and known protections and mitigations for issues like Spectre. Finally, we're working to push the state-of-the-art by collaborating with academic researchers to formally verify critical parts of Wasmtime and Cranelift.
- Configurable. Wasmtime supports a rich set of APIs and build-time configuration options, including means of restricting WebAssembly beyond its basic guarantees, such as limiting its CPU and memory consumption (see the sketch after this list). Wasmtime also runs in tiny environments all the way up to massive servers with many concurrent instances.
- WASI. Wasmtime supports a rich set of APIs for interacting with the host environment through the WASI standard.
- Standards Compliant. Wasmtime passes the official WebAssembly test suite, implements the official C API of wasm, and implements future proposals to WebAssembly as well. Wasmtime developers are intimately engaged with the WebAssembly standards process all along the way too.
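As a brief illustration of the configurability mentioned above, the following is a minimal sketch of bounding a guest's CPU time with fuel and its memory growth with a store limiter. It is not part of the official example below, and exact method names (for instance `set_fuel` versus the older `add_fuel`) vary across Wasmtime versions:

use anyhow::Result;
use wasmtime::*;

fn main() -> Result<()> {
    // Enable fuel so executing wasm consumes fuel and traps when the budget
    // runs out, bounding CPU consumption.
    let mut config = Config::new();
    config.consume_fuel(true);
    let engine = Engine::new(&config)?;

    // Host state carries a `StoreLimits` value that bounds memory growth.
    struct HostState {
        limits: StoreLimits,
    }
    let limits = StoreLimitsBuilder::new()
        .memory_size(16 << 20) // cap each linear memory at 16 MiB
        .build();
    let mut store = Store::new(&engine, HostState { limits });
    store.limiter(|state| &mut state.limits);

    // Grant a fuel budget (named `add_fuel` in older releases, `set_fuel` in
    // newer ones).
    store.set_fuel(10_000)?;

    // ... instantiate modules with this `store`; execution traps if the fuel
    // budget is exhausted or memory growth exceeds the configured limit ...
    Ok(())
}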
Example
An example of using the Wasmtime embedding API for running a small WebAssembly module might look like:
use anyhow::Result;
use wasmtime::*;
fn main() -> Result<()> {
// Modules can be compiled through either the text or binary format
let engine = Engine::default();
let wat = r#"
(module
(import "host" "hello" (func $host_hello (param i32)))
(func (export "hello")
i32.const 3
call $host_hello)
)
"#;
let module = Module::new(&engine, wat)?;
// Create a `Linker` which will be later used to instantiate this module.
// Host functionality is defined by name within the `Linker`.
let mut linker = Linker::new(&engine);
linker.func_wrap("host", "hello", |caller: Caller<'_, u32>, param: i32| {
println!("Got {} from WebAssembly", param);
println!("my host state is: {}", caller.data());
})?;
// All wasm objects operate within the context of a "store". Each
// `Store` has a type parameter to store host-specific data, which in
// this case we're using `4` for.
let mut store = Store::new(&engine, 4);
let instance = linker.instantiate(&mut store, &module)?;
let hello = instance.get_typed_func::<(), (), _>(&mut store, "hello")?;
// And finally we can call the wasm!
hello.call(&mut store, ())?;
Ok(())
}
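Running this example should print output along these lines, since the wasm calls the host function with `3` and the store's host data is `4`:

Got 3 from WebAssembly
my host state is: 4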
More examples and information can be found in the wasmtime crate's online
documentation as well.
Documentation
📚 Read the Wasmtime guide here! 📚
The wasmtime guide is the best starting point to learn about what Wasmtime can do for you or help answer your questions about Wasmtime. If you're interested in contributing to Wasmtime, it can also help you do that!