Implement lazy funcref table and anyfunc initialization. (#3733)
During instance initialization, we build two sorts of arrays eagerly:
- We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function
in an instance.
- For every funcref table element that has an initializer, we store a
pointer to one of these anyfuncs.
Most instances will not touch (via `call_indirect` or `table.get`) all
funcref table elements. And most anyfuncs will never be referenced,
because most functions are never placed in tables or used with
`ref.func`. Thus, both of these initialization tasks are quite wasteful.
Profiling shows that a significant fraction of the remaining
instance-initialization time after our other recent optimizations is
going into these two tasks.
This PR implements two basic ideas:
- The anyfunc array can be lazily initialized as long as we retain the
information needed to do so. For now, in this PR, we just recreate the
anyfunc whenever a pointer is taken to it, because doing so is fast
enough; in the future we could keep some state to know whether the
anyfunc has been written yet and skip this work if redundant.
This technique allows us to leave the anyfunc array as uninitialized
memory, which can be a significant savings. Filling it with
initialized anyfuncs is very expensive, but even zeroing it is
expensive: e.g. in a large module, it can be >500KB.
- A funcref table can be lazily initialized as long as we retain a link
to its corresponding instance and function index for each element. A
zero in a table element means "uninitialized", and a slowpath does the
initialization.
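A minimal sketch of the first idea (lazily materialized anyfuncs) follows. It is not Wasmtime's code. `Anyfunc` is a hypothetical, simplified stand-in for `VMCallerCheckedAnyfunc`, and its three fields are assumed for illustration. `LazyAnyfuncs` is a hypothetical holder that keeps the per-function data needed to rebuild any entry. As described above, the entry is rewritten every time a pointer to it is taken, so no "already initialized" state is tracked.

```rust
use std::mem::MaybeUninit;

/// Hypothetical, simplified stand-in for `VMCallerCheckedAnyfunc`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Anyfunc {
    pub func_ptr: usize,
    pub type_index: u32,
    pub vmctx: usize,
}

/// Hypothetical per-instance storage: the anyfunc array itself is left
/// uninitialized, but the data needed to build any entry is retained.
pub struct LazyAnyfuncs {
    anyfuncs: Box<[MaybeUninit<Anyfunc>]>,
    func_ptrs: Vec<usize>,
    type_indices: Vec<u32>,
    vmctx: usize,
}

impl LazyAnyfuncs {
    pub fn new(func_ptrs: Vec<usize>, type_indices: Vec<u32>, vmctx: usize) -> Self {
        assert_eq!(func_ptrs.len(), type_indices.len());
        // `MaybeUninit::uninit()` writes nothing, so the array is
        // neither filled nor zeroed up front.
        let anyfuncs: Box<[MaybeUninit<Anyfunc>]> = (0..func_ptrs.len())
            .map(|_| MaybeUninit::uninit())
            .collect::<Vec<_>>()
            .into_boxed_slice();
        LazyAnyfuncs { anyfuncs, func_ptrs, type_indices, vmctx }
    }

    /// Returns a pointer to the anyfunc for `index`, (re)writing the entry
    /// first. Rewriting unconditionally keeps the sketch stateless; a
    /// per-entry "initialized" flag could skip the redundant writes.
    pub fn get_anyfunc(&mut self, index: usize) -> *mut Anyfunc {
        let anyfunc = Anyfunc {
            func_ptr: self.func_ptrs[index],
            type_index: self.type_indices[index],
            vmctx: self.vmctx,
        };
        let slot = &mut self.anyfuncs[index];
        slot.write(anyfunc);
        slot.as_mut_ptr()
    }
}
```

Dereferencing is only sound for pointers returned by `get_anyfunc`, because every other entry may still be uninitialized memory.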
Funcref tables are a little tricky because funcrefs can be null. We need
to distinguish "element was initially non-null, but user stored explicit
null later" from "element never touched" (i.e., the lazy init should not
blow away an explicitly stored null). We solve this by stealing the LSB
from every funcref (anyfunc pointer): when the LSB is set, the funcref
is initialized and we don't hit the lazy-init slowpath. We insert the
bit on storing to the table and mask it off after loading.
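The LSB-tagging scheme can be illustrated on plain `usize` slot values. In this sketch, `INIT_BIT` and `PTR_MASK` are hypothetical stand-ins for the `FUNCREF_INIT_BIT` and `FUNCREF_MASK` constants imported in the diff. They are assumed here to be the LSB and its complement, following the description above. `Slot` is a hypothetical simplification of `TableElement` that ignores externrefs. The sketch also assumes anyfunc pointers are at least 2-byte aligned, so their LSB is always free.

```rust
/// Hypothetical stand-ins: the LSB marks a slot as initialized.
pub const INIT_BIT: usize = 1;
pub const PTR_MASK: usize = !INIT_BIT;

/// Hypothetical simplification of a funcref table slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Slot {
    /// Raw value 0: never touched; lazy init has not run yet.
    Uninit,
    /// Initialized; the pointer is 0 for an explicitly stored null.
    Func(usize),
}

/// Store path: set the tag bit on every initialized value, so even an
/// explicit null (pointer 0) becomes the nonzero raw value 1.
pub fn encode(slot: Slot) -> usize {
    match slot {
        Slot::Uninit => 0,
        Slot::Func(ptr) => {
            // Relies on anyfunc pointers being at least 2-byte aligned.
            debug_assert_eq!(ptr & INIT_BIT, 0);
            ptr | INIT_BIT
        }
    }
}

/// Load path: raw 0 means "take the lazy-init slow path"; anything
/// else is initialized, and the tag bit is masked off.
pub fn decode(raw: usize) -> Slot {
    if raw == 0 {
        Slot::Uninit
    } else {
        Slot::Func(raw & PTR_MASK)
    }
}
```

For example, `encode(Slot::Func(0))` yields 1. A stored null is therefore distinguishable from an untouched slot, whose raw value is 0.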
We do have to set up a precomputed array of `FuncIndex`s for the table
in order for this to work. We do this as part of module compilation.
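Putting these pieces together, a lazy table read only consults the precomputed per-slot function index on its slow path. The following is a sketch under stated assumptions:

- `LazyTable`, its fields, and the `anyfunc_ptr` callback are all hypothetical. The callback stands in for "materialize the anyfunc for this `FuncIndex` and return its address".
- Slots that start out null are modeled as `None` in the precomputed array.
- The tag constants follow the LSB scheme described above.

```rust
/// Hypothetical stand-ins for the LSB tag described above.
const INIT_BIT: usize = 1;
const PTR_MASK: usize = !INIT_BIT;

/// Hypothetical lazily initialized funcref table.
pub struct LazyTable {
    /// Raw slot values; 0 means "not yet initialized".
    slots: Vec<usize>,
    /// Precomputed initial function index per slot;
    /// `None` means the slot starts out as null.
    init_func_indices: Vec<Option<u32>>,
}

impl LazyTable {
    pub fn new(init_func_indices: Vec<Option<u32>>) -> Self {
        // Only zeroing is needed up front: every slot starts "uninitialized".
        let slots = vec![0; init_func_indices.len()];
        LazyTable { slots, init_func_indices }
    }

    /// Returns the funcref at `index` (0 = null), or `None` if out of bounds.
    pub fn get(&mut self, index: usize, anyfunc_ptr: &dyn Fn(u32) -> usize) -> Option<usize> {
        let raw = *self.slots.get(index)?;
        if raw != 0 {
            // Fast path: already initialized; strip the tag bit.
            return Some(raw & PTR_MASK);
        }
        // Slow path: build the initial value from the precomputed index,
        // then store it tagged so this path is not taken again.
        let ptr = match self.init_func_indices[index] {
            Some(func_index) => anyfunc_ptr(func_index),
            None => 0,
        };
        self.slots[index] = ptr | INIT_BIT;
        Some(ptr)
    }

    /// Stores a funcref (0 = null). The tag bit ensures an explicit null
    /// is never mistaken for an untouched slot.
    pub fn set(&mut self, index: usize, ptr: usize) {
        self.slots[index] = ptr | INIT_BIT;
    }
}
```

Storing an explicit null with `set(i, 0)` writes the raw value 1. A later `get(i, ...)` therefore returns null instead of re-running the lazy initializer.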
This PR also refactors the way that the runtime crate gains access to
information computed during module compilation.
Performance effect measured with in-tree benches/instantiation.rs, using
SpiderMonkey built for WASI, and with memfd enabled:
```
BEFORE:
sequential/default/spidermonkey.wasm
time: [68.569 us 68.696 us 68.856 us]
sequential/pooling/spidermonkey.wasm
time: [69.406 us 69.435 us 69.465 us]
parallel/default/spidermonkey.wasm: with 1 background thread
time: [69.444 us 69.470 us 69.497 us]
parallel/default/spidermonkey.wasm: with 16 background threads
time: [183.72 us 184.31 us 184.89 us]
parallel/pooling/spidermonkey.wasm: with 1 background thread
time: [69.018 us 69.070 us 69.136 us]
parallel/pooling/spidermonkey.wasm: with 16 background threads
time: [326.81 us 337.32 us 347.01 us]
WITH THIS PR:
sequential/default/spidermonkey.wasm
time: [6.7821 us 6.8096 us 6.8397 us]
change: [-90.245% -90.193% -90.142%] (p = 0.00 < 0.05)
Performance has improved.
sequential/pooling/spidermonkey.wasm
time: [3.0410 us 3.0558 us 3.0724 us]
change: [-95.566% -95.552% -95.537%] (p = 0.00 < 0.05)
Performance has improved.
parallel/default/spidermonkey.wasm: with 1 background thread
time: [7.2643 us 7.2689 us 7.2735 us]
change: [-89.541% -89.533% -89.525%] (p = 0.00 < 0.05)
Performance has improved.
parallel/default/spidermonkey.wasm: with 16 background threads
time: [147.36 us 148.99 us 150.74 us]
change: [-18.997% -18.081% -17.285%] (p = 0.00 < 0.05)
Performance has improved.
parallel/pooling/spidermonkey.wasm: with 1 background thread
time: [3.1009 us 3.1021 us 3.1033 us]
change: [-95.517% -95.511% -95.506%] (p = 0.00 < 0.05)
Performance has improved.
parallel/pooling/spidermonkey.wasm: with 16 background threads
time: [49.449 us 50.475 us 51.540 us]
change: [-85.423% -84.964% -84.465%] (p = 0.00 < 0.05)
Performance has improved.
```
So the improvement is roughly 85-95% in five of the six configurations
(about 18% for parallel/default with 16 background threads) for a very
large module (7420 functions in its one funcref table, 31928 functions
total).
```diff
@@ -8,7 +8,7 @@ use anyhow::{bail, format_err, Error, Result};
 use std::convert::{TryFrom, TryInto};
 use std::ops::Range;
 use std::ptr;
-use wasmtime_environ::{TablePlan, TrapCode, WasmType};
+use wasmtime_environ::{TablePlan, TrapCode, WasmType, FUNCREF_INIT_BIT, FUNCREF_MASK};
 
 /// An element going into or coming out of a table.
 ///
@@ -19,6 +19,11 @@ pub enum TableElement {
     FuncRef(*mut VMCallerCheckedAnyfunc),
     /// An `exrernref`.
     ExternRef(Option<VMExternRef>),
+    /// An uninitialized funcref value. This should never be exposed
+    /// beyond the `wasmtime` crate boundary; the upper-level code
+    /// (which has access to the info needed for lazy initialization)
+    /// will replace it when fetched.
+    UninitFunc,
 }
 
 #[derive(Copy, Clone, PartialEq, Eq)]
@@ -33,41 +38,43 @@ unsafe impl Send for TableElement where VMExternRef: Send {}
 unsafe impl Sync for TableElement where VMExternRef: Sync {}
 
 impl TableElement {
-    /// Consumes the given raw pointer into a table element.
+    /// Consumes the given raw table element value into a table element.
     ///
     /// # Safety
     ///
     /// This is unsafe as it will *not* clone any externref, leaving the reference count unchanged.
     ///
     /// This should only be used if the raw pointer is no longer in use.
-    unsafe fn from_raw(ty: TableElementType, ptr: usize) -> Self {
-        match ty {
-            TableElementType::Func => Self::FuncRef(ptr as _),
-            TableElementType::Extern => Self::ExternRef(if ptr == 0 {
-                None
-            } else {
-                Some(VMExternRef::from_raw(ptr as *mut u8))
-            }),
+    unsafe fn from_table_value(ty: TableElementType, ptr: usize) -> Self {
+        match (ty, ptr) {
+            (TableElementType::Func, 0) => Self::UninitFunc,
+            (TableElementType::Func, ptr) => Self::FuncRef((ptr & FUNCREF_MASK) as _),
+            (TableElementType::Extern, 0) => Self::ExternRef(None),
+            (TableElementType::Extern, ptr) => {
+                Self::ExternRef(Some(VMExternRef::from_raw(ptr as *mut u8)))
+            }
         }
     }
 
-    /// Clones a table element from the underlying raw pointer.
+    /// Clones a table element from the underlying table element.
     ///
     /// # Safety
     ///
     /// This is unsafe as it will clone any externref, incrementing the reference count.
-    unsafe fn clone_from_raw(ty: TableElementType, ptr: usize) -> Self {
-        match ty {
-            TableElementType::Func => Self::FuncRef(ptr as _),
-            TableElementType::Extern => Self::ExternRef(if ptr == 0 {
-                None
-            } else {
-                Some(VMExternRef::clone_from_raw(ptr as *mut u8))
-            }),
+    unsafe fn clone_from_table_value(ty: TableElementType, ptr: usize) -> Self {
+        match (ty, ptr) {
+            (TableElementType::Func, 0) => Self::UninitFunc,
+            (TableElementType::Func, ptr) => Self::FuncRef((ptr & FUNCREF_MASK) as _),
+            (TableElementType::Extern, 0) => Self::ExternRef(None),
+            (TableElementType::Extern, ptr) => {
+                Self::ExternRef(Some(VMExternRef::clone_from_raw(ptr as *mut u8)))
+            }
         }
     }
 
-    /// Consumes a table element into a raw pointer.
+    /// Consumes a table element into a raw table element value. This
+    /// includes any tag bits or other storage details that we
+    /// maintain in the table slot.
     ///
     /// # Safety
     ///
@@ -75,12 +82,41 @@ impl TableElement {
     /// the reference count.
     ///
     /// Use `from_raw` to properly drop any table elements stored as raw pointers.
-    unsafe fn into_raw(self) -> usize {
+    unsafe fn into_table_value(self) -> usize {
         match self {
-            Self::FuncRef(e) => e as _,
+            Self::UninitFunc => 0,
+            Self::FuncRef(e) => (e as usize) | FUNCREF_INIT_BIT,
             Self::ExternRef(e) => e.map_or(0, |e| e.into_raw() as usize),
         }
     }
+
+    /// Consumes a table element into a pointer/reference, as it
+    /// exists outside the table itself. This strips off any tag bits
+    /// or other information that only lives inside the table.
+    ///
+    /// Can only be done to an initialized table element; lazy init
+    /// must occur first. (In other words, lazy values do not survive
+    /// beyond the table, as every table read path initializes them.)
+    ///
+    /// # Safety
+    ///
+    /// The same warnings as for `into_table_values()` apply.
+    pub(crate) unsafe fn into_ref_asserting_initialized(self) -> usize {
+        match self {
+            Self::FuncRef(e) => (e as usize),
+            Self::ExternRef(e) => e.map_or(0, |e| e.into_raw() as usize),
+            Self::UninitFunc => panic!("Uninitialized table element value outside of table slot"),
+        }
+    }
+
+    /// Indicates whether this value is the "uninitialized element"
+    /// value.
+    pub(crate) fn is_uninit(&self) -> bool {
+        match self {
+            Self::UninitFunc => true,
+            _ => false,
+        }
+    }
 }
 
 impl From<*mut VMCallerCheckedAnyfunc> for TableElement {
@@ -334,7 +370,7 @@ impl Table {
     pub fn get(&self, index: u32) -> Option<TableElement> {
         self.elements()
             .get(index as usize)
-            .map(|p| unsafe { TableElement::clone_from_raw(self.element_type(), *p) })
+            .map(|p| unsafe { TableElement::clone_from_table_value(self.element_type(), *p) })
     }
 
     /// Set reference to the specified element.
@@ -436,10 +472,10 @@ impl Table {
     fn set_raw(ty: TableElementType, elem: &mut usize, val: TableElement) {
         unsafe {
             let old = *elem;
-            *elem = val.into_raw();
+            *elem = val.into_table_value();
 
             // Drop the old element
-            let _ = TableElement::from_raw(ty, old);
+            let _ = TableElement::from_table_value(ty, old);
         }
     }
 
@@ -465,7 +501,7 @@ impl Table {
         let dst = dst_table.elements_mut();
         let src = src_table.elements();
         for (s, d) in src_range.zip(dst_range) {
-            let elem = unsafe { TableElement::clone_from_raw(ty, src[s]) };
+            let elem = unsafe { TableElement::clone_from_table_value(ty, src[s]) };
             Self::set_raw(ty, &mut dst[d], elem);
         }
     }
@@ -485,12 +521,12 @@ impl Table {
         // ranges
         if dst_range.start <= src_range.start {
             for (s, d) in src_range.zip(dst_range) {
-                let elem = unsafe { TableElement::clone_from_raw(ty, dst[s]) };
+                let elem = unsafe { TableElement::clone_from_table_value(ty, dst[s]) };
                 Self::set_raw(ty, &mut dst[d], elem);
             }
         } else {
             for (s, d) in src_range.rev().zip(dst_range.rev()) {
-                let elem = unsafe { TableElement::clone_from_raw(ty, dst[s]) };
+                let elem = unsafe { TableElement::clone_from_table_value(ty, dst[s]) };
                 Self::set_raw(ty, &mut dst[d], elem);
             }
         }
@@ -510,7 +546,7 @@ impl Drop for Table {
 
         // Properly drop any table elements stored in the table
         for element in self.elements() {
-            drop(unsafe { TableElement::from_raw(ty, *element) });
+            drop(unsafe { TableElement::from_table_value(ty, *element) });
         }
     }
 }
```