Further minor optimizations to instantiation (#3791)
* Shrink the size of `FuncData`

  Before this commit, on a 64-bit system the `FuncData` type had a size of 88 bytes; after this commit it is 32 bytes. A `FuncData` is required for every host function in a store, including those inserted from a `Linker` into a store used during linking, so instantiation ends up creating a nontrivial number of these values and pushing them into the store. Profiles showed some surprisingly expensive moves of `FuncData` from the stack into a vector, generated by Rust for moves-by-value. Shrinking this type lets more efficient code be generated and also means less storage is needed in a store's function array. For instantiating the spidermonkey and rustpython modules this improves instantiation by 10%, since they each import a fair number of host functions and the speedup scales with the number of imported items. (A minimal sketch of this size effect, using stand-in types, follows after this list.)

* Use `ptr::copy_nonoverlapping` during initialization

  Previously `ptr::copy` was used for copying imports into place, which translates to `memmove`, but `ptr::copy_nonoverlapping` can be used here since it's statically known that these regions don't overlap. While this doesn't end up making a performance difference, it's something I kept noticing while looking at the disassembly of `initialize_vmcontext`, so I figured I'd go ahead and implement it. (A short sketch of the non-overlapping copy pattern also follows below.)

* Indirect shared signature ids in the `VMContext`

  This commit is a small improvement to module instantiation time: instead of copying a list of `VMSharedSignatureIndex` entries into each `VMContext`, one array is built inside the module and shared amongst all instances. This means fewer lookups and less movement of data at instantiation time. The downside is that type checks on `call_indirect` now involve an additional load, but I'm assuming those are already pessimized enough that the runtime impact won't be much there. For instantiation performance this is a 5-10% win for rustpython/spidermonkey instantiation. It should also reduce the size of each `VMContext`, since signatures are no longer stored inline but shared amongst all instances of one module.

  Note one subtle change: the array of `VMSharedSignatureIndex` was previously indexed by `TypeIndex` and is now indexed by `SignatureIndex`, the deduplicated form of `TypeIndex`. This is done because `Module` already had such a list lying around, so it was easier to reuse that than to build a separate array and store it somewhere.

* Reserve space in `Store<T>` with `InstancePre`

  This commit updates the instantiation process to reserve space in a `Store<T>` for the functions that an `InstancePre<T>` will insert into it as part of instantiation. Instantiating through an `InstancePre<T>` allows pre-computing the number of host functions that will be inserted into the store, and by reserving space up front we avoid costly reallocations during instantiation, since the function vector is guaranteed to have room for everything. Overall this makes instantiation of rustpython/spidermonkey about 8% faster locally. (A sketch of the reserve-then-push pattern follows below as well.)

* Fix tests

* Use checked arithmetic
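The `FuncData` shrink above boils down to moving a large, rarely-needed payload behind a pointer so that the value pushed into the store's function vector stays small. The sketch below is not wasmtime's actual `FuncData`; the types are made-up stand-ins that only demonstrate how boxing a field changes `size_of` and therefore the cost of each by-value move.

use std::mem::size_of;

// Hypothetical stand-in for a large, rarely-needed payload (not a wasmtime type).
struct BigType([u8; 64]);

// Payload stored inline: every move of this struct copies all of the bytes.
struct InlineData {
    payload: Option<BigType>,
    kind: u32,
}

// Same logical contents, but the payload lives behind a `Box`, so the struct
// itself is only a pointer (with a niche for `None`) plus the small field.
struct BoxedData {
    payload: Option<Box<BigType>>,
    kind: u32,
}

fn main() {
    // On a typical 64-bit target the boxed variant is far smaller, which makes
    // pushing many of these into a `Vec` noticeably cheaper.
    println!("inline: {} bytes", size_of::<InlineData>());
    println!("boxed:  {} bytes", size_of::<BoxedData>());
}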
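For the `ptr::copy_nonoverlapping` change, the relevant distinction is `memmove` versus `memcpy`: the non-overlapping variant is allowed only when source and destination are known not to alias, which holds when imports are copied from one allocation into a freshly allocated `VMContext`. A minimal, self-contained sketch of that pattern using plain arrays rather than wasmtime's import structures:

use std::ptr;

fn main() {
    // Source data, standing in for the imports being copied.
    let imports: [u32; 4] = [1, 2, 3, 4];
    // Destination buffer in a distinct object, so the regions cannot overlap.
    let mut vmctx_area = [0u32; 4];

    unsafe {
        // `ptr::copy` corresponds to `memmove`; `ptr::copy_nonoverlapping`
        // corresponds to `memcpy` and is valid here because the two arrays
        // are statically known to be distinct.
        ptr::copy_nonoverlapping(imports.as_ptr(), vmctx_area.as_mut_ptr(), imports.len());
    }

    assert_eq!(vmctx_area, imports);
}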
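The `InstancePre<T>` change is, at bottom, the usual pattern of counting how many items a batch insert will add and calling `reserve` once before the loop. The sketch below uses a plain `Vec<u64>` rather than the store's function table; the names are illustrative only.

fn main() {
    // Analogous to counting `Definition::HostFunc` entries ahead of time.
    let host_funcs = 128;

    let mut funcs: Vec<u64> = Vec::new();
    // Reserve once so the pushes below never reallocate mid-instantiation.
    funcs.reserve(host_funcs);
    let cap_before = funcs.capacity();

    for i in 0..host_funcs {
        funcs.push(i as u64);
    }

    // Capacity is unchanged, i.e. no reallocation happened during the pushes.
    assert_eq!(funcs.capacity(), cap_before);
}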
@@ -1556,13 +1556,26 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
let sig_id_type = Type::int(u16::from(sig_id_size) * 8).unwrap();
let vmctx = self.vmctx(builder.func);
let base = builder.ins().global_value(pointer_type, vmctx);
let offset =
i32::try_from(self.offsets.vmctx_vmshared_signature_id(ty_index)).unwrap();

// Load the caller ID.
// Load the caller ID. This requires loading the
// `*mut VMCallerCheckedAnyfunc` base pointer from `VMContext`
// and then loading, based on `SignatureIndex`, the
// corresponding entry.
let mut mem_flags = ir::MemFlags::trusted();
mem_flags.set_readonly();
let caller_sig_id = builder.ins().load(sig_id_type, mem_flags, base, offset);
let signatures = builder.ins().load(
pointer_type,
mem_flags,
base,
i32::try_from(self.offsets.vmctx_signature_ids_array()).unwrap(),
);
let sig_index = self.module.types[ty_index].unwrap_function();
let offset =
i32::try_from(sig_index.as_u32().checked_mul(sig_id_type.bytes()).unwrap())
.unwrap();
let caller_sig_id = builder
.ins()
.load(sig_id_type, mem_flags, signatures, offset);

// Load the callee ID.
let mem_flags = ir::MemFlags::trusted();

@@ -7,7 +7,7 @@
// interrupts: *const VMInterrupts,
// externref_activations_table: *mut VMExternRefActivationsTable,
// store: *mut dyn Store,
// signature_ids: [VMSharedSignatureIndex; module.num_signature_ids],
// signature_ids: *const VMSharedSignatureIndex,
// imported_functions: [VMFunctionImport; module.num_imported_functions],
// imported_tables: [VMTableImport; module.num_imported_tables],
// imported_memories: [VMMemoryImport; module.num_imported_memories],
@@ -21,7 +21,7 @@

use crate::{
DefinedGlobalIndex, DefinedMemoryIndex, DefinedTableIndex, FuncIndex, GlobalIndex, MemoryIndex,
Module, TableIndex, TypeIndex,
Module, TableIndex,
};
use more_asserts::assert_lt;
use std::convert::TryFrom;
@@ -52,8 +52,6 @@ fn align(offset: u32, width: u32) -> u32 {
pub struct VMOffsets<P> {
/// The size in bytes of a pointer on the target.
pub ptr: P,
/// The number of signature declarations in the module.
pub num_signature_ids: u32,
/// The number of imported functions in the module.
pub num_imported_functions: u32,
/// The number of imported tables in the module.
@@ -117,8 +115,6 @@ impl PtrSize for u8 {
pub struct VMOffsetsFields<P> {
/// The size in bytes of a pointer on the target.
pub ptr: P,
/// The number of signature declarations in the module.
pub num_signature_ids: u32,
/// The number of imported functions in the module.
pub num_imported_functions: u32,
/// The number of imported tables in the module.
@@ -142,7 +138,6 @@ impl<P: PtrSize> VMOffsets<P> {
pub fn new(ptr: P, module: &Module) -> Self {
VMOffsets::from(VMOffsetsFields {
ptr,
num_signature_ids: cast_to_u32(module.types.len()),
num_imported_functions: cast_to_u32(module.num_imported_funcs),
num_imported_tables: cast_to_u32(module.num_imported_tables),
num_imported_memories: cast_to_u32(module.num_imported_memories),
@@ -165,7 +160,6 @@ impl<P: PtrSize> From<VMOffsetsFields<P>> for VMOffsets<P> {
fn from(fields: VMOffsetsFields<P>) -> VMOffsets<P> {
let mut ret = Self {
ptr: fields.ptr,
num_signature_ids: fields.num_signature_ids,
num_imported_functions: fields.num_imported_functions,
num_imported_tables: fields.num_imported_tables,
num_imported_memories: fields.num_imported_memories,
@@ -210,12 +204,7 @@ impl<P: PtrSize> From<VMOffsetsFields<P>> for VMOffsets<P> {
.unwrap();
ret.imported_functions = ret
.signature_ids
.checked_add(
fields
.num_signature_ids
.checked_mul(u32::from(ret.size_of_vmshared_signature_index()))
.unwrap(),
)
.checked_add(u32::from(ret.ptr.size()))
.unwrap();
ret.imported_tables = ret
.imported_functions
@@ -535,9 +524,9 @@ impl<P: PtrSize> VMOffsets<P> {
self.store
}

/// The offset of the `signature_ids` array.
/// The offset of the `signature_ids` array pointer.
#[inline]
pub fn vmctx_signature_ids_begin(&self) -> u32 {
pub fn vmctx_signature_ids_array(&self) -> u32 {
self.signature_ids
}

@@ -603,14 +592,6 @@ impl<P: PtrSize> VMOffsets<P> {
self.size
}

/// Return the offset to `VMSharedSignatureId` index `index`.
#[inline]
pub fn vmctx_vmshared_signature_id(&self, index: TypeIndex) -> u32 {
assert_lt!(index.as_u32(), self.num_signature_ids);
self.vmctx_signature_ids_begin()
+ index.as_u32() * u32::from(self.size_of_vmshared_signature_index())
}

/// Return the offset to `VMFunctionImport` index `index`.
#[inline]
pub fn vmctx_vmfunction_import(&self, index: FuncIndex) -> u32 {

@@ -1037,7 +1037,6 @@ mod tests {

let offsets = wasmtime_environ::VMOffsets::from(wasmtime_environ::VMOffsetsFields {
ptr: 8,
num_signature_ids: 0,
num_imported_functions: 0,
num_imported_tables: 0,
num_imported_memories: 0,
@@ -1064,7 +1063,6 @@ mod tests {

let offsets = wasmtime_environ::VMOffsets::from(wasmtime_environ::VMOffsetsFields {
ptr: 8,
num_signature_ids: 0,
num_imported_functions: 0,
num_imported_tables: 0,
num_imported_memories: 0,
@@ -1091,7 +1089,6 @@ mod tests {

let offsets = wasmtime_environ::VMOffsets::from(wasmtime_environ::VMOffsetsFields {
ptr: 8,
num_signature_ids: 0,
num_imported_functions: 0,
num_imported_tables: 0,
num_imported_memories: 0,

@@ -3,9 +3,7 @@ use crate::instance::{Instance, InstanceHandle, RuntimeMemoryCreator};
use crate::memory::{DefaultMemoryCreator, Memory};
use crate::table::Table;
use crate::traphandlers::Trap;
use crate::vmcontext::{
VMBuiltinFunctionsArray, VMCallerCheckedAnyfunc, VMGlobalDefinition, VMSharedSignatureIndex,
};
use crate::vmcontext::{VMBuiltinFunctionsArray, VMCallerCheckedAnyfunc, VMGlobalDefinition};
use crate::ModuleRuntimeInfo;
use crate::Store;
use anyhow::Result;
@@ -18,8 +16,8 @@ use std::sync::Arc;
use thiserror::Error;
use wasmtime_environ::{
DefinedMemoryIndex, DefinedTableIndex, EntityRef, GlobalInit, InitMemory, MemoryInitialization,
MemoryInitializer, Module, ModuleType, PrimaryMap, TableInitialization, TableInitializer,
TrapCode, WasmType, WASM_PAGE_SIZE,
MemoryInitializer, Module, PrimaryMap, TableInitialization, TableInitializer, TrapCode,
WasmType, WASM_PAGE_SIZE,
};

#[cfg(feature = "pooling-allocator")]
@@ -445,14 +443,8 @@ unsafe fn initialize_vmcontext(instance: &mut Instance, req: InstanceAllocationR
let module = req.runtime_info.module();

// Initialize shared signatures
let mut ptr = instance.vmctx_plus_offset(instance.offsets.vmctx_signature_ids_begin());
for sig in module.types.values() {
*ptr = match sig {
ModuleType::Function(sig) => req.runtime_info.signature(*sig),
_ => VMSharedSignatureIndex::new(u32::max_value()),
};
ptr = ptr.add(1);
}
let signatures = req.runtime_info.signature_ids();
*instance.vmctx_plus_offset(instance.offsets.vmctx_signature_ids_array()) = signatures.as_ptr();

// Initialize the built-in functions
*instance.vmctx_plus_offset(instance.offsets.vmctx_builtin_functions()) =
@@ -460,25 +452,25 @@ unsafe fn initialize_vmcontext(instance: &mut Instance, req: InstanceAllocationR

// Initialize the imports
debug_assert_eq!(req.imports.functions.len(), module.num_imported_funcs);
ptr::copy(
ptr::copy_nonoverlapping(
req.imports.functions.as_ptr(),
instance.vmctx_plus_offset(instance.offsets.vmctx_imported_functions_begin()),
req.imports.functions.len(),
);
debug_assert_eq!(req.imports.tables.len(), module.num_imported_tables);
ptr::copy(
ptr::copy_nonoverlapping(
req.imports.tables.as_ptr(),
instance.vmctx_plus_offset(instance.offsets.vmctx_imported_tables_begin()),
req.imports.tables.len(),
);
debug_assert_eq!(req.imports.memories.len(), module.num_imported_memories);
ptr::copy(
ptr::copy_nonoverlapping(
req.imports.memories.as_ptr(),
instance.vmctx_plus_offset(instance.offsets.vmctx_imported_memories_begin()),
req.imports.memories.len(),
);
debug_assert_eq!(req.imports.globals.len(), module.num_imported_globals);
ptr::copy(
ptr::copy_nonoverlapping(
req.imports.globals.as_ptr(),
instance.vmctx_plus_offset(instance.offsets.vmctx_imported_globals_begin()),
req.imports.globals.len(),

@@ -299,7 +299,6 @@ impl InstancePool {
// Calculate the maximum size of an Instance structure given the limits
let offsets = VMOffsets::from(VMOffsetsFields {
ptr: HostPtr,
num_signature_ids: module_limits.types,
num_imported_functions: module_limits.imported_functions,
num_imported_tables: module_limits.imported_tables,
num_imported_memories: module_limits.imported_memories,
@@ -1453,6 +1452,9 @@ mod test {
fn wasm_data(&self) -> &[u8] {
&[]
}
fn signature_ids(&self) -> &[VMSharedSignatureIndex] {
&[]
}
}

Arc::new(RuntimeInfo(module))

@@ -190,4 +190,8 @@ pub trait ModuleRuntimeInfo: Send + Sync + 'static {

/// A slice pointing to all data that is referenced by this instance.
fn wasm_data(&self) -> &[u8];

/// Returns an array, indexed by `SignatureIndex` of all
/// `VMSharedSignatureIndex` entries corresponding to the `SignatureIndex`.
fn signature_ids(&self) -> &[VMSharedSignatureIndex];
}

@@ -188,7 +188,10 @@ pub(crate) struct FuncData {
// optimized use cases (e.g. `TypedFunc`) it's not actually needed or it's
// only needed rarely. To handle that this is an optionally-contained field
// which is lazily loaded into as part of `Func::call`.
ty: Option<FuncType>,
//
// Also note that this is intentionally placed behind a pointer to keep it
// small as `FuncData` instances are often inserted into a `Store`.
ty: Option<Box<FuncType>>,
}

/// The three ways that a function can be created and referenced from within a
@@ -216,7 +219,12 @@ enum FuncKind {
/// `Func::new` or similar APIs. The `HostFunc` internally owns the
/// `InstanceHandle` and that will get dropped when this `HostFunc` itself
/// is dropped.
Host(HostFunc),
///
/// Note that this is intentionally placed behind a `Box` to minimize the
/// size of this enum since the most common variant for high-performance
/// situations is `SharedHost` and `StoreOwned`, so this ideally isn't
/// larger than those two.
Host(Box<HostFunc>),
}

macro_rules! for_each_function_signature {
@@ -711,7 +719,7 @@ impl Func {
// this time.
if store.store_data()[self.0].ty.is_none() {
let ty = self.load_ty(store);
store.store_data_mut()[self.0].ty = Some(ty);
store.store_data_mut()[self.0].ty = Some(Box::new(ty));
}

(store.store_data()[self.0].ty.as_ref().unwrap(), store)
@@ -2091,7 +2099,7 @@ impl HostFunc {
/// Same as [`HostFunc::to_func`], different ownership.
unsafe fn into_func(self, store: &mut StoreOpaque) -> Func {
self.validate_store(store);
Func::from_func_kind(FuncKind::Host(self), store)
Func::from_func_kind(FuncKind::Host(Box::new(self)), store)
}

fn validate_store(&self, store: &mut StoreOpaque) {

@@ -902,6 +902,9 @@ impl<'a> ImportsBuilder<'a> {
pub struct InstancePre<T> {
module: Module,
items: Vec<Definition>,
/// A count of `Definition::HostFunc` entries in `items` above to
/// preallocate space in a `Store` up front for all entries to be inserted.
host_funcs: usize,
_marker: std::marker::PhantomData<fn() -> T>,
}

@@ -911,6 +914,7 @@ impl<T> Clone for InstancePre<T> {
Self {
module: self.module.clone(),
items: self.items.clone(),
host_funcs: self.host_funcs,
_marker: self._marker,
}
}
@@ -923,9 +927,17 @@ impl<T> InstancePre<T> {
items: Vec<Definition>,
) -> Result<InstancePre<T>> {
typecheck_defs(store, module, &items)?;
let host_funcs = items
.iter()
.filter(|i| match i {
Definition::HostFunc(_) => true,
_ => false,
})
.count();
Ok(InstancePre {
module: module.clone(),
items,
host_funcs,
_marker: std::marker::PhantomData,
})
}
@@ -952,7 +964,7 @@ impl<T> InstancePre<T> {
// passed in.
let mut store = store.as_context_mut();
let mut instantiator = unsafe {
self.ensure_comes_from_same_store(&store.0)?;
self.verify_store_and_reserve_space(&mut store.0)?;
Instantiator::new(
store.0,
&self.module,
@@ -984,7 +996,7 @@ impl<T> InstancePre<T> {
// For the unsafety here see above
let mut store = store.as_context_mut();
let mut i = unsafe {
self.ensure_comes_from_same_store(&store.0)?;
self.verify_store_and_reserve_space(&mut store.0)?;
Instantiator::new(
store.0,
&self.module,
@@ -1002,12 +1014,17 @@ impl<T> InstancePre<T> {
.await?
}

fn ensure_comes_from_same_store(&self, store: &StoreOpaque) -> Result<()> {
fn verify_store_and_reserve_space(&self, store: &mut StoreOpaque) -> Result<()> {
for import in self.items.iter() {
if !import.comes_from_same_store(store) {
bail!("cross-`Store` instantiation is not currently supported");
}
}
// Any linker-defined function of the `Definition::HostFunc` variant
// will insert a function into the store automatically as part of
// instantiation, so reserve space here to make insertion more efficient
// as it won't have to realloc during the instantiation.
store.store_data_mut().reserve_funcs(self.host_funcs);
Ok(())
}
}

@@ -1055,6 +1055,10 @@ impl wasmtime_runtime::ModuleRuntimeInfo for ModuleInner {
fn wasm_data(&self) -> &[u8] {
self.module.wasm_data()
}

fn signature_ids(&self) -> &[VMSharedSignatureIndex] {
self.signatures.as_module_map().values().as_slice()
}
}

/// A barebones implementation of ModuleRuntimeInfo that is useful for
@@ -1144,4 +1148,11 @@ impl wasmtime_runtime::ModuleRuntimeInfo for BareModuleInfo {
fn wasm_data(&self) -> &[u8] {
&[]
}

fn signature_ids(&self) -> &[VMSharedSignatureIndex] {
match &self.one_signature {
Some((_, id)) => std::slice::from_ref(id),
None => &[],
}
}
}

@@ -103,6 +103,10 @@ impl StoreData {
pub(crate) fn funcs(&self) -> impl Iterator<Item = &crate::func::FuncData> {
self.funcs.iter()
}

pub(crate) fn reserve_funcs(&mut self, count: usize) {
self.funcs.reserve(count);
}
}

impl<T> Index<Stored<T>> for StoreData