Further minor optimizations to instantiation (#3791)

* Shrink the size of `FuncData`

Before this commit, on a 64-bit system the `FuncData` type had a size of 88 bytes; after this commit it has a size of 32 bytes. A `FuncData` is required for all host functions in a store, including those inserted from a `Linker` into a store used during linking. This means that instantiation ends up creating a nontrivial number of these types and pushing them into the store. Looking at some profiles, there were some surprisingly expensive movements of `FuncData` from the stack to a vector for moves-by-value generated by Rust. Shrinking this type enables more efficient code to be generated and additionally means less storage is needed in a store's function array.

For instantiating the spidermonkey and rustpython modules this improves instantiation by 10%, since they each import a fair number of host functions and the speedup here is relative to the number of items imported.

* Use `ptr::copy_nonoverlapping` during initialization

Previously `ptr::copy` was used for copying imports into place, which translates to `memmove`, but `ptr::copy_nonoverlapping` can be used here since it's statically known that these areas don't overlap. While this doesn't end up making a performance difference, it's something I kept noticing while looking at the disassembly of `initialize_vmcontext`, so I figured I'd go ahead and implement it.

* Indirect shared signature ids in the VMContext

This commit is a small improvement for the instantiation time of modules by avoiding copying a list of `VMSharedSignatureIndex` entries into each `VMContext`, instead building one inside of a module and sharing that amongst all instances. This involves fewer lookups at instantiation time and less movement of data during instantiation. The downside is that type-checks on `call_indirect` now involve an additional load, but I'm assuming that these are somewhat pessimized enough as-is that the runtime impact won't be much there.

For instantiation performance this is a 5-10% win with rustpython/spidermonkey instantiation. This should also reduce the size of each `VMContext` for an instantiation, since signatures are no longer stored inline but shared amongst all instances of one module.

Note that one subtle change here is that the array of `VMSharedSignatureIndex` was previously indexed by `TypeIndex`, and now it's indexed by `SignatureIndex`, which is a deduplicated form of `TypeIndex`. This is done because we already had a list of those lying around in `Module`, so it was easier to reuse that than to build a separate array and store it somewhere.

* Reserve space in `Store<T>` with `InstancePre`

This commit updates the instantiation process to reserve space in a `Store<T>` for the functions that an `InstancePre<T>`, as part of instantiation, will insert into it. Using an `InstancePre<T>` to instantiate allows pre-computing the number of host functions that will be inserted into a store, and by pre-reserving space we can avoid costly reallocations during instantiation by ensuring the function vector has enough space to fit everything during the instantiation process.

Overall this makes instantiation of rustpython/spidermonkey about 8% faster locally.

* Fix tests

* Use checked arithmetic
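The reservation step described in the last optimization can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Wasmtime's actual code: `FuncStore`, `reserve_funcs`, and the `u64` element type are hypothetical stand-ins for a store's function vector, and the only standard-library calls used are `usize::checked_add` and `Vec::reserve`.

```rust
/// Hypothetical stand-in for the function storage inside a store.
/// The element type is a placeholder; Wasmtime's real type differs.
struct FuncStore {
    funcs: Vec<u64>,
}

impl FuncStore {
    fn new() -> Self {
        FuncStore { funcs: Vec::new() }
    }

    /// Reserve room for `host_funcs` additional entries up front.
    /// Checked arithmetic reports an overflowing total as an error
    /// instead of silently wrapping around.
    fn reserve_funcs(&mut self, host_funcs: usize) -> Result<(), String> {
        self.funcs
            .len()
            .checked_add(host_funcs)
            .ok_or_else(|| "function count overflows usize".to_string())?;
        // `Vec::reserve` takes the number of *additional* elements.
        self.funcs.reserve(host_funcs);
        Ok(())
    }

    /// Insert one function entry; after a sufficient `reserve_funcs`
    /// call this does not need to reallocate.
    fn push(&mut self, func: u64) {
        self.funcs.push(func);
    }
}
```

Once the count is known ahead of time, a single reservation replaces the repeated grow-and-copy cycles that a sequence of plain `push` calls could otherwise trigger.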
@@ -3,9 +3,7 @@ use crate::instance::{Instance, InstanceHandle, RuntimeMemoryCreator};
 use crate::memory::{DefaultMemoryCreator, Memory};
 use crate::table::Table;
 use crate::traphandlers::Trap;
-use crate::vmcontext::{
-    VMBuiltinFunctionsArray, VMCallerCheckedAnyfunc, VMGlobalDefinition, VMSharedSignatureIndex,
-};
+use crate::vmcontext::{VMBuiltinFunctionsArray, VMCallerCheckedAnyfunc, VMGlobalDefinition};
 use crate::ModuleRuntimeInfo;
 use crate::Store;
 use anyhow::Result;
@@ -18,8 +16,8 @@ use std::sync::Arc;
 use thiserror::Error;
 use wasmtime_environ::{
     DefinedMemoryIndex, DefinedTableIndex, EntityRef, GlobalInit, InitMemory, MemoryInitialization,
-    MemoryInitializer, Module, ModuleType, PrimaryMap, TableInitialization, TableInitializer,
-    TrapCode, WasmType, WASM_PAGE_SIZE,
+    MemoryInitializer, Module, PrimaryMap, TableInitialization, TableInitializer, TrapCode,
+    WasmType, WASM_PAGE_SIZE,
 };

 #[cfg(feature = "pooling-allocator")]
@@ -445,14 +443,8 @@ unsafe fn initialize_vmcontext(instance: &mut Instance, req: InstanceAllocationR
     let module = req.runtime_info.module();

     // Initialize shared signatures
-    let mut ptr = instance.vmctx_plus_offset(instance.offsets.vmctx_signature_ids_begin());
-    for sig in module.types.values() {
-        *ptr = match sig {
-            ModuleType::Function(sig) => req.runtime_info.signature(*sig),
-            _ => VMSharedSignatureIndex::new(u32::max_value()),
-        };
-        ptr = ptr.add(1);
-    }
+    let signatures = req.runtime_info.signature_ids();
+    *instance.vmctx_plus_offset(instance.offsets.vmctx_signature_ids_array()) = signatures.as_ptr();

     // Initialize the built-in functions
     *instance.vmctx_plus_offset(instance.offsets.vmctx_builtin_functions()) =
@@ -460,25 +452,25 @@ unsafe fn initialize_vmcontext(instance: &mut Instance, req: InstanceAllocationR

     // Initialize the imports
     debug_assert_eq!(req.imports.functions.len(), module.num_imported_funcs);
-    ptr::copy(
+    ptr::copy_nonoverlapping(
         req.imports.functions.as_ptr(),
         instance.vmctx_plus_offset(instance.offsets.vmctx_imported_functions_begin()),
         req.imports.functions.len(),
     );
     debug_assert_eq!(req.imports.tables.len(), module.num_imported_tables);
-    ptr::copy(
+    ptr::copy_nonoverlapping(
         req.imports.tables.as_ptr(),
         instance.vmctx_plus_offset(instance.offsets.vmctx_imported_tables_begin()),
         req.imports.tables.len(),
     );
     debug_assert_eq!(req.imports.memories.len(), module.num_imported_memories);
-    ptr::copy(
+    ptr::copy_nonoverlapping(
         req.imports.memories.as_ptr(),
         instance.vmctx_plus_offset(instance.offsets.vmctx_imported_memories_begin()),
         req.imports.memories.len(),
     );
     debug_assert_eq!(req.imports.globals.len(), module.num_imported_globals);
-    ptr::copy(
+    ptr::copy_nonoverlapping(
         req.imports.globals.as_ptr(),
         instance.vmctx_plus_offset(instance.offsets.vmctx_imported_globals_begin()),
         req.imports.globals.len(),
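To illustrate the difference between the two copy primitives swapped in the hunk above, here is a small standalone sketch. The helper names `copy_imports` and `shift_right_by_one` and the `u32` element type are hypothetical and chosen only for illustration; `std::ptr::copy` and `std::ptr::copy_nonoverlapping` are the real standard-library functions.

```rust
use std::ptr;

/// Copy `src` into the front of `dst`. The two regions cannot overlap
/// here, so the `memcpy`-like primitive is sufficient.
fn copy_imports(src: &[u32], dst: &mut [u32]) {
    assert!(dst.len() >= src.len());
    // SAFETY: `src` is a shared borrow and `dst` a unique borrow, so the
    // regions are disjoint, and the length was checked above.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len());
    }
}

/// Shift the contents of one buffer right by one element. Source and
/// destination overlap, so the `memmove`-like `ptr::copy` is required.
fn shift_right_by_one(buf: &mut [u32]) {
    if buf.len() < 2 {
        return;
    }
    let count = buf.len() - 1;
    let base = buf.as_mut_ptr();
    // SAFETY: both ranges lie within `buf`; overlap is allowed by `ptr::copy`.
    unsafe {
        ptr::copy(base, base.add(1), count);
    }
}
```

Using `copy_nonoverlapping` where overlap is impossible states that guarantee in the code and lets the compiler emit a `memcpy` rather than a `memmove`.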
@@ -299,7 +299,6 @@ impl InstancePool {
         // Calculate the maximum size of an Instance structure given the limits
         let offsets = VMOffsets::from(VMOffsetsFields {
             ptr: HostPtr,
-            num_signature_ids: module_limits.types,
             num_imported_functions: module_limits.imported_functions,
             num_imported_tables: module_limits.imported_tables,
             num_imported_memories: module_limits.imported_memories,
@@ -1453,6 +1452,9 @@ mod test {
             fn wasm_data(&self) -> &[u8] {
                 &[]
             }
+            fn signature_ids(&self) -> &[VMSharedSignatureIndex] {
+                &[]
+            }
         }

         Arc::new(RuntimeInfo(module))
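The indirection behind the `signature_ids` accessor can be sketched as below: the signature table is built once per module, and each instance stores only a pointer to it. This is a simplified model under stated assumptions, not Wasmtime's real layout; `SharedSigId`, `ModuleInfo`, `InstanceCtx`, and `signature_matches` are hypothetical names.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `VMSharedSignatureIndex`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SharedSigId(u32);

/// Per-module data: the signature table is built once and then shared.
struct ModuleInfo {
    signature_ids: Vec<SharedSigId>,
}

/// Per-instance data: only a pointer to the shared table is stored,
/// rather than an inline copy of every entry.
struct InstanceCtx {
    /// Keeps the module (and therefore the table) alive.
    module: Arc<ModuleInfo>,
    signature_ids: *const SharedSigId,
}

impl InstanceCtx {
    fn new(module: Arc<ModuleInfo>) -> Self {
        let signature_ids = module.signature_ids.as_ptr();
        InstanceCtx { module, signature_ids }
    }

    /// A type check in the style of `call_indirect`: one extra load
    /// through the pointer, then a comparison with the callee's id.
    fn signature_matches(&self, sig_index: usize, callee: SharedSigId) -> bool {
        assert!(sig_index < self.module.signature_ids.len());
        // SAFETY: the pointer targets the `Vec` owned by `self.module`,
        // which is never mutated, and the index was bounds-checked above.
        let expected = unsafe { *self.signature_ids.add(sig_index) };
        expected == callee
    }
}
```

Instantiation then writes a single pointer per instance instead of one entry per signature, at the cost of the extra load during the type check.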