Add *_unchecked variants of Func APIs for the C API (#3350)
* Add `*_unchecked` variants of `Func` APIs for the C API

  This commit is what is hopefully going to be my last installment within the saga of optimizing function calls in/out of WebAssembly modules in the C API. This is yet another alternative approach to #3345 (sorry) but also contains everything necessary to make the C API fast. As in #3345 the general idea is just moving checks out of the call path in the same style as `TypedFunc`.

  This new strategy takes inspiration from previous attempts and effectively "just" exposes how we previously passed `*mut u128` through trampolines for arguments/results. This storage format is formalized through a new `ValRaw` union that is exposed from the `wasmtime` crate. Doing this made it relatively easy to expose two new APIs:

  * `Func::new_unchecked`
  * `Func::call_unchecked`

  These are the same as their checked equivalents except that they're `unsafe` and they work with `*mut ValRaw` rather than safe slices of `Val`. Working with these eschews type checks and such and requires callers/embedders to do the right thing.

  These two new functions are then exposed via the C API with new functions, enabling C to have a fast path for calling/defining functions. This fast path is akin to `Func::wrap` in Rust, although that API can't be built in C due to C not having generics in the same way that Rust has.

  The benchmarks here are:

  * `nop` - Call a wasm function from the host that does nothing and returns nothing.
  * `i64` - Call a wasm function from the host, the wasm function calls a host function, and the host function returns an `i64` all the way out to the original caller.
  * `many` - Call a wasm function from the host, the wasm calls a host function with 5 `i32` parameters, and then an `i64` result is returned back to the original host.
  * `i64` host - just the overhead of the wasm calling the host, so the wasm calls the host function in a loop.
  * `many` host - same as `i64` host, but calling the `many` host function.

  All numbers in this table are in nanoseconds, and this is just one measurement as well so there's bound to be some variation in the precise numbers here.

  | Name      | Rust | C (before) | C (after) |
  |-----------|------|------------|-----------|
  | nop       | 19   | 112        | 25        |
  | i64       | 22   | 207        | 32        |
  | many      | 27   | 189        | 34        |
  | i64 host  | 2    | 38         | 5         |
  | many host | 7    | 75         | 8         |

  The main conclusion here is that the C API is significantly faster than before when using the `*_unchecked` variants of APIs. The Rust implementation is still the ceiling (or floor I guess?) for performance. The main reason that C is slower than Rust is that a little bit more has to travel through memory, whereas on the Rust side of things we can monomorphize and inline a bit more to get rid of that. Overall though the costs are way way down from where they were originally and I don't plan on doing a whole lot more myself at this time. There are various things I've considered that we theoretically could do, but implementation-wise I think they'll be much more weighty.

* Tweak `wasmtime_externref_t` API comments
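
The shared argument/result buffer convention described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the `wasmtime` implementation: `ValRaw` here is a local copy of the union from the diff, `add_unchecked` and `call_add` are hypothetical names, and writing the result into slot 0 of the same buffer is an assumption made for the example.

```rust
/// Local stand-in for the `ValRaw` union added in this commit
/// (same fields, `#[repr(C)]`); it is NOT imported from `wasmtime`.
#[repr(C)]
#[derive(Copy, Clone)]
pub union ValRaw {
    pub i32: i32,
    pub i64: i64,
    pub f32: u32,
    pub f64: u64,
    pub v128: u128,
    pub funcref: usize,
    pub externref: usize,
}

/// Hypothetical "unchecked" host function with signature (i32, i32) -> i64.
///
/// # Safety
/// `args_and_results` must point to at least two valid `ValRaw` slots whose
/// `i32` fields were initialized by the caller. Nothing is type-checked here;
/// that is the caller's responsibility.
pub unsafe fn add_unchecked(args_and_results: *mut ValRaw) {
    unsafe {
        let a = (*args_and_results.add(0)).i32;
        let b = (*args_and_results.add(1)).i32;
        // Assumption for this sketch: the result overwrites slot 0.
        *args_and_results.add(0) = ValRaw {
            i64: a as i64 + b as i64,
        };
    }
}

/// Hypothetical caller: fills the buffer, calls, then reads the result back
/// using the field that matches the function's declared result type.
pub fn call_add(a: i32, b: i32) -> i64 {
    let mut space = [ValRaw { i32: a }, ValRaw { i32: b }];
    unsafe {
        add_unchecked(space.as_mut_ptr());
        space[0].i64
    }
}
```

Calling `call_add(2, 3)` goes through the raw buffer once in each direction. The type knowledge lives entirely in the caller and callee agreeing on the signature, which is why the real APIs are `unsafe`.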
@@ -344,6 +344,7 @@ impl VMExternRef {
     /// Nor does this method increment the reference count. You must ensure
     /// that `self` (or some other clone of `self`) stays alive until
     /// `clone_from_raw` is called.
+    #[inline]
     pub fn as_raw(&self) -> *mut u8 {
         let ptr = self.0.cast::<u8>().as_ptr();
         ptr
@@ -56,7 +56,7 @@ pub use crate::traphandlers::{
 pub use crate::vmcontext::{
     VMCallerCheckedAnyfunc, VMContext, VMFunctionBody, VMFunctionImport, VMGlobalDefinition,
     VMGlobalImport, VMInterrupts, VMInvokeArgument, VMMemoryDefinition, VMMemoryImport,
-    VMSharedSignatureIndex, VMTableDefinition, VMTableImport, VMTrampoline,
+    VMSharedSignatureIndex, VMTableDefinition, VMTableImport, VMTrampoline, ValRaw,
 };

 /// Version number of this crate.
@@ -790,10 +790,28 @@ impl VMContext {
     }
 }

+/// A "raw" and unsafe representation of a WebAssembly value.
+///
+/// This is provided for use with the `Func::new_unchecked` and
+/// `Func::call_unchecked` APIs. In general it's unlikely you should be using
+/// this from Rust, rather using APIs like `Func::wrap` and `TypedFunc::call`.
+#[allow(missing_docs)]
+#[repr(C)]
+#[derive(Copy, Clone)]
+pub union ValRaw {
+    pub i32: i32,
+    pub i64: i64,
+    pub f32: u32,
+    pub f64: u64,
+    pub v128: u128,
+    pub funcref: usize,
+    pub externref: usize,
+}
+
 /// Trampoline function pointer type.
 pub type VMTrampoline = unsafe extern "C" fn(
     *mut VMContext,        // callee vmctx
     *mut VMContext,        // caller vmctx
     *const VMFunctionBody, // function we're actually calling
-    *mut u128,             // space for arguments and return values
+    *mut ValRaw,           // space for arguments and return values
 );
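
As a hedged illustration of the `ValRaw` layout in the hunk above: because the union contains a `u128` field, each slot should be 16 bytes, matching the old `*mut u128` storage, and the `f32`/`f64` fields are declared as `u32`/`u64`, which suggests floats travel as raw bit patterns. The sketch below uses a local copy of the union (not the one exported by `wasmtime`), and the helper names `f32_to_raw`, `f32_from_raw`, and `slot_size` are hypothetical.

```rust
use std::mem::size_of;

/// Local copy of the union from the diff, for illustration only.
#[repr(C)]
#[derive(Copy, Clone)]
pub union ValRaw {
    pub i32: i32,
    pub i64: i64,
    pub f32: u32,
    pub f64: u64,
    pub v128: u128,
    pub funcref: usize,
    pub externref: usize,
}

/// Hypothetical helper: store an `f32` as its raw bit pattern.
pub fn f32_to_raw(x: f32) -> ValRaw {
    ValRaw { f32: x.to_bits() }
}

/// Hypothetical helper: read the bit pattern back as an `f32`.
/// Reading a union field is `unsafe`; here the field is known to have been
/// written by `f32_to_raw`, so the read is well-defined.
pub fn f32_from_raw(v: ValRaw) -> f32 {
    f32::from_bits(unsafe { v.f32 })
}

/// Size of one argument/result slot in bytes.
pub fn slot_size() -> usize {
    size_of::<ValRaw>()
}
```

Keeping every slot the same width means a buffer of `n` values can be indexed like a plain array regardless of the value types it holds, which is presumably what lets the trampoline signature change from `*mut u128` to `*mut ValRaw` without changing the buffer layout.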