Add *_unchecked variants of Func APIs for the C API (#3350)

* Add `*_unchecked` variants of `Func` APIs for the C API

This commit is what is hopefully going to be my last installment within
the saga of optimizing function calls in/out of WebAssembly modules in
the C API. This is yet another alternative approach to #3345 (sorry), but
it also contains everything necessary to make the C API fast. As in #3345,
the general idea is to move checks out of the call path, in the same
style as `TypedFunc`.

This new strategy takes inspiration from previous attempts and
effectively "just" exposes how we previously passed `*mut u128` through
trampolines for arguments/results. This storage format is formalized
through a new `ValRaw` union that is exposed from the `wasmtime` crate.
Doing this made it relatively easy to expose two new APIs:

* `Func::new_unchecked`
* `Func::call_unchecked`

These are the same as their checked equivalents except that they're
`unsafe` and they work with `*mut ValRaw` rather than safe slices of
`Val`. Working with these skips type checks and such, so callers/embedders
are responsible for doing the right thing.
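As a rough illustration of what "working with `*mut ValRaw`" means for the callee, here is a minimal sketch that does not use the `wasmtime` crate: it redeclares a local copy of the `ValRaw` union from the diff below and defines a hypothetical host function, `add_i32_unchecked`. Following the trampoline's "space for arguments and return values" comment, it assumes arguments and results share one buffer, with the result overwriting the first argument slot. Nothing checks that the slots really hold `i32`s; that responsibility sits with the caller, as with the unchecked APIs.

```rust
// Local copy of the `ValRaw` union introduced by this commit, redeclared so
// the sketch compiles on its own without the `wasmtime` crate.
#[repr(C)]
#[derive(Copy, Clone)]
pub union ValRaw {
    pub i32: i32,
    pub i64: i64,
    pub f32: u32,
    pub f64: u64,
    pub v128: u128,
    pub funcref: usize,
    pub externref: usize,
}

/// Hypothetical "unchecked" host function: reads two `i32` arguments from
/// slots 0 and 1 and writes its `i32` result back into slot 0.
///
/// # Safety
/// `storage` must point to at least two slots that were written as `i32`.
/// No type or length checks are performed.
unsafe fn add_i32_unchecked(storage: *mut ValRaw) {
    unsafe {
        let a = (*storage).i32;
        let b = (*storage.add(1)).i32;
        // The result reuses the argument storage (an assumption of this sketch).
        *storage = ValRaw { i32: a.wrapping_add(b) };
    }
}

/// Safe wrapper playing the embedder's role: it must write correctly typed
/// arguments and read the result back as the right type.
fn call_add(a: i32, b: i32) -> i32 {
    let mut storage = [ValRaw { i32: a }, ValRaw { i32: b }];
    unsafe {
        add_i32_unchecked(storage.as_mut_ptr());
        storage[0].i32
    }
}

fn main() {
    println!("2 + 3 = {}", call_add(2, 3));
}
```

Had the caller written an `f64` into slot 1 instead, the callee would silently reinterpret those bits; that is the trade-off the `unsafe` signature encodes.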

These two new functions are then exposed through new C API functions,
giving C a fast path for calling and defining functions. This fast path
is akin to `Func::wrap` in Rust, although that API can't be built in C
because C doesn't have generics the way Rust does.
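To make the generics point concrete, here is a minimal sketch (not wasmtime's actual `Func::wrap` implementation) of how Rust can produce a specialized raw adapter per signature. The `WasmTy` trait and the `wrap2` function are hypothetical names, and `ValRaw` is again a local copy of the union from the diff below. Each use of `wrap2` is monomorphized for its concrete argument and result types, so the slot loads and stores are fixed at compile time; a C embedder has to write the equivalent reads and writes by hand for each signature.

```rust
// Local copy of the `ValRaw` union from this commit.
#[repr(C)]
#[derive(Copy, Clone)]
pub union ValRaw {
    pub i32: i32,
    pub i64: i64,
    pub f32: u32,
    pub f64: u64,
    pub v128: u128,
    pub funcref: usize,
    pub externref: usize,
}

/// Hypothetical trait for types that can live in a `ValRaw` slot.
trait WasmTy: Copy {
    /// # Safety
    /// `raw` must have been written as this type.
    unsafe fn load(raw: &ValRaw) -> Self;
    fn store(self) -> ValRaw;
}

impl WasmTy for i32 {
    unsafe fn load(raw: &ValRaw) -> Self {
        unsafe { raw.i32 }
    }
    fn store(self) -> ValRaw {
        ValRaw { i32: self }
    }
}

impl WasmTy for i64 {
    unsafe fn load(raw: &ValRaw) -> Self {
        unsafe { raw.i64 }
    }
    fn store(self) -> ValRaw {
        ValRaw { i64: self }
    }
}

/// Turns a typed two-argument closure into an adapter over raw storage.
/// The compiler emits a separate copy of the adapter for every (A, B, R)
/// combination it is instantiated with.
fn wrap2<A: WasmTy, B: WasmTy, R: WasmTy>(f: impl Fn(A, B) -> R) -> impl Fn(&mut [ValRaw]) {
    move |storage: &mut [ValRaw]| {
        // Assumed contract: the caller wrote slots 0 and 1 as `A` and `B`.
        let (a, b) = unsafe { (A::load(&storage[0]), B::load(&storage[1])) };
        // The result overwrites slot 0, as in the shared-buffer convention.
        storage[0] = f(a, b).store();
    }
}

fn main() {
    let add = wrap2(|a: i32, b: i64| i64::from(a) + b);
    let mut storage = [ValRaw { i32: 2 }, ValRaw { i64: 40 }];
    add(&mut storage[..]);
    println!("{}", unsafe { storage[0].i64 });
}
```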

For some benchmark numbers, the benchmarks measured here are:

* `nop` - Call a wasm function from the host that does nothing and
  returns nothing.
* `i64` - Call a wasm function from the host, the wasm function calls a
  host function, and the host function returns an `i64` all the way out to
  the original caller.
* `many` - Call a wasm function from the host, the wasm calls a
  host function with 5 `i32` parameters, and then an `i64` result is
  returned back to the original host caller.
* `i64` host - just the overhead of the wasm calling the host, so the
  wasm calls the host function in a loop.
* `many` host - same as `i64` host, but calling the `many` host function.

All numbers in this table are in nanoseconds, and this is just one
measurement, so there's bound to be some variation in the precise
numbers here.

| Name      | Rust | C (before) | C (after) |
|-----------|------|------------|-----------|
| nop       | 19   | 112        | 25        |
| i64       | 22   | 207        | 32        |
| many      | 27   | 189        | 34        |
| i64 host  | 2    | 38         | 5         |
| many host | 7    | 75         | 8         |
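The ratios implied by the table can be recomputed directly; this small sketch just divides the columns. The `ROWS` and `speedup` names are illustrative, and the values are copied verbatim from the table.

```rust
/// (benchmark, Rust ns, C before ns, C after ns), copied from the table.
const ROWS: [(&str, f64, f64, f64); 5] = [
    ("nop", 19.0, 112.0, 25.0),
    ("i64", 22.0, 207.0, 32.0),
    ("many", 27.0, 189.0, 34.0),
    ("i64 host", 2.0, 38.0, 5.0),
    ("many host", 7.0, 75.0, 8.0),
];

/// How many times faster `after_ns` is than `before_ns`.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

fn main() {
    for (name, rust, before, after) in ROWS {
        // Column 1: C before vs. C after. Column 2: C after relative to Rust.
        println!(
            "{:<10} C speedup: {:>4.1}x   C after / Rust: {:>4.1}x",
            name,
            speedup(before, after),
            after / rust,
        );
    }
}
```

By this arithmetic the C numbers after the change land at roughly 1.1x to 2.5x the Rust times, with the largest relative gap in the tiny `i64 host` case (5 ns vs. 2 ns).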

The main conclusion here is that the C API is significantly faster than
before when using the `*_unchecked` variants of APIs (roughly 4.5x to
9.4x in the table above). The Rust implementation is still the ceiling
(or floor, I guess?) for performance. The main reason that C is slower
than Rust is that a little more data has to travel through memory,
whereas on the Rust side of things we can monomorphize and inline a bit
more to get rid of that. Overall, though, the costs are way, way down
from where they were originally, and I don't plan on doing a whole lot
more myself at this time. There are various things I've considered that
we could theoretically do, but implementation-wise I think they'd be much
more weighty.

* Tweak `wasmtime_externref_t` API comments
Alex Crichton
2021-09-24 14:05:45 -05:00
committed by GitHub
parent 344a219245
commit bfdbd10a13
16 changed files with 659 additions and 217 deletions


@@ -344,6 +344,7 @@ impl VMExternRef {
     /// Nor does this method increment the reference count. You must ensure
     /// that `self` (or some other clone of `self`) stays alive until
     /// `clone_from_raw` is called.
+    #[inline]
     pub fn as_raw(&self) -> *mut u8 {
         let ptr = self.0.cast::<u8>().as_ptr();
         ptr


@@ -56,7 +56,7 @@ pub use crate::traphandlers::{
 pub use crate::vmcontext::{
     VMCallerCheckedAnyfunc, VMContext, VMFunctionBody, VMFunctionImport, VMGlobalDefinition,
     VMGlobalImport, VMInterrupts, VMInvokeArgument, VMMemoryDefinition, VMMemoryImport,
-    VMSharedSignatureIndex, VMTableDefinition, VMTableImport, VMTrampoline,
+    VMSharedSignatureIndex, VMTableDefinition, VMTableImport, VMTrampoline, ValRaw,
 };
 
 /// Version number of this crate.


@@ -790,10 +790,28 @@ impl VMContext {
     }
 }
 
+/// A "raw" and unsafe representation of a WebAssembly value.
+///
+/// This is provided for use with the `Func::new_unchecked` and
+/// `Func::call_unchecked` APIs. In general it's unlikely you should be using
+/// this from Rust, rather using APIs like `Func::wrap` and `TypedFunc::call`.
+#[allow(missing_docs)]
+#[repr(C)]
+#[derive(Copy, Clone)]
+pub union ValRaw {
+    pub i32: i32,
+    pub i64: i64,
+    pub f32: u32,
+    pub f64: u64,
+    pub v128: u128,
+    pub funcref: usize,
+    pub externref: usize,
+}
+
 /// Trampoline function pointer type.
 pub type VMTrampoline = unsafe extern "C" fn(
     *mut VMContext, // callee vmctx
     *mut VMContext, // caller vmctx
     *const VMFunctionBody, // function we're actually calling
-    *mut u128, // space for arguments and return values
+    *mut ValRaw, // space for arguments and return values
 );
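Because `ValRaw` is a `#[repr(C)]` union whose largest member is `u128`, each slot has the same size and alignment as the `u128` slots the trampoline used before, so changing `*mut u128` to `*mut ValRaw` should leave the buffer layout unchanged. A small standalone check of that, plus the float convention visible in the union (floats stored as raw bits in `u32`/`u64` fields), might look like the following; `ValRaw` is a local copy of the definition above and `f32_roundtrip` is a hypothetical helper.

```rust
use std::mem::{align_of, size_of};

// Local copy of the `ValRaw` union from the diff above.
#[repr(C)]
#[derive(Copy, Clone)]
pub union ValRaw {
    pub i32: i32,
    pub i64: i64,
    pub f32: u32,
    pub f64: u64,
    pub v128: u128,
    pub funcref: usize,
    pub externref: usize,
}

/// Hypothetical helper: stores an `f32` as its raw bit pattern in the `f32`
/// (`u32`) field and converts it back.
fn f32_roundtrip(x: f32) -> f32 {
    let slot = ValRaw { f32: x.to_bits() };
    f32::from_bits(unsafe { slot.f32 })
}

fn main() {
    // Same size and alignment as `u128`, so a buffer of N `ValRaw` slots has
    // the same layout as the previous buffer of N `u128` slots.
    assert_eq!(size_of::<ValRaw>(), size_of::<u128>());
    assert_eq!(align_of::<ValRaw>(), align_of::<u128>());
    println!(
        "ValRaw: {} bytes, f32 round-trip: {}",
        size_of::<ValRaw>(),
        f32_roundtrip(1.5)
    );
}
```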