Files
wasmtime/crates/c-api/include
Alex Crichton bfdbd10a13 Add *_unchecked variants of Func APIs for the C API (#3350)
* Add `*_unchecked` variants of `Func` APIs for the C API

This commit is what is hopefully going to be my last installment within
the saga of optimizing function calls in/out of WebAssembly modules in
the C API. This is yet another alternative approach to #3345 (sorry) but
also contains everything necessary to make the C API fast. As in #3345
the general idea is just moving checks out of the call path in the same
style of `TypedFunc`.

This new strategy takes inspiration from previously learned attempts
effectively "just" exposes how we previously passed `*mut u128` through
trampolines for arguments/results. This storage format is formalized
through a new `ValRaw` union that is exposed from the `wasmtime` crate.
By doing this it made it relatively easy to expose two new APIs:

* `Func::new_unchecked`
* `Func::call_unchecked`

These are the same as their checked equivalents except that they're
`unsafe` and they work with `*mut ValRaw` rather than safe slices of
`Val`. Working with these eschews type checks and such and requires
callers/embedders to do the right thing.

These two new functions are then exposed via the C API with new
functions, enabling C to have a fast-path of calling/defining functions.
This fast path is akin to `Func::wrap` in Rust, although that API can't
be built in C due to C not having generics in the same way that Rust
has.

For some benchmarks, the benchmarks here are:

* `nop` - Call a wasm function from the host that does nothing and
  returns nothing.
* `i64` - Call a wasm function from the host, the wasm function calls a
  host function, and the host function returns an `i64` all the way out to
  the original caller.
* `many` - Call a wasm function from the host, the wasm calls
   host function with 5 `i32` parameters, and then an `i64` result is
   returned back to the original host
* `i64` host - just the overhead of the wasm calling the host, so the
  wasm calls the host function in a loop.
* `many` host - same as `i64` host, but calling the `many` host function.

All numbers in this table are in nanoseconds, and this is just one
measurement as well so there's bound to be some variation in the precise
numbers here.

| Name      | Rust | C (before) | C (after) |
|-----------|------|------------|-----------|
| nop       | 19   | 112        | 25        |
| i64       | 22   | 207        | 32        |
| many      | 27   | 189        | 34        |
| i64 host  | 2    | 38         | 5         |
| many host | 7    | 75         | 8         |

The main conclusion here is that the C API is significantly faster than
before when using the `*_unchecked` variants of APIs. The Rust
implementation is still the ceiling (or floor I guess?) for performance
The main reason that C is slower than Rust is that a little bit more has
to travel through memory where on the Rust side of things we can
monomorphize and inline a bit more to get rid of that. Overall though
the costs are way way down from where they were originally and I don't
plan on doing a whole lot more myself at this time. There's various
things we theoretically could do I've considered but implementation-wise
I think they'll be much more weighty.

* Tweak `wasmtime_externref_t` API comments
2021-09-24 14:05:45 -05:00
..