This commit implements a few optimizations, mainly inlining, that should improve the performance of calling a WebAssembly function. This code path can be quite hot depending on the embedding case and we hadn't really put much effort into optimizing the nitty gritty. The predominant optimization here is adding `#[inline]` to trivial functions so performance is improved without having to compile with LTO. Another optimization is to call `lazy_per_thread_init` when traps are initialized per-thread (when a `Store` is created) rather than each time a function is called. The next optimization is to change the unwind reason in the `CallThreadState` to `MaybeUninit` to avoid extra checks in the default case about whether we need to drop its variants (since in the happy path we never need to drop it). The final optimization is to optimize out a few checks when `async` support is disabled for a small speed boost. In a small benchmark where wasmtime calls a simple wasm function my macOS computer dropped from 110ns to 86ns overhead, a 20% decrease. The macOS overhead is still largely dominated by the global lock acquisition and hash table management for traps right now, but I suspect the Linux overhead is much better (should be on the order of ~30 or so ns). We still have a long way to go to compete with SpiderMonkey which, in testing, seem to have ~6ns overhead in calling the same wasm function on my computer.
This is the wasmtime-runtime crate, which contains wasm runtime library
support, supporting the wasm ABI used by wasmtime-environ,
wasmtime-jit, and wasmtime-obj.
This crate does not make a host vs. target distinction; it is meant to be compiled for the target.
Most users will want to use the main wasmtime crate instead of using this
crate directly.