This commit removes the `differential_spec` fuzz target for now,
although this removal is intended to be temporary. We have #3251 to
track re-enabling the spec interpreter in a way that it won't time out,
and additionally the spec interpreter is also failing to build with
ocaml on oss-fuzz so that will also need to be investigated when
re-enabling.
We already validate wasm functions in parallel when compiling a module,
but the same parallelism wasn't available to the `Module::validate` API.
This commit peforms a minor tweak to the validate-the-whole-module API
to validate all functions in parallel in the same manner that module
compilation does.
This commit removes the Lightbeam backend from Wasmtime as per [RFC 14].
This backend hasn't received maintenance in quite some time, and as [RFC
14] indicates this doesn't meet the threshold for keeping the code
in-tree, so this commit removes it.
A fast "baseline" compiler may still be added in the future. The
addition of such a backend should be in line with [RFC 14], though, with
the principles we now have for stable releases of Wasmtime. I'll close
out Lightbeam-related issues once this is merged.
[RFC 14]: https://github.com/bytecodealliance/rfcs/pull/14
* Add `*_unchecked` variants of `Func` APIs for the C API
This commit is what is hopefully going to be my last installment within
the saga of optimizing function calls in/out of WebAssembly modules in
the C API. This is yet another alternative approach to #3345 (sorry) but
also contains everything necessary to make the C API fast. As in #3345
the general idea is just moving checks out of the call path in the same
style of `TypedFunc`.
This new strategy takes inspiration from previously learned attempts
effectively "just" exposes how we previously passed `*mut u128` through
trampolines for arguments/results. This storage format is formalized
through a new `ValRaw` union that is exposed from the `wasmtime` crate.
By doing this it made it relatively easy to expose two new APIs:
* `Func::new_unchecked`
* `Func::call_unchecked`
These are the same as their checked equivalents except that they're
`unsafe` and they work with `*mut ValRaw` rather than safe slices of
`Val`. Working with these eschews type checks and such and requires
callers/embedders to do the right thing.
These two new functions are then exposed via the C API with new
functions, enabling C to have a fast-path of calling/defining functions.
This fast path is akin to `Func::wrap` in Rust, although that API can't
be built in C due to C not having generics in the same way that Rust
has.
For some benchmarks, the benchmarks here are:
* `nop` - Call a wasm function from the host that does nothing and
returns nothing.
* `i64` - Call a wasm function from the host, the wasm function calls a
host function, and the host function returns an `i64` all the way out to
the original caller.
* `many` - Call a wasm function from the host, the wasm calls
host function with 5 `i32` parameters, and then an `i64` result is
returned back to the original host
* `i64` host - just the overhead of the wasm calling the host, so the
wasm calls the host function in a loop.
* `many` host - same as `i64` host, but calling the `many` host function.
All numbers in this table are in nanoseconds, and this is just one
measurement as well so there's bound to be some variation in the precise
numbers here.
| Name | Rust | C (before) | C (after) |
|-----------|------|------------|-----------|
| nop | 19 | 112 | 25 |
| i64 | 22 | 207 | 32 |
| many | 27 | 189 | 34 |
| i64 host | 2 | 38 | 5 |
| many host | 7 | 75 | 8 |
The main conclusion here is that the C API is significantly faster than
before when using the `*_unchecked` variants of APIs. The Rust
implementation is still the ceiling (or floor I guess?) for performance
The main reason that C is slower than Rust is that a little bit more has
to travel through memory where on the Rust side of things we can
monomorphize and inline a bit more to get rid of that. Overall though
the costs are way way down from where they were originally and I don't
plan on doing a whole lot more myself at this time. There's various
things we theoretically could do I've considered but implementation-wise
I think they'll be much more weighty.
* Tweak `wasmtime_externref_t` API comments
If Wasmtime thinks a module stack-overflows and v8 says that it does
something else that's ok. This means that the limits on v8 and Wasmtime
are different which is expected and not something we want fuzz-bugs
about.
The LICENSE file is missing in wasmtime-types crate.
As per the Apache 2.0 license, the license file itself
should be redistributed with the source code.
* Optimize `Func::call` and its C API
This commit is an alternative to #3298 which achieves effectively the
same goal of optimizing the `Func::call` API as well as its C API
sibling of `wasmtime_func_call`. The strategy taken here is different
than #3298 though where a new API isn't created, rather a small tweak to
an existing API is done. Specifically this commit handles the major
sources of slowness with `Func::call` with:
* Looking up the type of a function, to typecheck the arguments with and
use to guide how the results should be loaded, no longer hits the
rwlock in the `Engine` but instead each `Func` contains its own
`FuncType`. This can be an unnecessary allocation for funcs not used
with `Func::call`, so this is a downside of this implementation
relative to #3298. A mitigating factor, though, is that instance
exports are loaded lazily into the `Store` and in theory not too many
funcs are active in the store as `Func` objects.
* Temporary storage is amortized with a long-lived `Vec` in the `Store`
rather than allocating a new vector on each call. This is basically
the same strategy as #3294 only applied to different types in
different places. Specifically `wasmtime::Store` now retains a
`Vec<u128>` for `Func::call`, and the C API retains a `Vec<Val>` for
calling `Func::call`.
* Finally, an API breaking change is made to `Func::call` and its type
signature (as well as `Func::call_async`). Instead of returning
`Box<[Val]>` as it did before this function now takes a
`results: &mut [Val]` parameter. This allows the caller to manage the
allocation and we can amortize-remove it in `wasmtime_func_call` by
using space after the parameters in the `Vec<Val>` we're passing in.
This change is naturally a breaking change and we'll want to consider
it carefully, but mitigating factors are that most embeddings are
likely using `TypedFunc::call` instead and this signature taking a
mutable slice better aligns with `Func::new` which receives a mutable
slice for the results.
Overall this change, in the benchmark of "call a nop function from the C
API" is not quite as good as #3298. It's still a bit slower, on the
order of 15ns, because there's lots of capacity checks around vectors
and the type checks are slightly less optimized than before. Overall
though this is still significantly better than today because allocations
and the rwlock to acquire the type information are both avoided. I
personally feel that this change is the best to do because it has less
of an API impact than #3298.
* Rebase issues
* Load generated trampolines into jitdump when profiling
This commit updates the jitdump profiler to generate JIT profiling
records for generated trampolines in a wasm module in addition to the
functions already in a module. It's also updated to learn about
trampolines generated via `Func::new` and friends. These trampolines
were all not previously registered meaning that stack traces with these
pc values would be confusing to see in the profile output. While the
names aren't the best it should at least be more clear than before if a
function is hot!
* Fix more builds
We've got a large crop of fuzz-bugs from fuzzing with enabled-with-SIMD
on oss-fuzz but at this point the fuzz stats from oss-fuzz say that the
fuzzers like v8 are spending less than 50% of its time actually fuzzing
and presumably mostly hitting crashes and such. While we fix the other
issues this disables simd for fuzzing with v8 so we can try to see if we
can weed out other issues.
* Exports symbols to be shared with external GDB/JIT debugging interface tools.
Windows O/S specific requirement.
* Moved comments into platform specific compiler directive sections.
* Use `__builtin_setjmp` instead of `sigsetjmp`.
Use [`__builtin_setjmp`] instead of `sigsetjmp`, as it is implemented in
the compiler, performed inline, and saves much less state. This speeds up
calls into wasm by about 8% on my machine.
[`__builtin_setjmp`]: https://gcc.gnu.org/onlinedocs/gcc/Nonlocal-Gotos.html
* Add a comment confirming that 5 really is the documented size.
* Add a comment about callee-saved state and __builtin_setjmp.
* On clang on aarch64, use sigsetjmp.
* Fix a stray `#endif`.
- Add relocation handling needed after PR #3275
- Fix incorrect handling of signed constants detected by PR #3056 test
- Fix LabelUse max pos/neg ranges; fix overflow in buffers.rs
- Disable fuzzing tests that require pre-built v8 binaries
- Disable cranelift test that depends on i128
- Temporarily disable memory64 tests
* Use rsix to make system calls in Wasmtime.
`rsix` is a system call wrapper crate that we use in `wasi-common`,
which can provide the following advantages in the rest of Wasmtime:
- It eliminates some `unsafe` blocks in Wasmtime's code. There's
still an `unsafe` block in the library, but this way, the `unsafe`
is factored out and clearly scoped.
- And, it makes error handling more consistent, factoring out code for
checking return values and `io::Error::last_os_error()`, and code that
does `errno::set_errno(0)`.
This doesn't cover *all* system calls; `rsix` doesn't implement
signal-handling APIs, and this doesn't cover calls made through `std` or
crates like `userfaultfd`, `rand`, and `region`.
* allow the ResourceLimiter to reject a memory grow before the
memory's own maximum.
* add a hook so a ResourceLimiter can detect any reason that
a memory grow fails, including if the OS denies additional memory
* add tests for this new functionality. I only took the time to
test the OS denial on Linux, it should be possible on Mac OS
as well but I don't have a test setup. I have no idea how to
do this on windows.
This can be useful for host functions that want to consume fuel to
reflect their relative cost. Additionally it's a relatively easy
addition to have and someone's asking for it!
Closes#3315
We _must not_ trigger a GC when moving refs from host code into
Wasm (e.g. returned from a host function or passed as arguments to a Wasm
function). After insertion into the table, this reference is no longer
rooted. If multiple references are being sent from the host into Wasm and we
allowed GCs during insertion, then the following events could happen:
* Reference A is inserted into the activations table. This does not trigger a
GC, but does fill the table to capacity.
* The caller's reference to A is removed. Now the only reference to A is from
the activations table.
* Reference B is inserted into the activations table. Because the table is at
capacity, a GC is triggered.
* A is reclaimed because the only reference keeping it alive was the activation
table's reference (it isn't inside any Wasm frames on the stack yet, so stack
scanning and stack maps don't increment its reference count).
* We transfer control to Wasm, giving it A and B. Wasm uses A. That's a use
after free.
To prevent uses after free, we cannot GC when moving refs into the
`VMExternRefActivationsTable` because we are passing them from the host to Wasm.
On the other hand, when we are *cloning* -- as opposed to moving -- refs from
the host to Wasm, then it is fine to GC while inserting into the activations
table, because the original referent that we are cloning from is still alive and
rooting the ref.
I'm not sure why when run repeatedly v8 has different limits on
call-stack-size but it's not particularly interesting to assert exact
matches here, so this should fix a fuzz-bug-failure found on oss-fuzz.
Modify the `poll_oneoff_files` test to avoid assuming that `poll_oneoff`
returns all pending events, as it may sometimes return only a subset of
events. When multiple events are expected, use a loop, and loop until
all events have been recorded.
The new backends will not emit a stack map for a safepoint if there are zero
live references. Our fuzzy search for stack maps, which was necessary for the
old backend, caused us to use the wrong stack map for some PCs which would in
turn cause us to treat arbitrary stack slots as reference types pointers.
When debug info was enabled, we would put the debug info sections in between the
text section and the unwind info section. But the unwind info is encoded in a
position-independent manner (so that we don't need relocs for it) that relies on
it directly following the text section. The result of the misplacement was some
crashes inside the unwinder.
We were previously using `_wasmtime_eh_frame` but there is no good reason to
add the prefix Wasmtime-specific prefix. Using the standard name allows for
better inspection with standard tools like `dwarfdump`.
This commit adds an assertion which was previously forgotten when
inserting a `HostFunc` into a `Store`. This can happen when a `Linker`
is defined with one engine but it's used to interoperate with a store
defined within a different engine.
A function contains type information that's only valid relative to the
engine that it was defined within. This means that if a function is used
within a different engine then type information may look valid when in
fact it is not. For example it's otherwise possible to insert a function
into an engine with one type and call it in a different engine with a
different type.
Similar to how `Store` misuse is a panic throughout `wasmtime`'s API
this commit also turns this behavior into panic, so there's no API
impact. Documentation has been updated accordingly to indicate that
various functions on `Linker` will panic if a `store` is provided that's
connected to a different `Engine`.
This commit improves the runtime support for wasm-to-host invocations
for functions created with `Func::new` or `wasmtime_func_new` in the C
API. Previously a `Vec` (sometimes a `SmallVec`) would be dynamically
allocated on each host call to store the arguments that are coming from
wasm and going to the host. In the case of the `wasmtime` crate we need
to decode the `u128`-stored values, and in the case of the C API we need
to decode the `Val` into the C API's `wasmtime_val_t`.
The technique used in this commit is to store a singular `Vec<T>` inside
the "store", be it the literal `Store<T>` or within the `T` in the case
of the C API, which can be reused across wasm->host calls. This means
that we're unlikely to actually perform dynamic memory allocation and
instead we should hit a faster path where the `Vec` always has enough
capacity.
Note that this is just a mild improvement for `Func::new`-based
functions. It's still the case that `Func::wrap` is much faster, but
unfortunately the C API doesn't have access to `Func::wrap`, so the main
motivation here is accelerating the C API.
* Refactor the internals of `Store<T>`
This commit is an overdue refactoring and renaming of some internals of
the `Store` type in Wasmtime. The actual implementation of `Store<T>`
has evolved from the original implementation to the point where some of
the aspects of how things are structured no longer makes sense. There's
also always been a lot of unnecessary gymnastics when trying to get
access to various store pieces depending on where you are in `wasmtime`.
This refactoring aims to simplify all this and make the internals much
easier to read/write. The following changes were made:
* The `StoreOpaque<'_>` type is deleted, along with the `opaque()`
method.
* The `StoreInnermost` type was renamed to `StoreOpaque`.
`StoreOpaque<'_>` is dead. Long live `StoreOpaque`. This renaming
and a few small tweaks means that this type now suffices for all
consumers.
* The `AsContextMut` and `AsContext` traits are now implemented for
`StoreInner<T>`.
These changes, while subtly small, help clean up a lot of the internals
of `wasmtime`. There's a lot less verbose `&mut
store.as_context_mut().opaque()` now. Additionally many methods can
simply start with `let store = store.as_context_mut().0;` and use things
internally. One of the nicer aspects of using references directly is
that the compiler automatically reborrows references as necessary
meaning there's lots of less manual reborrowing.
The main motivation for this change was actually somewhat roundabout
where I found that when `StoreOpaque<'_>` was being captured in closures
and iterators it's 3 pointers wide which is a lot of data to move
around. Now things capture over `&mut StoreOpaque` which is just one
nice and small pointer to move around. In any case though I've long
wanted to revisit the design of these internals to improve the
ergonomics. It's not expected that this change alone will really have
all that much impact on the performance of `wasmtime`.
Finally a doc comment was added to `store.rs` to try to explain all the
`Store`-related types since there are a nontrivial amount.
* Rustfmt
This commit optimizes the runtime execution of `Func::new` by removing
an indirect function call that happens whenever a host function is
called. This indirection was generally done to prevent monomoprhizing a
lot into consumer code but the few extra functions this makes
monomorphic are fairly small, and in general wasm->host call performance
is pretty important.
While not a massive win this is expected to improve codegen, especially
because with the indirect call removed the compiler should now be able
to prove more often when a `Func::new` closure doesn't panic or return
an error.
Wasmtime has a few representations of `Val` and `ValType` across the
internal crates, the `wasmtime` crate, and the C API. These were
previously sometimes mentioned in different orders which means that
converting between the two took a little extra code than before. This
commit is a micro-optimization to align the types across the various
places we define these to help reduce the codegen burden when converting
between these types.
This is not expected to have a major impact on performance, rather it's
a small cleanup which should be easy-ish to preserve I've noticed while
staring at assembly.