* Enable the SIMD proposal by default
This commit updates Wasmtime to enable the SIMD proposal for WebAssembly
by default. Support has been implemented for quite some time and
recently fuzzing has been running for multiple weeks without incident,
so it seems like it might be time to go ahead and enable this!
* Refactor CLI feature specification
Don't store a `bool` but rather an `Option<bool>` so we can inherit the
defaults from Wasmtime rather than having to keep the defaults in-sync.
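The shape of the change is roughly the following (a sketch with illustrative names, not the CLI's exact code):
```rust
// Hypothetical sketch of the `Option<bool>` pattern; `CommonOptions` and
// `enable_simd` are illustrative names rather than the CLI's exact fields.
struct CommonOptions {
    enable_simd: Option<bool>,
}

fn configure(opts: &CommonOptions, config: &mut wasmtime::Config) {
    // Only override the setting if the flag was passed explicitly; otherwise
    // Wasmtime's own default is inherited.
    if let Some(enable) = opts.enable_simd {
        config.wasm_simd(enable);
    }
}
```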
* Add a compilation section to disable address maps
This commit adds a new `Config::generate_address_map` compilation
setting which is used to disable emission of the `.wasmtime.addrmap`
section of compiled artifacts. Unfortunately this section is currently around
the size of the entire `.text` section itself, so it may be desirable to omit
it for size reasons. Functionality-wise, all that is lost is knowledge of the
precise wasm-module code offset of a faulting instruction or of the frames in
a backtrace. It also means that if the module has DWARF debugging information
available with it, Wasmtime isn't able to produce a filename and line number
in the backtrace.
The option remains enabled by default. It may not be needed in the future with
#3547 perhaps, but in the meantime it seems reasonable to support a
configuration mode where the section is entirely omitted when the smallest
possible module is desired.
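For illustration, a minimal sketch of turning the section off when producing a compiled artifact (hedged: this assumes `Config::generate_address_map` is a simple boolean toggle, as described above):
```rust
use wasmtime::{Config, Engine, Module};

fn compile_small(wasm: &[u8]) -> anyhow::Result<Vec<u8>> {
    let mut config = Config::new();
    // Trade precise wasm offsets in traps/backtraces for a smaller artifact.
    config.generate_address_map(false);
    let engine = Engine::new(&config)?;
    let module = Module::new(&engine, wasm)?;
    // Serialize the compiled artifact, now without `.wasmtime.addrmap`.
    module.serialize()
}
```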
* Fix some CI issues
* Update tests/all/traps.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Do less work in compilation for address maps
But only when they're disabled
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Following up on WebAssembly/wasi-sdk#210, this makes the trap message
for `unreachable` traps more descriptive of what actually caused the
trap, so that it doesn't sound like Wasmtime itself might have executed an
`unreachable!()` macro in Rust.
Before:
```
wasm trap: unreachable
wasm backtrace:
[...]
```
After:
```
wasm trap: wasm `unreachable` instruction executed
wasm backtrace:
[...]
```
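As a quick sketch of observing the new message from an embedding (API details here reflect roughly this era of the `wasmtime` crate and may differ slightly):
```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::new(&engine, r#"(module (func (export "f") unreachable))"#)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let f = instance.get_typed_func::<(), ()>(&mut store, "f")?;
    // The trap's message now names the wasm `unreachable` instruction itself.
    let trap = f.call(&mut store, ()).unwrap_err();
    assert!(trap.to_string().contains("`unreachable` instruction executed"));
    Ok(())
}
```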
This commit adds the `pooling-allocator` feature to both the `wasmtime` and
`wasmtime-runtime` crates.
The feature controls whether or not the pooling allocator implementation is
built into the runtime and exposed as a supported instance allocation strategy
in the wasmtime API.
The feature is on by default for the `wasmtime` crate.
Closes #3513.
It's a manually written impl, not a derive, because `InstancePre<T>: Clone`
does not require `T: Clone`.
The clone should be reasonably inexpensive: `Clone` for `Module` is just an
`Arc` clone, and `Clone` for `Definition` should also just be an `Arc` clone
for the `HostFunc` or `Instance` variants. An `InstancePre` shouldn't contain
any `Definition::Extern` variants because there is not yet a `Store`
associated with it, right?
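A minimal sketch of that kind of manual impl (the field and type shapes here are illustrative, not the real `InstancePre` internals):
```rust
use std::marker::PhantomData;
use std::sync::Arc;

struct ModuleInner;
enum Definition {
    HostFunc(Arc<ModuleInner>), // placeholder Arc-backed payloads
    Instance(Arc<ModuleInner>),
}

struct InstancePre<T> {
    module: Arc<ModuleInner>,
    items: Arc<Vec<Definition>>,
    _marker: PhantomData<fn() -> T>,
}

// Manual impl rather than `#[derive(Clone)]`: no `T: Clone` bound is needed
// because only `Arc`s are cloned (cheap reference-count bumps).
impl<T> Clone for InstancePre<T> {
    fn clone(&self) -> Self {
        InstancePre {
            module: self.module.clone(),
            items: self.items.clone(),
            _marker: PhantomData,
        }
    }
}
```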
* Add a configuration option to force "static" memories
In poking around at some things earlier today I realized that one memory
configuration option we haven't exposed to embeddings like the CLI is the
ability to forcibly limit memory growth and force a static memory style. This
means that the CLI, for example, can't limit memory growth by default, and
memories are only limited in size by what the OS can give and by the wasm's
own memory type. This configuration option lets the CLI artificially limit
the size of wasm linear memories.
Additionally, another motivation is testing out the various codegen
ramifications of static vs. dynamic memories. For example, wasm64 memories
with no maximum size listed are dynamic by default, and this option is the
only way to force them to be static.
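A hedged sketch of what this looks like via `Config` (method names here reflect my reading of the API around this time; consult the docs for the exact form):
```rust
use wasmtime::Config;

fn capped_static_memory_config() -> Config {
    let mut config = Config::new();
    // Never fall back to the "dynamic" memory style...
    config.static_memory_forced(true);
    // ...and bound static reservations (and therefore growth) at 1 GiB.
    config.static_memory_maximum_size(1 << 30);
    config
}
```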
* Review feedback
* Adjust dependency directives between crates
This commit is a preparation for the release process for Wasmtime. The
specific changes here are to delineate which crates are "public", and
all version requirements on non-public crates will now be done with
`=A.B.C` version requirements instead of today's `A.B.C` version
requirements.
The purpose for doing this is to assist with patch releases that might
happen in the future. Patch releases of wasmtime are already required to
not break the APIs of "public" crates, but no such guarantee is given
about "internal" crates. This means that a patch release runs the risk,
for example, of breaking an internal API. Doing so, though, would technically
require releasing a new major version of the internal crate, and there's no
convenient gap in the major-version numbering scheme to do that in. By using
`=A.B.C` requirements for internal crates we can safely ignore strict semver
compatibility between patch releases of internal crates, since the only
consumer of such a crate will be the corresponding patch release of the
`wasmtime` crate itself (or other public crates).
The `publish.rs` script has been updated with a check to verify that
dependencies on internal crates are all specified with an `=` requirement,
and that dependencies on public crates are specified without one. This will
hopefully mean we don't have to worry about what to use where; we just let CI
tell us what to do. With this check in place, all version dependency
declarations have been updated.
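Conceptually the check is along these lines (a hypothetical sketch, not the actual `publish.rs` code):
```rust
// Internal crates must be pinned with `=A.B.C`; public crates must use a
// normal semver requirement. Names and structure here are illustrative only.
fn check_dependency(dep: &str, version_req: &str, public_crates: &[&str]) -> Result<(), String> {
    let is_public = public_crates.contains(&dep);
    let is_pinned = version_req.starts_with('=');
    match (is_public, is_pinned) {
        (true, true) => Err(format!("public crate `{dep}` should not use an `=` requirement")),
        (false, false) => Err(format!("internal crate `{dep}` must use an `=` requirement")),
        _ => Ok(()),
    }
}
```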
Note that some crates simply had their `version` requirement removed, in
cases where the crate isn't published anyway (`publish = false` is specified)
or the dependency lives in the `dev-dependencies` section, which doesn't need
version specifiers for path dependencies.
* Switch to normal semver deps for cranelift dependencies
These crates will now all be considered "public", meaning that patch releases
are guaranteed not to contain breaking changes to them.
When there is a linking error caused by an undefined instance, list all of
the instance's exports in the error message. This will clarify errors for
undefined two-level imports that get desugared to one-level instance imports
under the module-linking proposal.
* Update the spec reference testsuite submodule
This commit brings in recent updates to the spec test suite. Most of the
changes here were already fixed in `wasmparser` with some tweaks to
esoteric modules, but Wasmtime also gets a bug fix where import matching for
the size of tables/memories is now based on the current runtime size of the
table/memory rather than on its original type. This means that during type
matching the actual value is consulted for its size rather than using the
minimum size listed in its type.
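A minimal sketch of that matching rule (a hypothetical helper, not Wasmtime's actual code): the imported table's or memory's current size serves as the effective minimum when checking it against the expected type.
```rust
/// Does a table/memory whose live size is `current` (elements or pages) and
/// whose declared maximum is `actual_max` satisfy an expected type with
/// limits (`expected_min`, `expected_max`)?
fn limits_match(current: u64, actual_max: Option<u64>, expected_min: u64, expected_max: Option<u64>) -> bool {
    // Use the current runtime size, not the minimum from the original type.
    let min_ok = current >= expected_min;
    let max_ok = match (actual_max, expected_max) {
        (_, None) => true,            // importer accepts any maximum
        (Some(a), Some(e)) => a <= e, // declared maximum must be at least as restrictive
        (None, Some(_)) => false,     // unbounded can't satisfy a bounded expectation
    };
    min_ok && max_ok
}
```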
* Fix now-missing directories in build script
We already validate wasm functions in parallel when compiling a module,
but the same parallelism wasn't available to the `Module::validate` API.
This commit performs a minor tweak to the validate-the-whole-module API
to validate all functions in parallel in the same manner that module
compilation does.
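Conceptually the tweak looks like this (an illustrative sketch using rayon and placeholder types, not the exact Wasmtime internals):
```rust
use rayon::prelude::*;

struct FunctionBody(Vec<u8>);
struct ValidationError;

fn validate_function(_body: &FunctionBody) -> Result<(), ValidationError> {
    // ... per-function validation via the underlying validator ...
    Ok(())
}

// Validate every function body on the rayon thread pool instead of serially,
// mirroring what module compilation already does.
fn validate_all(funcs: &[FunctionBody]) -> Result<(), ValidationError> {
    funcs.par_iter().try_for_each(validate_function)
}
```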
This commit removes the Lightbeam backend from Wasmtime as per [RFC 14].
This backend hasn't received maintenance in quite some time, and as [RFC
14] indicates this doesn't meet the threshold for keeping the code
in-tree, so this commit removes it.
A fast "baseline" compiler may still be added in the future. The
addition of such a backend should be in line with [RFC 14], though, with
the principles we now have for stable releases of Wasmtime. I'll close
out Lightbeam-related issues once this is merged.
[RFC 14]: https://github.com/bytecodealliance/rfcs/pull/14
* Add `*_unchecked` variants of `Func` APIs for the C API
This commit is what is hopefully going to be my last installment within
the saga of optimizing function calls in/out of WebAssembly modules in
the C API. This is yet another alternative approach to #3345 (sorry) but
also contains everything necessary to make the C API fast. As in #3345
the general idea is just moving checks out of the call path in the same
style of `TypedFunc`.
This new strategy takes inspiration from previous attempts and effectively
"just" exposes how we already pass `*mut u128` through trampolines for
arguments/results. This storage format is formalized
through a new `ValRaw` union that is exposed from the `wasmtime` crate.
Doing this made it relatively easy to expose two new APIs:
* `Func::new_unchecked`
* `Func::call_unchecked`
These are the same as their checked equivalents except that they're
`unsafe` and they work with `*mut ValRaw` rather than safe slices of
`Val`. Working with these eschews type checks and such and requires
callers/embedders to do the right thing.
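A rough usage sketch of the new unchecked path follows (heavily hedged: the exact signatures and the shape of `ValRaw` have shifted across Wasmtime releases, so treat this as illustrative rather than authoritative):
```rust
use wasmtime::{Func, Store, ValRaw};

/// Call an exported `(func (param i32 i32) (result i32))` through the
/// unchecked path. Safety: the caller must guarantee the function really has
/// this signature, since no type checks are performed.
unsafe fn call_add(store: &mut Store<()>, add: &Func) -> anyhow::Result<i32> {
    // Parameters and results share one buffer, mirroring the trampoline
    // storage format that `ValRaw` formalizes.
    let mut storage = [ValRaw { i32: 1 }, ValRaw { i32: 2 }];
    add.call_unchecked(&mut *store, storage.as_mut_ptr())?;
    Ok(storage[0].i32)
}
```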
These two new functions are then exposed via the C API with new
functions, enabling C to have a fast-path of calling/defining functions.
This fast path is akin to `Func::wrap` in Rust, although that API can't
be built in C due to C not having generics in the same way that Rust
has.
As for benchmarks, the ones measured here are:
* `nop` - Call a wasm function from the host that does nothing and
returns nothing.
* `i64` - Call a wasm function from the host, the wasm function calls a
host function, and the host function returns an `i64` all the way out to
the original caller.
* `many` - Call a wasm function from the host, the wasm calls a host
function with 5 `i32` parameters, and then an `i64` result is returned all
the way back to the original host caller.
* `i64` host - just the overhead of the wasm calling the host, so the
wasm calls the host function in a loop.
* `many` host - same as `i64` host, but calling the `many` host function.
All numbers in this table are in nanoseconds. Each is just a single
measurement as well, so there's bound to be some variation in the precise
numbers here.
| Name | Rust | C (before) | C (after) |
|-----------|------|------------|-----------|
| nop | 19 | 112 | 25 |
| i64 | 22 | 207 | 32 |
| many | 27 | 189 | 34 |
| i64 host | 2 | 38 | 5 |
| many host | 7 | 75 | 8 |
The main conclusion here is that the C API is significantly faster than
before when using the `*_unchecked` variants of the APIs. The Rust
implementation is still the ceiling (or floor, I guess?) for performance.
The main reason that C is slower than Rust is that a little bit more has
to travel through memory, whereas on the Rust side of things we can
monomorphize and inline a bit more to get rid of that. Overall though
the costs are way, way down from where they were originally, and I don't
plan on doing a whole lot more myself at this time. There are various
things we could theoretically do that I've considered, but
implementation-wise I think they'll be much more weighty.
* Tweak `wasmtime_externref_t` API comments
* Optimize `Func::call` and its C API
This commit is an alternative to #3298 which achieves effectively the
same goal of optimizing the `Func::call` API as well as its C API
sibling of `wasmtime_func_call`. The strategy taken here is different
than #3298 though where a new API isn't created, rather a small tweak to
an existing API is done. Specifically this commit handles the major
sources of slowness with `Func::call` with:
* Looking up the type of a function, to typecheck the arguments with and
use to guide how the results should be loaded, no longer hits the
rwlock in the `Engine` but instead each `Func` contains its own
`FuncType`. This can be an unnecessary allocation for funcs not used
with `Func::call`, so this is a downside of this implementation
relative to #3298. A mitigating factor, though, is that instance
exports are loaded lazily into the `Store` and in theory not too many
funcs are active in the store as `Func` objects.
* Temporary storage is amortized with a long-lived `Vec` in the `Store`
rather than allocating a new vector on each call. This is basically
the same strategy as #3294 only applied to different types in
different places. Specifically `wasmtime::Store` now retains a
`Vec<u128>` for `Func::call`, and the C API retains a `Vec<Val>` for
calling `Func::call`.
* Finally, an API breaking change is made to `Func::call` and its type
signature (as well as `Func::call_async`). Instead of returning
`Box<[Val]>` as it did before this function now takes a
`results: &mut [Val]` parameter. This allows the caller to manage the
allocation and we can amortize-remove it in `wasmtime_func_call` by
using space after the parameters in the `Vec<Val>` we're passing in.
This change is naturally a breaking change and we'll want to consider
it carefully, but mitigating factors are that most embeddings are
likely using `TypedFunc::call` instead and this signature taking a
mutable slice better aligns with `Func::new` which receives a mutable
slice for the results.
Overall this change, in the benchmark of "call a nop function from the C
API", is not quite as good as #3298. It's still a bit slower, on the
order of 15ns, because there are lots of capacity checks around vectors
and the type checks are slightly less optimized than before. Overall
though this is still significantly better than today because allocations
and the rwlock to acquire the type information are both avoided. I
personally feel that this change is the best to do because it has less
of an API impact than #3298.
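For reference, a short sketch of the new calling convention (minor details may differ from the exact release, but the shape is a `results: &mut [Val]` out-parameter rather than a returned `Box<[Val]>`):
```rust
use wasmtime::{Func, Store, Val};

fn call_add(store: &mut Store<()>, add: &Func) -> anyhow::Result<i32> {
    let params = [Val::I32(1), Val::I32(2)];
    // The caller now owns the result storage; no Box<[Val]> is allocated.
    let mut results = [Val::I32(0)];
    add.call(&mut *store, &params, &mut results)?;
    match results[0] {
        Val::I32(n) => Ok(n),
        _ => anyhow::bail!("unexpected result type"),
    }
}
```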
* Rebase issues
* Load generated trampolines into jitdump when profiling
This commit updates the jitdump profiler to generate JIT profiling
records for generated trampolines in a wasm module in addition to the
functions already in a module. It's also updated to learn about
trampolines generated via `Func::new` and friends. None of these trampolines
were previously registered, meaning that stack traces with these pc values
would be confusing to see in the profile output. While the names aren't the
best, it should at least be clearer than before whether a function is hot!
* Fix more builds
* Allow the `ResourceLimiter` to reject a memory grow even before the
memory's own maximum is reached.
* Add a hook so a `ResourceLimiter` can detect any reason that a memory
grow fails, including when the OS denies additional memory.
* Add tests for this new functionality. I only took the time to test the
OS denial on Linux; it should be possible on macOS as well, but I don't
have a test setup there. I have no idea how to do this on Windows.
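A rough sketch of a limiter along these lines (trait method names and signatures are approximate and have changed between Wasmtime versions, so check the `wasmtime::ResourceLimiter` docs for the exact form):
```rust
use wasmtime::ResourceLimiter;

/// Rejects any memory growth beyond `cap_bytes`, even if the memory's own
/// declared maximum would allow more.
struct CappedMemory {
    cap_bytes: usize,
}

impl ResourceLimiter for CappedMemory {
    fn memory_growing(&mut self, _current: usize, desired: usize, _maximum: Option<usize>) -> bool {
        desired <= self.cap_bytes
    }

    // The new hook: invoked when a grow fails for any reason, e.g. the OS
    // refusing to hand out additional memory.
    fn memory_grow_failed(&mut self, error: &anyhow::Error) {
        eprintln!("memory grow failed: {error:#}");
    }

    fn table_growing(&mut self, _current: u32, _desired: u32, _maximum: Option<u32>) -> bool {
        true
    }
}
```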