* Use stable Rust on CI to test the x64 backend
This commit leverages the newly-released 1.51.0 compiler to test the
new backend on Windows and Linux with a stable compiler instead of a
nightly compiler. This isolates the nightly build to just the nightly
documentation generation and fuzzing, both of which rely on nightly for
the best results right now.
* Use updated stable in book build job
* Run rustfmt for new stable
* Silence new warnings for wasi-nn
* Allow some dead code in the x64 backend
Looks like new rustc is better about emitting some dead-code warnings
* Update rust in peepmatic job
* Fix a test in the pooling allocator
* Remove `package.metdata.docs.rs` temporarily
Needs resolution of https://github.com/rust-lang/cargo/pull/9300 first
* Fix a warning in a wasi-nn example
* Combine stack-based cleanups for faster wasm calls
This commit is an extension of #2757 where the goal is to optimize entry
into WebAssembly. Currently wasmtime has two stack-based cleanups when
entering wasm, one for the externref activation table and another for
stack limits getting reset. This commit fuses these two cleanups
together into one and moves some code around which enables less captures
for fewer closures and such to speed up calls in to wasm a bit more.
Overall this drops the execution time from 88ns to 80ns locally for me.
This also updates the atomic orderings when updating the stack limit
from `SeqCst` to `Relaxed`. While `SeqCst` is a reasonable starting
point the usage here should be safe to use `Relaxed` since we're not
using the atomics to actually protect any memory, it's simply receiving
signals from other threads.
* Determine whether a pc is wasm via a global map
The macOS implementation of traps recently changed to using mach ports
for handlers instead of signal handlers. This means that a previously
relied upon invariant, each thread fixes its own trap, was broken. The
macOS implementation worked around this by maintaining a global map from
thread id to thread local information, however, to solve the problem.
This global map is quite slow though. It involves taking a lock and
updating a hash map on all calls into WebAssembly. In my local testing
this accounts for >70% of the overhead of calling into WebAssembly on
macOS. Naturally it'd be great to remove this!
This commit fixes this issue and removes the global lock/map that is
updated on all calls into WebAssembly. The fix is to maintain a global
map of wasm modules and their trap addresses in the `wasmtime` crate.
Doing so is relatively simple since we're already tracking this
information at the `Store` level.
Once we've got a global map then the macOS implementation can use this
from a foreign thread and everything works out.
Locally this brings the overhead, on macOS specifically, of calling into
wasm from 80ns to ~20ns.
* Fix compiles
* Review comments
This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack slots convention. Stack slots don't need to be at
least 8-bytes, they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned on
the data type they're containing, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).
Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.
This commit splits out a `FiberStack` from `Fiber`, allowing the instance
allocator trait to return `FiberStack` rather than raw stack pointers. This
keeps the stack creation mostly in `wasmtime_fiber`, but now the on-demand
instance allocator can make use of it.
The instance allocators no longer have to return a "not supported" error to
indicate that the store should allocate its own fiber stack.
This includes a bunch of cleanup in the instance allocator to scope stacks to
the new "async" feature in the runtime.
Closes#2708.
This commit implements the `uffd` feature which turns on support for utilizing
the `userfaultfd` system call on Linux for the pooling instance allocator.
By handling page faults in userland, we are able to detect guard page accesses
without having to constantly change memory page protections.
This should help reduce the number of syscalls as well as kernel lock
contentions when many threads are allocating and deallocating instances.
Additionally, the user fault handler can lazy initialize linear
memories of an instance (implementation to come).
* Update wasm-tools crates
* Update Wasm SIMD spec tests
* Invert 'experimental_x64_should_panic' logic
By doing this, it is easier to see which spec tests currently panic. The new tests correspond to recently-added instructions.
* Fix: ignore new spec tests for all backends
* Implement support for `async` functions in Wasmtime
This is an implementation of [RFC 2] in Wasmtime which is to support
`async`-defined host functions. At a high level support is added by
executing WebAssembly code that might invoke an asynchronous host
function on a separate native stack. When the host function's future is
not ready we switch back to the main native stack to continue execution.
There's a whole bunch of details in this commit, and it's a bit much to
go over them all here in this commit message. The most important changes
here are:
* A new `wasmtime-fiber` crate has been written to manage the low-level
details of stack-switching. Unixes use `mmap` to allocate a stack and
Windows uses the native fibers implementation. We'll surely want to
refactor this to move stack allocation elsewhere in the future. Fibers
are intended to be relatively general with a lot of type paremters to
fling values back and forth across suspension points. The whole crate
is a giant wad of `unsafe` unfortunately and involves handwritten
assembly with custom dwarf CFI directives to boot. Definitely deserves
a close eye in review!
* The `Store` type has two new methods -- `block_on` and `on_fiber`
which bridge between the async and non-async worlds. Lots of unsafe
fiddly bits here as we're trying to communicate context pointers
between disparate portions of the code. Extra eyes and care in review
is greatly appreciated.
* The APIs for binding `async` functions are unfortunately pretty ugly
in `Func`. This is mostly due to language limitations and compiler
bugs (I believe) in Rust. Instead of `Func::wrap` we have a
`Func::wrapN_async` family of methods, and we've also got a whole
bunch of `Func::getN_async` methods now too. It may be worth
rethinking the API of `Func` to try to make the documentation page
actually grok'able.
This isn't super heavily tested but the various test should suffice for
engaging hopefully nearly all the infrastructure in one form or another.
This is just the start though!
[RFC 2]: https://github.com/bytecodealliance/rfcs/pull/2
* Add wasmtime-fiber to publish script
* Save vector/float registers on ARM too.
* Fix a typo
* Update lock file
* Implement periodically yielding with fuel consumption
This commit implements APIs on `Store` to periodically yield execution
of futures through the consumption of fuel. When fuel runs out a
future's execution is yielded back to the caller, and then upon
resumption fuel is re-injected. The goal of this is to allow cooperative
multi-tasking with futures.
* Fix compile without async
* Save/restore the frame pointer in fiber switching
Turns out this is another caller-saved register!
* Simplify x86_64 fiber asm
Take a leaf out of aarch64's playbook and don't have extra memory to
load/store these arguments, instead leverage how `wasmtime_fiber_switch`
already loads a bunch of data into registers which we can then
immediately start using on a fiber's start without any extra memory
accesses.
* Add x86 support to wasmtime-fiber
* Add ARM32 support to fiber crate
* Make fiber build file probing more flexible
* Use CreateFiberEx on Windows
* Remove a stray no-longer-used trait declaration
* Don't reach into `Caller` internals
* Tweak async fuel to eventually run out.
With fuel it's probably best to not provide any way to inject infinite
fuel.
* Fix some typos
* Cleanup asm a bit
* Use a shared header file to deduplicate some directives
* Guarantee hidden visibility for functions
* Enable gc-sections on macOS x86_64
* Add `.type` annotations for ARM
* Update lock file
* Fix compile error
* Review comments
This commit goes through the dependencies that wasmtime has and updates
versions where possible. This notably brings in a wasmparser/wast update
which has some simd spec changes with new instructions. Otherwise most
of these are just routine updates.
This commit updates the various tooling used by wasmtime which has new
updates to the module linking proposal. This is done primarily to sync
with WebAssembly/module-linking#26. The main change implemented here is
that wasmtime now supports creating instances from a set of values, nott
just from instantiating a module. Additionally subtyping handling of
modules with respect to imports is now properly handled by desugaring
two-level imports to imports of instances.
A number of small refactorings are included here as well, but most of
them are in accordance with the changes to `wasmparser` and the updated
binary format for module linking.
This commit updates all the wasm-tools crates that we use and enables
fuzzing of the module linking proposal in our various fuzz targets. This
also refactors some of the dummy value generation logic to not be
fallible and to always succeed, the thinking being that we don't want to
accidentally hide errors while fuzzing. Additionally instantiation is
only allowed to fail with a `Trap`, other failure reasons are unwrapped.
* Implement imported/exported modules/instances
This commit implements the final piece of the module linking proposal
which is to flesh out the support for importing/exporting instances and
modules. This ended up having a few changes:
* Two more `PrimaryMap` instances are now stored in an `Instance`. The value
for instances is `InstanceHandle` (pretty easy) and for modules it's
`Box<dyn Any>` (less easy).
* The custom host state for `InstanceHandle` for `wasmtime` is now
`Arc<TypeTables` to be able to fully reconstruct an instance's types
just from its instance.
* Type matching for imports now has been updated to take
instances/modules into account.
One of the main downsides of this implementation is that type matching
of imports is duplicated between wasmparser and wasmtime, leading to
posssible bugs especially in the subtelties of module linking. I'm not
sure how best to unify these two pieces of validation, however, and it
may be more trouble than it's worth.
cc #2094
* Update wat/wast/wasmparser
* Review comments
* Fix a bug in publish script to vendor the right witx
Currently there's two witx binaries in our repository given the two wasi
spec submodules, so this updates the publication script to vendor the
right one.
This PR adds a new fuzz target, `differential_wasmi`, that runs a
Cranelift-based Wasm backend alongside a simple third-party Wasm
interpeter crate (`wasmi`). The fuzzing runs the first function in a
given module to completion on each side, and then diffs the return value
and linear memory contents.
This strategy should provide end-to-end coverage including both the Wasm
translation to CLIF (which has seen some subtle and scary bugs at
times), the lowering from CLIF to VCode, the register allocation, and
the final code emission.
This PR also adds a feature `experimental_x64` to the fuzzing crate (and
the chain of dependencies down to `cranelift-codegen`) so that we can
fuzz the new x86-64 backend as well as the current one.
* Provide filename/line number information in `Trap`
This commit extends the `Trap` type and `Store` to retain DWARF debug
information found in a wasm file unconditionally, if it's present. This
then enables us to print filenames and line numbers which point back to
actual source code when a trap backtrace is printed. Additionally the
`FrameInfo` type has been souped up to return filename/line number
information as well.
The implementation here is pretty simplistic currently. The meat of all
the work happens in `gimli` and `addr2line`, and otherwise wasmtime is
just schlepping around bytes of dwarf debuginfo here and there!
The general goal here is to assist with debugging when using wasmtime
because filenames and line numbers are generally orders of magnitude
better even when you already have a stack trace. Another nicety here is
that backtraces will display inlined frames (learned through debug
information), improving the experience in release mode as well.
An example of this is that with this file:
```rust
fn main() {
panic!("hello");
}
```
we get this stack trace:
```
$ rustc foo.rs --target wasm32-wasi -g
$ cargo run foo.wasm
Finished dev [unoptimized + debuginfo] target(s) in 0.16s
Running `target/debug/wasmtime foo.wasm`
thread 'main' panicked at 'hello', foo.rs:2:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: failed to run main module `foo.wasm`
Caused by:
0: failed to invoke command default
1: wasm trap: unreachable
wasm backtrace:
0: 0x6c1c - panic_abort::__rust_start_panic::abort::h2d60298621b1ccbf
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/panic_abort/src/lib.rs:77:17
- __rust_start_panic
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/panic_abort/src/lib.rs:32:5
1: 0x68c7 - rust_panic
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:626:9
2: 0x65a1 - std::panicking::rust_panic_with_hook::h2345fb0909b53e12
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:596:5
3: 0x1436 - std::panicking::begin_panic::{{closure}}::h106f151a6db8c8fb
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:506:9
4: 0xda8 - std::sys_common::backtrace::__rust_end_short_backtrace::he55aa13f22782798
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/sys_common/backtrace.rs:153:18
5: 0x1324 - std::panicking::begin_panic::h1727e7d1d719c76f
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:505:12
6: 0xfde - foo::main::h2db1313a64510850
at /Users/acrichton/code/wasmtime/foo.rs:2:5
7: 0x11d5 - core::ops::function::FnOnce::call_once::h20ee1cc04aeff1fc
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:227:5
8: 0xddf - std::sys_common::backtrace::__rust_begin_short_backtrace::h054493e41e27e69c
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/sys_common/backtrace.rs:137:18
9: 0x1d5a - std::rt::lang_start::{{closure}}::hd83784448d3fcb42
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:66:18
10: 0x69d8 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h564d3dad35014917
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:259:13
- std::panicking::try::do_call::hdca4832ace5a8603
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:381:40
- std::panicking::try::ha8624a1a6854b456
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:345:19
- std::panic::catch_unwind::h71421f57cf2bc688
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panic.rs:382:14
- std::rt::lang_start_internal::h260050c92cd470af
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:51:25
11: 0x1d0c - std::rt::lang_start::h0b4bcf3c5e498224
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:65:5
12: 0xffc - <unknown>!__original_main
13: 0x393 - __muloti4
at /cargo/registry/src/github.com-1ecc6299db9ec823/compiler_builtins-0.1.35/src/macros.rs:269
```
This is relatively noisy by default but there's filenames and line
numbers! Additionally frame 10 can be seen to have lots of frames
inlined into it. All information is always available to the embedder but
we could try to handle the `__rust_begin_short_backtrace` and
`__rust_end_short_backtrace` markers to trim the backtrace by default as
well.
The only gotcha here is that it looks like `__muloti4` is out of place.
That's because the libc that Rust ships with doesn't have dwarf
information, although I'm not sure why we land in that function for
symbolizing it...
* Add a configuration switch for debuginfo
* Control debuginfo by default with `WASM_BACKTRACE_DETAILS`
* Try cpp_demangle on demangling as well
* Rename to WASMTIME_BACKTRACE_DETAILS
This commit removes the global variable associated with wasm traps which
stores frame information. The only purpose of this global is to help
symbolicate `Trap`s created since we support creating a `Trap` without a
`Store`. The global, however, is only used for wasm frames on the stack,
and when wasm frames are on the stack we know that our thread local for
"what was the last context" is set and configured.
The change here is to hijack this thread-local some more to effectively
store the `Store` inside of it. All frame information is then moved
directly into `Store` and no longer lives off on the side in a global.
Additionally support for registering/unregistering modules is now
simplified because once a module is registered with a store it can never
be unregistered.
This has one slight functional change where if there are two instances
of `Store` interleaving calls to wasm code on the stack we'll only be
able to symbolicate one of them instead of both. That's arguably also a
feature however because this is sort of a way to leak information across
stores right now.
Otherwise, though, this isn't intended to change any existing logic, but
instead keep everything working as-is.
* this requires upgrading to wasmparser 0.67.0.
* There are no CLIF side changes because the CLIF `select` instruction is
polymorphic enough.
* on aarch64, there is unfortunately no conditional-move (csel) instruction on
vectors. This patch adds a synthetic instruction `VecCSel` which *does*
behave like that. At emit time, this is emitted as an if-then-else diamond
(4 insns).
* aarch64 implementation is otherwise straightforwards.
This patch implements, for aarch64, the following wasm SIMD extensions
i32x4.dot_i16x8_s instruction
https://github.com/WebAssembly/simd/pull/127
It also updates dependencies as follows, in order that the new instruction can
be parsed, decoded, etc:
wat to 1.0.27
wast to 26.0.1
wasmparser to 0.65.0
wasmprinter to 0.2.12
The changes are straightforward:
* new CLIF instruction `widening_pairwise_dot_product_s`
* translation from wasm into `widening_pairwise_dot_product_s`
* new AArch64 instructions `smull`, `smull2` (part of the `VecRRR` group)
* translation from `widening_pairwise_dot_product_s` to `smull ; smull2 ; addv`
There is no testcase in this commit, because that is a separate repo. The
implementation has been tested, nevertheless.
* Validate modules while translating
This commit is a change to cranelift-wasm to validate each function body
as it is translated. Additionally top-level module translation functions
will perform module validation. This commit builds on changes in
wasmparser to perform module validation interwtwined with parsing and
translation. This will be necessary for future wasm features such as
module linking where the type behind a function index, for example, can
be far away in another module. Additionally this also brings a nice
benefit where parsing the binary only happens once (instead of having an
up-front serial validation step) and validation can happen in parallel
for each function.
Most of the changes in this commit are plumbing to make sure everything
lines up right. The major functional change here is that module
compilation should be faster by validating in parallel (or skipping
function validation entirely in the case of a cache hit). Otherwise from
a user-facing perspective nothing should be that different.
This commit does mean that cranelift's translation now inherently
validates the input wasm module. This means that the Spidermonkey
integration of cranelift-wasm will also be validating the function as
it's being translated with cranelift. The associated PR for wasmparser
(bytecodealliance/wasmparser#62) provides the necessary tools to create
a `FuncValidator` for Gecko, but this is something I'll want careful
review for before landing!
* Read function operators until EOF
This way we can let the validator take care of any issues with
mismatched `end` instructions and/or trailing operators/bytes.
This commit moves all of the caching support that currently lives in
`wasmtime-environ` into a `wasmtime-cache` crate and makes it optional. The
goal here is to slim down the `wasmtime-environ` crate and clearly separate
boundaries where caching is a standalone and optional feature, not intertwined
with other crates.
* Refactor where results of compilation are stored
This commit refactors the internals of compilation in Wasmtime to change
where results of individual function compilation are stored. Previously
compilation resulted in many maps being returned, and compilation
results generally held all these maps together. This commit instead
switches this to have all metadata stored in a `CompiledFunction`
instead of having a separate map for each item that can be stored.
The motivation for this is primarily to help out with future
module-linking-related PRs. What exactly "module level" is depends on
how we interpret modules and how many modules are in play, so it's a bit
easier for operations in wasmtime to work at the function level where
possible. This means that we don't have to pass around multiple
different maps and a function index, but instead just one map or just
one entry representing a compiled function.
Additionally this change updates where the parallelism of compilation
happens, pushing it into `wasmtime-jit` instead of `wasmtime-environ`.
This is another goal where `wasmtime-jit` will have more knowledge about
module-level pieces with module linking in play. User-facing-wise this
should be the same in terms of parallel compilation, though.
The ultimate goal of this refactoring is to make it easier for the
results of compilation to actually be a set of wasm modules. This means
we won't be able to have a map-per-metadata where the primary key is the
function index, because there will be many modules within one "object
file".
* Don't clear out fields, just don't store them
Persist a smaller set of fields in `CompilationArtifacts` instead of
trying to clear fields out and dynamically not accessing them.
This commit is intended to update wasmparser to 0.59.0. This primarily
includes bytecodealliance/wasm-tools#40 which is a large update to how
parsing and validation works. The impact on Wasmtime is pretty small at
this time, but over time I'd like to refactor the internals here to lean
more heavily on that upstream wasmparser refactoring.
For now, though, the intention is to get on the train of wasmparser's
latest `main` branch to ensure we get bug fixes and such.
As part of this update a few other crates and such were updated. This is
primarily to handle the new encoding of `ref.is_null` where the type is
not part of the instruction encoding any more.
When running in embedded environments, threads creation is sometimes
undesirable. This adds a feature to toggle wasmtime's internal thread
creation for parallel compilation.
This is something I meant to do a long time ago but forgot to get around
to it! This commit updates the top-level documentation of the `wasmtime`
crate to have examples, more words, and generall be a bit more
up-to-date and complete.
* Minor code tidying.
* Document that `Linker::iter`'s iteration order is arbitrary.
* Add a few more tests for `wasmtime::Linker`.
* Refactor `Linker::compute_imports`.
- Extract the error message generation into a separate function.
- In the error message, sort the candidates.
* Fix a typo in a comment.
* Add `__rtti_base` to the list of allowed but deprecated exports.
* Don't print an Error message when a program exits normally.
* Update comments to reflect the current code.
* Also allow "table" as an exported table, which is used by AssemblyScript.