It turns out that Souper does not allow a constant to be assigned to a variable,
they may only be used as operands. The 2.0.0 version of the `souper-ir` crate
correctly reflects this. In the `cranelift_codegen::souper_harvest` module, we
need to modify our Souper IR harvester so that it delays converting `iconst` and
`bconst` into Souper IR until their values are used as operands. Finally, some
unit tests in the `peepmatic-souper` crate need some small updates as well.
* Implement imported/exported modules/instances
This commit implements the final piece of the module linking proposal
which is to flesh out the support for importing/exporting instances and
modules. This ended up having a few changes:
* Two more `PrimaryMap` instances are now stored in an `Instance`. The value
for instances is `InstanceHandle` (pretty easy) and for modules it's
`Box<dyn Any>` (less easy).
* The custom host state for `InstanceHandle` for `wasmtime` is now
`Arc<TypeTables` to be able to fully reconstruct an instance's types
just from its instance.
* Type matching for imports now has been updated to take
instances/modules into account.
One of the main downsides of this implementation is that type matching
of imports is duplicated between wasmparser and wasmtime, leading to
posssible bugs especially in the subtelties of module linking. I'm not
sure how best to unify these two pieces of validation, however, and it
may be more trouble than it's worth.
cc #2094
* Update wat/wast/wasmparser
* Review comments
* Fix a bug in publish script to vendor the right witx
Currently there's two witx binaries in our repository given the two wasi
spec submodules, so this updates the publication script to vendor the
right one.
This PR adds a new fuzz target, `differential_wasmi`, that runs a
Cranelift-based Wasm backend alongside a simple third-party Wasm
interpeter crate (`wasmi`). The fuzzing runs the first function in a
given module to completion on each side, and then diffs the return value
and linear memory contents.
This strategy should provide end-to-end coverage including both the Wasm
translation to CLIF (which has seen some subtle and scary bugs at
times), the lowering from CLIF to VCode, the register allocation, and
the final code emission.
This PR also adds a feature `experimental_x64` to the fuzzing crate (and
the chain of dependencies down to `cranelift-codegen`) so that we can
fuzz the new x86-64 backend as well as the current one.
* Provide filename/line number information in `Trap`
This commit extends the `Trap` type and `Store` to retain DWARF debug
information found in a wasm file unconditionally, if it's present. This
then enables us to print filenames and line numbers which point back to
actual source code when a trap backtrace is printed. Additionally the
`FrameInfo` type has been souped up to return filename/line number
information as well.
The implementation here is pretty simplistic currently. The meat of all
the work happens in `gimli` and `addr2line`, and otherwise wasmtime is
just schlepping around bytes of dwarf debuginfo here and there!
The general goal here is to assist with debugging when using wasmtime
because filenames and line numbers are generally orders of magnitude
better even when you already have a stack trace. Another nicety here is
that backtraces will display inlined frames (learned through debug
information), improving the experience in release mode as well.
An example of this is that with this file:
```rust
fn main() {
panic!("hello");
}
```
we get this stack trace:
```
$ rustc foo.rs --target wasm32-wasi -g
$ cargo run foo.wasm
Finished dev [unoptimized + debuginfo] target(s) in 0.16s
Running `target/debug/wasmtime foo.wasm`
thread 'main' panicked at 'hello', foo.rs:2:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: failed to run main module `foo.wasm`
Caused by:
0: failed to invoke command default
1: wasm trap: unreachable
wasm backtrace:
0: 0x6c1c - panic_abort::__rust_start_panic::abort::h2d60298621b1ccbf
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/panic_abort/src/lib.rs:77:17
- __rust_start_panic
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/panic_abort/src/lib.rs:32:5
1: 0x68c7 - rust_panic
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:626:9
2: 0x65a1 - std::panicking::rust_panic_with_hook::h2345fb0909b53e12
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:596:5
3: 0x1436 - std::panicking::begin_panic::{{closure}}::h106f151a6db8c8fb
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:506:9
4: 0xda8 - std::sys_common::backtrace::__rust_end_short_backtrace::he55aa13f22782798
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/sys_common/backtrace.rs:153:18
5: 0x1324 - std::panicking::begin_panic::h1727e7d1d719c76f
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:505:12
6: 0xfde - foo::main::h2db1313a64510850
at /Users/acrichton/code/wasmtime/foo.rs:2:5
7: 0x11d5 - core::ops::function::FnOnce::call_once::h20ee1cc04aeff1fc
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:227:5
8: 0xddf - std::sys_common::backtrace::__rust_begin_short_backtrace::h054493e41e27e69c
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/sys_common/backtrace.rs:137:18
9: 0x1d5a - std::rt::lang_start::{{closure}}::hd83784448d3fcb42
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:66:18
10: 0x69d8 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h564d3dad35014917
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:259:13
- std::panicking::try::do_call::hdca4832ace5a8603
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:381:40
- std::panicking::try::ha8624a1a6854b456
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:345:19
- std::panic::catch_unwind::h71421f57cf2bc688
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panic.rs:382:14
- std::rt::lang_start_internal::h260050c92cd470af
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:51:25
11: 0x1d0c - std::rt::lang_start::h0b4bcf3c5e498224
at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/rt.rs:65:5
12: 0xffc - <unknown>!__original_main
13: 0x393 - __muloti4
at /cargo/registry/src/github.com-1ecc6299db9ec823/compiler_builtins-0.1.35/src/macros.rs:269
```
This is relatively noisy by default but there's filenames and line
numbers! Additionally frame 10 can be seen to have lots of frames
inlined into it. All information is always available to the embedder but
we could try to handle the `__rust_begin_short_backtrace` and
`__rust_end_short_backtrace` markers to trim the backtrace by default as
well.
The only gotcha here is that it looks like `__muloti4` is out of place.
That's because the libc that Rust ships with doesn't have dwarf
information, although I'm not sure why we land in that function for
symbolizing it...
* Add a configuration switch for debuginfo
* Control debuginfo by default with `WASM_BACKTRACE_DETAILS`
* Try cpp_demangle on demangling as well
* Rename to WASMTIME_BACKTRACE_DETAILS
This commit deletes the old `snapshot_0` implementation of wasi-common,
along with the `wig` crate that was used to generate bindings for it.
This then reimplements `snapshot_0` in terms of
`wasi_snapshot_preview1`. There were very few changes between the two
snapshots:
* The `nlink` field of `FileStat` was increased from 32 to 64 bits.
* The `set` field of `whence` was reordered.
* Clock subscriptions in polling dropped their redundant userdata field.
This makes all of the syscalls relatively straightforward to simply
delegate to the next snapshot's implementation. Some trickery happens to
avoid extra cost when dealing with iovecs, but since the memory layout
of iovecs remained the same this should still work.
Now that `snapshot_0` is using wiggle we simply have a trait to
implement, and that's implemented for the same `WasiCtx` that has the
`wasi_snapshot_preview1` trait implemented for it as well. While this
theoretically means that you could share the file descriptor table
between the two snapshots that's not supported in the generated bindings
just yet. A separate `WasiCtx` will be created for each WASI module.
* Add an initial wasi-nn implementation for Wasmtime
This change adds a crate, `wasmtime-wasi-nn`, that uses `wiggle` to expose the current state of the wasi-nn API and `openvino` to implement the exposed functions. It includes an end-to-end test demonstrating how to do classification using wasi-nn:
- `crates/wasi-nn/tests/classification-example` contains Rust code that is compiled to the `wasm32-wasi` target and run with a Wasmtime embedding that exposes the wasi-nn calls
- the example uses Rust bindings for wasi-nn contained in `crates/wasi-nn/tests/wasi-nn-rust-bindings`; this crate contains code generated by `witx-bindgen` and eventually should be its own standalone crate
* Test wasi-nn as a CI step
This change adds:
- a GitHub action for installing OpenVINO
- a script, `ci/run-wasi-nn-example.sh`, to run the classification example
This commit removes the global variable associated with wasm traps which
stores frame information. The only purpose of this global is to help
symbolicate `Trap`s created since we support creating a `Trap` without a
`Store`. The global, however, is only used for wasm frames on the stack,
and when wasm frames are on the stack we know that our thread local for
"what was the last context" is set and configured.
The change here is to hijack this thread-local some more to effectively
store the `Store` inside of it. All frame information is then moved
directly into `Store` and no longer lives off on the side in a global.
Additionally support for registering/unregistering modules is now
simplified because once a module is registered with a store it can never
be unregistered.
This has one slight functional change where if there are two instances
of `Store` interleaving calls to wasm code on the stack we'll only be
able to symbolicate one of them instead of both. That's arguably also a
feature however because this is sort of a way to leak information across
stores right now.
Otherwise, though, this isn't intended to change any existing logic, but
instead keep everything working as-is.
* this requires upgrading to wasmparser 0.67.0.
* There are no CLIF side changes because the CLIF `select` instruction is
polymorphic enough.
* on aarch64, there is unfortunately no conditional-move (csel) instruction on
vectors. This patch adds a synthetic instruction `VecCSel` which *does*
behave like that. At emit time, this is emitted as an if-then-else diamond
(4 insns).
* aarch64 implementation is otherwise straightforwards.
This will hopefully remove a small thorn in our side with periodic
nightly breakage due to nightly features changing. This commit moves
lightbeam to stable Rust, swapping out `staticvec` for `arrayvec` and
otherwise updating some dependencies (namely `dynasm`) to compile with
stable.
This then also updates CI appropriately to not use a pinned nightly and
instead us a floating `nightly` channel so we can head off any breakage
coming up ASAP.
I don't think this has happened in awhile but I've run a `cargo update`
as well as trimming some of the duplicate/older dependencies in
`Cargo.lock` by updating some of our immediate dependencies as well.
Currently the runtime needs to acquire the current stack pointer so it
can set a limit for where if the wasm stack goes below that point it
will abort the wasm code. Acquiring the stack pointer is done in a
brittle way right now which involves looking at the address of what we
hope is an on-stack structure. This turns out to not work at all with
ASan as well.
Instead this commit switches to the `psm` crate which is used by the
Rust compiler team for stack manipulation, namely a coarse version of
segmented stacks to avoid stack overflow in the compiler. We don't need
most of the implementation of `psm`, just the `stack_pointer` function,
but it shouldn't be a burden to bring in!
Closes#2344
This patch implements, for aarch64, the following wasm SIMD extensions
i32x4.dot_i16x8_s instruction
https://github.com/WebAssembly/simd/pull/127
It also updates dependencies as follows, in order that the new instruction can
be parsed, decoded, etc:
wat to 1.0.27
wast to 26.0.1
wasmparser to 0.65.0
wasmprinter to 0.2.12
The changes are straightforward:
* new CLIF instruction `widening_pairwise_dot_product_s`
* translation from wasm into `widening_pairwise_dot_product_s`
* new AArch64 instructions `smull`, `smull2` (part of the `VecRRR` group)
* translation from `widening_pairwise_dot_product_s` to `smull ; smull2 ; addv`
There is no testcase in this commit, because that is a separate repo. The
implementation has been tested, nevertheless.
* Rewrite interpreter generically
This change re-implements the Cranelift interpreter to use generic values; this makes it possible to do abstract interpretation of Cranelift instructions. In doing so, the interpretation state is extracted from the `Interpreter` structure and is accessed via a `State` trait; this makes it possible to not only more clearly observe the interpreter's state but also to interpret using a dummy state (e.g. `ImmutableRegisterState`). This addition made it possible to implement more of the Cranelift instructions (~70%, ignoring the x86-specific instructions).
* Replace macros with closures
This commit removes the binaryen support for fuzzing from wasmtime,
instead switching over to `wasm-smith`. In general it's great to have
what fuzzing we can, but our binaryen support suffers from a few issues:
* The Rust crate, binaryen-sys, seems largely unmaintained at this
point. While we could likely take ownership and/or send PRs to update
the crate it seems like the maintenance is largely on us at this point.
* Currently the binaryen-sys crate doesn't support fuzzing anything
beyond MVP wasm, but we're interested at least in features like bulk
memory and reference types. Additionally we'll also be interested in
features like module-linking. New features would require either
implementation work in binaryen or the binaryen-sys crate to support.
* We have 4-5 fuzz-bugs right now related to timeouts simply in
generating a module for wasmtime to fuzz. One investigation along
these lines in the past revealed a bug in binaryen itself, and in any
case these bugs would otherwise need to get investigated, reported,
and possibly fixed ourselves in upstream binaryen.
Overall I'm not sure at this point if maintaining binaryen fuzzing is
worth it with the advent of `wasm-smith` which has similar goals for
wasm module generation, but is much more readily maintainable on our
end.
Additonally in this commit I've added a fuzzer for wasm-smith's
`SwarmConfig`-based fuzzer which should expand the coverage of tested
modules.
Closes#2163
In the current translation of wasm (128-bit) SIMD into CLIF, we work around differences in the
type system models of wasm vs CLIF by inserting `bitcast` (a no-op cast) CLIF instructions before
more or less every use of a SIMD value. Unfortunately this was not being done consistently and
even small examples with a single if-then-else diamond that produces a SIMD value, could cause a
verification failure downstream. In this case, the jump out of the "else" block needed a
bitcast, but didn't have one.
This patch wraps creation of CLIF jumps and conditional branches up into three functions,
`canonicalise_then_jump` and `canonicalise_then_br{z,nz}`, and uses them consistently. They
first cast the relevant block formal parameters, then generate the relevant kind of branch/jump.
Hence, provided they are also used consistently in future to generate branches/jumps in this
file, we are protected against such failures.
The patch also adds a large(ish) comment at the top explaining this in more detail.
Fixes RUSTSEC-2020-0054 warning from cargo-audit/cargo-deny, follows the recommendation to switch to the new maintained `directories-next` crate fork
Only affects the cache directory determination for the environment and was a simple search'n'replace to this fork so don't think behavior has changed.
https://rustsec.org/advisories/RUSTSEC-2020-0054
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.
In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).
Fixes#2254.
* Validate modules while translating
This commit is a change to cranelift-wasm to validate each function body
as it is translated. Additionally top-level module translation functions
will perform module validation. This commit builds on changes in
wasmparser to perform module validation interwtwined with parsing and
translation. This will be necessary for future wasm features such as
module linking where the type behind a function index, for example, can
be far away in another module. Additionally this also brings a nice
benefit where parsing the binary only happens once (instead of having an
up-front serial validation step) and validation can happen in parallel
for each function.
Most of the changes in this commit are plumbing to make sure everything
lines up right. The major functional change here is that module
compilation should be faster by validating in parallel (or skipping
function validation entirely in the case of a cache hit). Otherwise from
a user-facing perspective nothing should be that different.
This commit does mean that cranelift's translation now inherently
validates the input wasm module. This means that the Spidermonkey
integration of cranelift-wasm will also be validating the function as
it's being translated with cranelift. The associated PR for wasmparser
(bytecodealliance/wasmparser#62) provides the necessary tools to create
a `FuncValidator` for Gecko, but this is something I'll want careful
review for before landing!
* Read function operators until EOF
This way we can let the validator take care of any issues with
mismatched `end` instructions and/or trailing operators/bytes.
Given a clif function, harvest all its integer subexpressions, so that they can
be fed into [Souper](https://github.com/google/souper) as candidates for
superoptimization. For some of these candidates, Souper will successfully
synthesize a right-hand side that is equivalent but has lower cost than the
left-hand side. Then, we can combine these left- and right-hand sides into a
complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value `x`, we do a post-order
traversal of the dataflow graph starting from `x`. As we do this traversal, we
maintain a map from clif values to their translated Souper values. We stop
traversing when we reach anything that can't be translated into Souper IR: a
memory load, a float-to-int conversion, a block parameter, etc. For values
produced by these instructions, we create a Souper `var`, which is an input
variable to the optimization. For instructions that have a direct mapping into
Souper IR, we get the Souper version of each of its operands and then create the
Souper version of the instruction itself. It should now be clear why we do a
post-order traversal: we need an instruction's translated operands in order to
translate the instruction itself. Once this instruction is translated, we update
the clif-to-souper map with this new translation so that any other instruction
that uses this result as an operand has access to the translated value. When the
traversal is complete we return the translation of `x` as the root of left-hand
side candidate.
Conversion from Souper into Peepmatic is implemented with a straightforward,
top-down recursive traversal of the optimization's left- and right-hand side
expression DAGs. Most Souper instructions have a corresponding Peepmatic
instruction. If we run into an instruction where that isn't the case, we skip
that Souper optimization and move on to the next one.
Note that Souper fully supports DAGs, for example:
```text
%0 = var
%1 = add 1, %0
%2 = add %1, %1 ;; Two edges to `%1` makes this a DAG.
```
On the other hand, Peepmatic only currently supports trees, so shared
subexpressions are duplicated:
```text
(iadd (iadd 1 $x)
(iadd 1 $x)) ;; The shared subexpression is duplicated.
```
This does not affect correctness.