* Turn off probestack by default in Cranelift
The probestack feature is not implemented for the aarch64 and s390x
backends and currently the on-by-default status requires the aarch64 and
s390x implementations to be a stub. Turning off probestack by default
allows the s390x and aarch64 backends to panic with an error message to
avoid providing a false sense of security. When the probestack option is
implemented for all backends, however, it may be reasonable to
re-enable.
* aarch64: Improve codegen for AMode fallback
Currently the final fallback for finalizing an `AMode` will generate
both a constant-loading instruction as well as an `add` instruction to
the base register into the same temporary. This commit improves the
codegen by removing the `add` instruction and folding the final add into
the finalized `AMode`. This changes the `extendop` used but both
registers are 64-bit so shouldn't be affected by the extending
operation.
* aarch64: Implement inline stack probes
This commit implements inline stack probes for the aarch64 backend in
Cranelift. The support here is modeled after the x64 support where
unrolled probes are used up to a particular threshold after which a loop
is generated. The instructions here are similar in spirit to x64 except
that unlike x64 the stack pointer isn't modified during the unrolled
loop to avoid needing to re-adjust it back up at the end of the loop.
* Enable inline probestack for AArch64 and Riscv64
This commit enables inline probestacks for the AArch64 and Riscv64
architectures in the same manner that x86_64 has it enabled now. Some
more testing was additionally added since on Unix platforms we should be
guaranteed that Rust's stack overflow message is now printed too.
* Enable probestack for aarch64 in cranelift-fuzzgen
* Address review comments
* Remove implicit stack overflow traps from x64 backend
This commit removes implicit `StackOverflow` traps inserted by the x64
backend for stack-based operations. This was historically required when
stack overflow was detected with page faults but Wasmtime no longer
requires that since it's not suitable for wasm modules which call host
functions. Additionally no other backend implements this form of
implicit trap-code additions so this is intended to synchronize the
behavior of all the backends.
This fixes a test added prior for aarch64 to properly abort the process
instead of accidentally being caught by Wasmtime.
* Fix a style issue
* Optimize `Func::call` and its C API
This commit is an alternative to #3298 which achieves effectively the
same goal of optimizing the `Func::call` API as well as its C API
sibling of `wasmtime_func_call`. The strategy taken here is different
than #3298 though where a new API isn't created, rather a small tweak to
an existing API is done. Specifically this commit handles the major
sources of slowness with `Func::call` with:
* Looking up the type of a function, to typecheck the arguments with and
use to guide how the results should be loaded, no longer hits the
rwlock in the `Engine` but instead each `Func` contains its own
`FuncType`. This can be an unnecessary allocation for funcs not used
with `Func::call`, so this is a downside of this implementation
relative to #3298. A mitigating factor, though, is that instance
exports are loaded lazily into the `Store` and in theory not too many
funcs are active in the store as `Func` objects.
* Temporary storage is amortized with a long-lived `Vec` in the `Store`
rather than allocating a new vector on each call. This is basically
the same strategy as #3294 only applied to different types in
different places. Specifically `wasmtime::Store` now retains a
`Vec<u128>` for `Func::call`, and the C API retains a `Vec<Val>` for
calling `Func::call`.
* Finally, an API breaking change is made to `Func::call` and its type
signature (as well as `Func::call_async`). Instead of returning
`Box<[Val]>` as it did before this function now takes a
`results: &mut [Val]` parameter. This allows the caller to manage the
allocation and we can amortize-remove it in `wasmtime_func_call` by
using space after the parameters in the `Vec<Val>` we're passing in.
This change is naturally a breaking change and we'll want to consider
it carefully, but mitigating factors are that most embeddings are
likely using `TypedFunc::call` instead and this signature taking a
mutable slice better aligns with `Func::new` which receives a mutable
slice for the results.
Overall this change, in the benchmark of "call a nop function from the C
API" is not quite as good as #3298. It's still a bit slower, on the
order of 15ns, because there's lots of capacity checks around vectors
and the type checks are slightly less optimized than before. Overall
though this is still significantly better than today because allocations
and the rwlock to acquire the type information are both avoided. I
personally feel that this change is the best to do because it has less
of an API impact than #3298.
* Rebase issues
Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread.
* Moves CodeMemory, VMInterrupts and SignatureRegistry from Compiler
* CompiledModule holds CodeMemory and GdbJitImageRegistration
* Store keeps track of its JIT code
* Makes "jit_int.rs" stuff Send+Sync
* Adds the threads example.
This commit fixes an issue in Wasmtime where Wasmtime would accidentally
"handle" non-wasm segfaults while executing host imports of wasm
modules. If a host import segfaulted then Wasmtime would recognize that
wasm code is on the stack, so it'd longjmp out of the wasm code. This
papers over real bugs though in host code and erroneously classified
segfaults as wasm traps.
The fix here was to add a check to our wasm signal handler for if the
faulting address falls in JIT code itself. Actually threading through
all the right information for that check to happen is a bit tricky,
though, so this involved some refactoring:
* A closure parameter to `catch_traps` was added. This closure is
responsible for classifying addresses as whether or not they fall in
JIT code. Anything returning `false` means that the trap won't get
handled and we'll forward to the next signal handler.
* To avoid passing tons of context all over the place, the start
function is now no longer automatically invoked by `InstanceHandle`.
This avoids the need for passing all sorts of trap-handling contextual
information like the maximum stack size and "is this a jit address"
closure. Instead creators of `InstanceHandle` (like wasmtime) are now
responsible for invoking the start function.
* To avoid excessive use of `transmute` with lifetimes since the
traphandler state now has a lifetime the per-instance custom signal
handler is now replaced with a per-store custom signal handler. I'm
not entirely certain the purpose of the custom signal handler, though,
so I'd look for feedback on this part.
A new test has been added which ensures that if a host function
segfaults we don't accidentally try to handle it, and instead we
correctly report the segfault.
* Add AArch64 tests to CI
This commit enhances our CI with an AArch64 builder. Currently we have
no physical hardware to run on so for now we run all tests in an
emulator. The AArch64 build is cross-compiled from x86_64 from Linux.
Tests all happen in release mode with a recent version of QEMU (recent
version because it's so much faster, and in release mode because debug
mode tests take quite a long time in an emulator).
The goal here was not to get all tests passing on CI, but rather to get
AArch64 running on CI and get it green at the same time. To achieve that
goal many tests are now ignored on aarch64 platforms. Many tests fail
due to unimplemented functionality in the aarch64 backend (#1521), and
all wasmtime tests involving compilation are also disabled due to
panicking attempting to generate generate instruction offset information
for trap symbolication (#1523).
Despite this, though, all Cranelift tests and other wasmtime tests
should be runnin on AArch64 through QEMU with this PR. Additionally
we'll have an AArch64 binary release of Wasmtime for Linux, although it
won't be too useful just yet since it will panic on almost all wasm
modules.
* Review comments
* Implement interrupting wasm code, reimplement stack overflow
This commit is a relatively large change for wasmtime with two main
goals:
* Primarily this enables interrupting executing wasm code with a trap,
preventing infinite loops in wasm code. Note that resumption of the
wasm code is not a goal of this commit.
* Additionally this commit reimplements how we handle stack overflow to
ensure that host functions always have a reasonable amount of stack to
run on. This fixes an issue where we might longjmp out of a host
function, skipping destructors.
Lots of various odds and ends end up falling out in this commit once the
two goals above were implemented. The strategy for implementing this was
also lifted from Spidermonkey and existing functionality inside of
Cranelift. I've tried to write up thorough documentation of how this all
works in `crates/environ/src/cranelift.rs` where gnarly-ish bits are.
A brief summary of how this works is that each function and each loop
header now checks to see if they're interrupted. Interrupts and the
stack overflow check are actually folded into one now, where function
headers check to see if they've run out of stack and the sentinel value
used to indicate an interrupt, checked in loop headers, tricks functions
into thinking they're out of stack. An interrupt is basically just
writing a value to a location which is read by JIT code.
When interrupts are delivered and what triggers them has been left up to
embedders of the `wasmtime` crate. The `wasmtime::Store` type has a
method to acquire an `InterruptHandle`, where `InterruptHandle` is a
`Send` and `Sync` type which can travel to other threads (or perhaps
even a signal handler) to get notified from. It's intended that this
provides a good degree of flexibility when interrupting wasm code. Note
though that this does have a large caveat where interrupts don't work
when you're interrupting host code, so if you've got a host import
blocking for a long time an interrupt won't actually be received until
the wasm starts running again.
Some fallout included from this change is:
* Unix signal handlers are no longer registered with `SA_ONSTACK`.
Instead they run on the native stack the thread was already using.
This is possible since stack overflow isn't handled by hitting the
guard page, but rather it's explicitly checked for in wasm now. Native
stack overflow will continue to abort the process as usual.
* Unix sigaltstack management is now no longer necessary since we don't
use it any more.
* Windows no longer has any need to reset guard pages since we no longer
try to recover from faults on guard pages.
* On all targets probestack intrinsics are disabled since we use a
different mechanism for catching stack overflow.
* The C API has been updated with interrupts handles. An example has
also been added which shows off how to interrupt a module.
Closes#139Closes#860Closes#900
* Update comment about magical interrupt value
* Store stack limit as a global value, not a closure
* Run rustfmt
* Handle review comments
* Add a comment about SA_ONSTACK
* Use `usize` for type of `INTERRUPTED`
* Parse human-readable durations
* Bring back sigaltstack handling
Allows libstd to print out stack overflow on failure still.
* Add parsing and emission of stack limit-via-preamble
* Fix new example for new apis
* Fix host segfault test in release mode
* Fix new doc example
* Move most wasmtime tests into one test suite
This commit moves most wasmtime tests into a single test suite which
gets compiled into one executable instead of having lots of test
executables. The goal here is to reduce disk space on CI, and this
should be achieved by having fewer executables which means fewer copies
of `libwasmtime.rlib` linked across binaries on the system. More
importantly though this means that DWARF debug information should only
be in one executable rather than duplicated across many.
* Share more build caches
Globally set `RUSTFLAGS` to `-Dwarnings` instead of individually so all
build steps share the same value.
* Allow some dead code in cranelift-codegen
Prevents having to fix all warnings for all possible feature
combinations, only the main ones which come up.
* Update some debug file paths