* Expose memory-related options in `Config`
This commit was initially motivated by looking more into #1501, but it
ended up balooning a bit after finding a few issues. The high-level
items in this commit are:
* New configuration options via `wasmtime::Config` are exposed to
configure the tunable limits of how memories are allocated and such.
* The `MemoryCreator` trait has been updated to accurately reflect the
required allocation characteristics that JIT code expects.
* A bug has been fixed in the cranelift wasm code generation where if no
guard page was present bounds checks weren't accurately performed.
The new `Config` methods allow tuning the memory allocation
characteristics of wasmtime. Currently 64-bit platforms will reserve 6GB
chunks of memory for each linear memory, but by tweaking various config
options you can change how this is allocate, perhaps at the cost of
slower JIT code since it needs more bounds checks. The methods are
intended to be pretty thoroughly documented as to the effect they have
on the JIT code and what values you may wish to select. These new
methods have been added to the spectest fuzzer to ensure that various
configuration values for these methods don't affect correctness.
The `MemoryCreator` trait previously only allocated memories with a
`MemoryType`, but this didn't actually reflect the guarantees that JIT
code expected. JIT code is generated with an assumption about the
minimum size of the guard region, as well as whether memory is static or
dynamic (whether the base pointer can be relocated). These properties
must be upheld by custom allocation engines for JIT code to perform
correctly, so extra parameters have been added to
`MemoryCreator::new_memory` to reflect this.
Finally the fuzzing with `Config` turned up an issue where if no guard
pages present the wasm code wouldn't correctly bounds-check memory
accesses. The issue here was that with a guard page we only need to
bounds-check the first byte of access, but without a guard page we need
to bounds-check the last byte of access. This meant that the code
generation needed to account for the size of the memory operation
(load/store) and use this as the offset-to-check in the no-guard-page
scenario. I've attempted to make the various comments in cranelift a bit
more exhaustive too to hopefully make it a bit clearer for future
readers!
Closes#1501
* Review comments
* Update a comment
This PR updates Cranelift to use the new version of regalloc.rs
(bytecodealliance/regalloc.rs#55) that provides dense vreg->rreg maps to
the `map_reg()` function for each instruction, rather than the earlier
hashmap-based approach.
In one test (regex-rs.wasm), this PR results in a 15% reduction in
memory allocations (1245MB -> 1060MB) as measured by DHAT on `clif-util
wasm` runs.
This change adds SourceLoc information per instruction in a `VCode<Inst>`
container, and keeps this information up-to-date across register allocation
and branch reordering. The information is initially collected during
instruction lowering, eventually collected on the MachSection, and finally
provided to the environment that wraps the codegen crate for wasmtime.
This commit implements the stack limit checks in cranelift for the
AArch64 backend. This gets the `stack_limit` argument purpose as well as
a function's global `stack_limit` directive working for the AArch64
backend. I've tested this locally on some hardware and in an emulator
and it looks to be working for basic tests, but I've never really done
AArch64 before so some scrutiny on the instructions would be most
welcome!
Given an integer size N, a left rotation of K places is the same as a
right rotation of N - K places. This means we can use right rotations to
implement left rotations too.
The Cranelift's rotation semantics are inherited from WebAssembly, which
mean the rotation count is truncated modulo the operand's bit size. Note
the ROR aarch64 instruction has the same semantics, when both input
operands are registers.
Previously, `fcopysign` was mysteriously failing to pass the
`float_misc` spec test. This was tracked down to bad logical-immediate
masks used to separate the sign and not-sign bits. In particular, the
masks for the and-not operations were wrong. The `invert()` function on
an `ImmLogic` immediate, it turns out, assumed every immediate would be
used by a 64-bit instruction; `ImmLogic` immediates are subtly different
for 32-bit instructions. This change tracks the instruction size (32 or
64 bits) intended for use with each such immediate, and passes it back
into `maybe_from_u64` when computing the inverted immediate.
Addresses several of the failures (`float_misc`, `f32_bitwise`) for
#1521 (test failures) and presumably helps #1519 (SpiderMonkey
integration).
Certain operations (e.g. x86_packss) will have operands with types like `NxM` but will return results with types like `(N/2)x(M*2)` (halve the lane width, double the number of lanes; maintain the same number of vector bits). This is equivalent to applying two `DerivedFunction`s to the type: `DerivedFunction::HalfWidth` then `DerivedFunction::DoubleVector`. Since there is no easy way to apply multiple `DerivedFunction`s (e.g. most of the logic is one-level deep, 1d5a678124/cranelift/codegen/meta/src/gen_inst.rs (L618-L621)), I added `DerivedFunction::SplitLanes` to do the necessary type conversion.
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
* Add AArch64 tests to CI
This commit enhances our CI with an AArch64 builder. Currently we have
no physical hardware to run on so for now we run all tests in an
emulator. The AArch64 build is cross-compiled from x86_64 from Linux.
Tests all happen in release mode with a recent version of QEMU (recent
version because it's so much faster, and in release mode because debug
mode tests take quite a long time in an emulator).
The goal here was not to get all tests passing on CI, but rather to get
AArch64 running on CI and get it green at the same time. To achieve that
goal many tests are now ignored on aarch64 platforms. Many tests fail
due to unimplemented functionality in the aarch64 backend (#1521), and
all wasmtime tests involving compilation are also disabled due to
panicking attempting to generate generate instruction offset information
for trap symbolication (#1523).
Despite this, though, all Cranelift tests and other wasmtime tests
should be runnin on AArch64 through QEMU with this PR. Additionally
we'll have an AArch64 binary release of Wasmtime for Linux, although it
won't be too useful just yet since it will panic on almost all wasm
modules.
* Review comments
This halfway solves a test failure: when temporarily disabling another
assert that is triggered by lack of debug info, this causes the
`custom_trap_handler` test to pass.
* Implement interrupting wasm code, reimplement stack overflow
This commit is a relatively large change for wasmtime with two main
goals:
* Primarily this enables interrupting executing wasm code with a trap,
preventing infinite loops in wasm code. Note that resumption of the
wasm code is not a goal of this commit.
* Additionally this commit reimplements how we handle stack overflow to
ensure that host functions always have a reasonable amount of stack to
run on. This fixes an issue where we might longjmp out of a host
function, skipping destructors.
Lots of various odds and ends end up falling out in this commit once the
two goals above were implemented. The strategy for implementing this was
also lifted from Spidermonkey and existing functionality inside of
Cranelift. I've tried to write up thorough documentation of how this all
works in `crates/environ/src/cranelift.rs` where gnarly-ish bits are.
A brief summary of how this works is that each function and each loop
header now checks to see if they're interrupted. Interrupts and the
stack overflow check are actually folded into one now, where function
headers check to see if they've run out of stack and the sentinel value
used to indicate an interrupt, checked in loop headers, tricks functions
into thinking they're out of stack. An interrupt is basically just
writing a value to a location which is read by JIT code.
When interrupts are delivered and what triggers them has been left up to
embedders of the `wasmtime` crate. The `wasmtime::Store` type has a
method to acquire an `InterruptHandle`, where `InterruptHandle` is a
`Send` and `Sync` type which can travel to other threads (or perhaps
even a signal handler) to get notified from. It's intended that this
provides a good degree of flexibility when interrupting wasm code. Note
though that this does have a large caveat where interrupts don't work
when you're interrupting host code, so if you've got a host import
blocking for a long time an interrupt won't actually be received until
the wasm starts running again.
Some fallout included from this change is:
* Unix signal handlers are no longer registered with `SA_ONSTACK`.
Instead they run on the native stack the thread was already using.
This is possible since stack overflow isn't handled by hitting the
guard page, but rather it's explicitly checked for in wasm now. Native
stack overflow will continue to abort the process as usual.
* Unix sigaltstack management is now no longer necessary since we don't
use it any more.
* Windows no longer has any need to reset guard pages since we no longer
try to recover from faults on guard pages.
* On all targets probestack intrinsics are disabled since we use a
different mechanism for catching stack overflow.
* The C API has been updated with interrupts handles. An example has
also been added which shows off how to interrupt a module.
Closes#139Closes#860Closes#900
* Update comment about magical interrupt value
* Store stack limit as a global value, not a closure
* Run rustfmt
* Handle review comments
* Add a comment about SA_ONSTACK
* Use `usize` for type of `INTERRUPTED`
* Parse human-readable durations
* Bring back sigaltstack handling
Allows libstd to print out stack overflow on failure still.
* Add parsing and emission of stack limit-via-preamble
* Fix new example for new apis
* Fix host segfault test in release mode
* Fix new doc example
* Compute instance exports on demand.
Instead having instances eagerly compute a Vec of Externs, and bumping
the refcount for each Extern, compute Externs on demand.
This also enables `Instance::get_export` to avoid doing a linear search.
This also means that the closure returned by `get0` and friends now
holds an `InstanceHandle` to dynamically hold the instance live rather
than being scoped to a lifetime.
* Compute module imports and exports on demand too.
And compute Extern::ty on demand too.
* Add a utility function for computing an ExternType.
* Add a utility function for looking up a function's signature.
* Add a utility function for computing the ValType of a Global.
* Rename wasmtime_environ::Export to EntityIndex.
This helps differentiate it from other Export types in the tree, and
describes what it is.
* Fix a typo in a comment.
* Simplify module imports and exports.
* Make `Instance::exports` return the export names.
This significantly simplifies the public API, as it's relatively common
to need the names, and this avoids the need to do a zip with
`Module::exports`.
This also changes `ImportType` and `ExportType` to have public members
instead of private members and accessors, as I find that simplifies the
usage particularly in cases where there are temporary instances.
* Remove `Instance::module`.
This doesn't quite remove `Instance`'s `module` member, it gets a step
closer.
* Use a InstanceHandle utility function.
* Don't consume self in the `Func::get*` methods.
Instead, just create a closure containing the instance handle and the
export for them to call.
* Use `ExactSizeIterator` to avoid needing separate `num_*` methods.
* Rename `Extern::func()` etc. to `into_func()` etc.
* Revise examples to avoid using `nth`.
* Add convenience methods to instance for getting specific extern types.
* Use the convenience functions in more tests and examples.
* Avoid cloning strings for `ImportType` and `ExportType`.
* Remove more obviated clone() calls.
* Simplify `Func`'s closure state.
* Make wasmtime::Export's fields private.
This makes them more consistent with ExportType.
* Fix compilation error.
* Make a lifetime parameter explicit, and use better lifetime names.
Instead of 'me, use 'instance and 'module to make it clear what the
lifetime is.
* More lifetime cleanups.