This introduces two changes:
- first, a Cargo feature is added to make it possible to use the
Cranelift x64 backend directly from wasmtime's CLI.
- second, when passing a `cranelift-flags` parameter, and the given
parameter's name doesn't exist at the target-independent flag level, try
to set it as a target-dependent setting.
These two changes make it possible to try out the new x64 backend with:
cargo run --features experimental_x64 -- run --cranelift-flags use_new_backend=true -- /path/to/a.wasm
Right now, this will fail because most opcodes required by the
trampolines are actually not implemented yet.
For host VM code, we use plain reference counting, where cloning increments
the reference count, and dropping decrements it. We can avoid many of the
on-stack increment/decrement operations that typically plague the
performance of reference counting via Rust's ownership and borrowing system.
Moving a `VMExternRef` avoids mutating its reference count, and borrowing it
either avoids the reference count increment or delays it until if/when the
`VMExternRef` is cloned.
When passing a `VMExternRef` into compiled Wasm code, we don't want to do
reference count mutations for every compiled `local.{get,set}`, nor for
every function call. Therefore, we use a variation of **deferred reference
counting**, where we only mutate reference counts when storing
`VMExternRef`s somewhere that outlives the activation: into a global or
table. Simultaneously, we over-approximate the set of `VMExternRef`s that
are inside Wasm function activations. Periodically, we walk the stack at GC
safe points, and use stack map information to precisely identify the set of
`VMExternRef`s inside Wasm activations. Then we take the difference between
this precise set and our over-approximation, and decrement the reference
count for each of the `VMExternRef`s that are in our over-approximation but
not in the precise set. Finally, the over-approximation is replaced with the
precise set.
The `VMExternRefActivationsTable` implements the over-approximized set of
`VMExternRef`s referenced by Wasm activations. Calling a Wasm function and
passing it a `VMExternRef` moves the `VMExternRef` into the table, and the
compiled Wasm function logically "borrows" the `VMExternRef` from the
table. Similarly, `global.get` and `table.get` operations clone the gotten
`VMExternRef` into the `VMExternRefActivationsTable` and then "borrow" the
reference out of the table.
When a `VMExternRef` is returned to host code from a Wasm function, the host
increments the reference count (because the reference is logically
"borrowed" from the `VMExternRefActivationsTable` and the reference count
from the table will be dropped at the next GC).
For more general information on deferred reference counting, see *An
Examination of Deferred Reference Counting and Cycle Detection* by Quinane:
https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf
cc #929Fixes#1804
* Refactor how relocs are stored and handled
* refactor CompiledModule::instantiate and link_module
* Refactor DWARF creation: split generation and serialization
* Separate DWARF data transform from instantiation
* rm LinkContext
These libcalls are useful for 32-bit platforms.
On x86_32 in particular, commit 4ec16fa0 added support for legalizing
64-bit shifts through SIMD operations. However, that legalization
requires SIMD to be enabled and SSE 4.1 to be supported, which is not
acceptable as a hard requirement.
* Moves CodeMemory, VMInterrupts and SignatureRegistry from Compiler
* CompiledModule holds CodeMemory and GdbJitImageRegistration
* Store keeps track of its JIT code
* Makes "jit_int.rs" stuff Send+Sync
* Adds the threads example.
This is enough to get an `externref -> externref` identity function
passing.
However, `externref`s that are dropped by compiled Wasm code are (safely)
leaked. Follow up work will leverage cranelift's stack maps to resolve this
issue.
This commit fixes an issue in Wasmtime where Wasmtime would accidentally
"handle" non-wasm segfaults while executing host imports of wasm
modules. If a host import segfaulted then Wasmtime would recognize that
wasm code is on the stack, so it'd longjmp out of the wasm code. This
papers over real bugs though in host code and erroneously classified
segfaults as wasm traps.
The fix here was to add a check to our wasm signal handler for if the
faulting address falls in JIT code itself. Actually threading through
all the right information for that check to happen is a bit tricky,
though, so this involved some refactoring:
* A closure parameter to `catch_traps` was added. This closure is
responsible for classifying addresses as whether or not they fall in
JIT code. Anything returning `false` means that the trap won't get
handled and we'll forward to the next signal handler.
* To avoid passing tons of context all over the place, the start
function is now no longer automatically invoked by `InstanceHandle`.
This avoids the need for passing all sorts of trap-handling contextual
information like the maximum stack size and "is this a jit address"
closure. Instead creators of `InstanceHandle` (like wasmtime) are now
responsible for invoking the start function.
* To avoid excessive use of `transmute` with lifetimes since the
traphandler state now has a lifetime the per-instance custom signal
handler is now replaced with a per-store custom signal handler. I'm
not entirely certain the purpose of the custom signal handler, though,
so I'd look for feedback on this part.
A new test has been added which ensures that if a host function
segfaults we don't accidentally try to handle it, and instead we
correctly report the segfault.
* Revamp memory management of `InstanceHandle`
This commit fixes a known but in Wasmtime where an instance could still
be used after it was freed. Unfortunately the fix here is a bit of a
hammer, but it's the best that we can do for now. The changes made in
this commit are:
* A `Store` now stores all `InstanceHandle` objects it ever creates.
This keeps all instances alive unconditionally (along with all host
functions and such) until the `Store` is itself dropped. Note that a
`Store` is reference counted so basically everything has to be dropped
to drop anything, there's no longer any partial deallocation of instances.
* The `InstanceHandle` type's own reference counting has been removed.
This is largely redundant with what's already happening in `Store`, so
there's no need to manage two reference counts.
* Each `InstanceHandle` no longer tracks its dependencies in terms of
instance handles. This set was actually inaccurate due to dynamic
updates to tables and such, so we needed to revamp it anyway.
* Initialization of an `InstanceHandle` is now deferred until after
`InstanceHandle::new`. This allows storing the `InstanceHandle` before
side-effectful initialization, such as copying element segments or
running the start function, to ensure that regardless of the result of
instantiation the underlying `InstanceHandle` is still available to
persist in storage.
Overall this should fix a known possible way to safely segfault Wasmtime
today (yay!) and it should also fix some flaikness I've seen on CI.
Turns out one of the spec tests
(bulk-memory-operations/partial-init-table-segment.wast) exercises this
functionality and we were hitting sporating use-after-free, but only on
Windows.
* Shuffle some APIs around
* Comment weak cycle
* Implement interrupting wasm code, reimplement stack overflow
This commit is a relatively large change for wasmtime with two main
goals:
* Primarily this enables interrupting executing wasm code with a trap,
preventing infinite loops in wasm code. Note that resumption of the
wasm code is not a goal of this commit.
* Additionally this commit reimplements how we handle stack overflow to
ensure that host functions always have a reasonable amount of stack to
run on. This fixes an issue where we might longjmp out of a host
function, skipping destructors.
Lots of various odds and ends end up falling out in this commit once the
two goals above were implemented. The strategy for implementing this was
also lifted from Spidermonkey and existing functionality inside of
Cranelift. I've tried to write up thorough documentation of how this all
works in `crates/environ/src/cranelift.rs` where gnarly-ish bits are.
A brief summary of how this works is that each function and each loop
header now checks to see if they're interrupted. Interrupts and the
stack overflow check are actually folded into one now, where function
headers check to see if they've run out of stack and the sentinel value
used to indicate an interrupt, checked in loop headers, tricks functions
into thinking they're out of stack. An interrupt is basically just
writing a value to a location which is read by JIT code.
When interrupts are delivered and what triggers them has been left up to
embedders of the `wasmtime` crate. The `wasmtime::Store` type has a
method to acquire an `InterruptHandle`, where `InterruptHandle` is a
`Send` and `Sync` type which can travel to other threads (or perhaps
even a signal handler) to get notified from. It's intended that this
provides a good degree of flexibility when interrupting wasm code. Note
though that this does have a large caveat where interrupts don't work
when you're interrupting host code, so if you've got a host import
blocking for a long time an interrupt won't actually be received until
the wasm starts running again.
Some fallout included from this change is:
* Unix signal handlers are no longer registered with `SA_ONSTACK`.
Instead they run on the native stack the thread was already using.
This is possible since stack overflow isn't handled by hitting the
guard page, but rather it's explicitly checked for in wasm now. Native
stack overflow will continue to abort the process as usual.
* Unix sigaltstack management is now no longer necessary since we don't
use it any more.
* Windows no longer has any need to reset guard pages since we no longer
try to recover from faults on guard pages.
* On all targets probestack intrinsics are disabled since we use a
different mechanism for catching stack overflow.
* The C API has been updated with interrupts handles. An example has
also been added which shows off how to interrupt a module.
Closes#139Closes#860Closes#900
* Update comment about magical interrupt value
* Store stack limit as a global value, not a closure
* Run rustfmt
* Handle review comments
* Add a comment about SA_ONSTACK
* Use `usize` for type of `INTERRUPTED`
* Parse human-readable durations
* Bring back sigaltstack handling
Allows libstd to print out stack overflow on failure still.
* Add parsing and emission of stack limit-via-preamble
* Fix new example for new apis
* Fix host segfault test in release mode
* Fix new doc example
* Compute instance exports on demand.
Instead having instances eagerly compute a Vec of Externs, and bumping
the refcount for each Extern, compute Externs on demand.
This also enables `Instance::get_export` to avoid doing a linear search.
This also means that the closure returned by `get0` and friends now
holds an `InstanceHandle` to dynamically hold the instance live rather
than being scoped to a lifetime.
* Compute module imports and exports on demand too.
And compute Extern::ty on demand too.
* Add a utility function for computing an ExternType.
* Add a utility function for looking up a function's signature.
* Add a utility function for computing the ValType of a Global.
* Rename wasmtime_environ::Export to EntityIndex.
This helps differentiate it from other Export types in the tree, and
describes what it is.
* Fix a typo in a comment.
* Simplify module imports and exports.
* Make `Instance::exports` return the export names.
This significantly simplifies the public API, as it's relatively common
to need the names, and this avoids the need to do a zip with
`Module::exports`.
This also changes `ImportType` and `ExportType` to have public members
instead of private members and accessors, as I find that simplifies the
usage particularly in cases where there are temporary instances.
* Remove `Instance::module`.
This doesn't quite remove `Instance`'s `module` member, it gets a step
closer.
* Use a InstanceHandle utility function.
* Don't consume self in the `Func::get*` methods.
Instead, just create a closure containing the instance handle and the
export for them to call.
* Use `ExactSizeIterator` to avoid needing separate `num_*` methods.
* Rename `Extern::func()` etc. to `into_func()` etc.
* Revise examples to avoid using `nth`.
* Add convenience methods to instance for getting specific extern types.
* Use the convenience functions in more tests and examples.
* Avoid cloning strings for `ImportType` and `ExportType`.
* Remove more obviated clone() calls.
* Simplify `Func`'s closure state.
* Make wasmtime::Export's fields private.
This makes them more consistent with ExportType.
* Fix compilation error.
* Make a lifetime parameter explicit, and use better lifetime names.
Instead of 'me, use 'instance and 'module to make it clear what the
lifetime is.
* More lifetime cleanups.
This commit calls `__register_frame` once for the entire frame table on
Linux.
On macOS, it still manually walks the frame table and registers each frame with
`__register_frame`.
This commit fixes an issue where the global registration of frame data
goes away once the `wasmtime::Module` has been dropped. Even after this
has been dropped, though, there may still be `wasmtime::Func` instances
which reference the original module, so it's only once the underlying
`wasmtime_runtime::Instance` has gone away that we can drop everything.
Closes#1479
This commit makes the following changes to unwind information generation in
Cranelift:
* Remove frame layout change implementation in favor of processing the prologue
and epilogue instructions when unwind information is requested. This also
means this work is no longer performed for Windows, which didn't utilize it.
It also helps simplify the prologue and epilogue generation code.
* Remove the unwind sink implementation that required each unwind information
to be represented in final form. For FDEs, this meant writing a
complete frame table per function, which wastes 20 bytes or so for each
function with duplicate CIEs. This also enables Cranelift users to collect the
unwind information and write it as a single frame table.
* For System V calling convention, the unwind information is no longer stored
in code memory (it's only a requirement for Windows ABI to do so). This allows
for more compact code memory for modules with a lot of functions.
* Deletes some duplicate code relating to frame table generation. Users can
now simply use gimli to create a frame table from each function's unwind
information.
Fixes#1181.
- Undo temporary changes to default features (`all-arch`) and a
signal-handler test.
- Remove `SIGTRAP` handler: no longer needed now that we've found an
"undefined opcode" option on ARM64.
- Rename pp.rs to pretty_print.rs in machinst/.
- Only use empty stack-probe on non-x86. As per a comment in
rust-lang/compiler-builtins [1], LLVM only supports stack probes on
x86 and x86-64. Thus, on any other CPU architecture, we cannot refer
to `__rust_probestack`, because it does not exist.
- Rename arm64 to aarch64.
- Use `target` directive in vcode filetests.
- Run the flags verifier, but without encinfo, when using new backends.
- Clean up warning overrides.
- Fix up use of casts: use u32::from(x) and siblings when possible,
u32::try_from(x).unwrap() when not, to avoid silent truncation.
- Take immutable `Function` borrows as input; we don't actually
mutate the input IR.
- Lots of other miscellaneous cleanups.
[1] cae3e6ea23/src/probestack.rs (L39)
* Consolidate trap/frame information
This commit removes `TrapRegistry` in favor of consolidating this
information in the `FRAME_INFO` we already have in the `wasmtime` crate.
This allows us to keep information generally in one place and have one
canonical location for "map this PC to some original wasm stuff". The
intent for this is to next update with enough information to go from a
program counter to a position in the original wasm file.
* Expose module offset information in `FrameInfo`
This commit implements functionality for `FrameInfo`, the wasm stack
trace of a `Trap`, to return the module/function offset. This allows
knowing the precise wasm location of each stack frame, instead of only
the main trap itself. The intention here is to provide more visibility
into the wasm source when something traps, so you know precisely where
calls were and where traps were, in order to assist in debugging.
Eventually we might use this information for mapping back to native
source languages as well (given sufficient debug information).
This change makes a previously-optional artifact of compilation always
computed on the cranelift side of things. This `ModuleAddressMap` is
then propagated to the same store of information other frame information
is stored within. This also removes the need for passing a `SourceLoc`
with wasm traps or to wasm trap creation, since the backtrace's wasm
frames will be able to infer their own `SourceLoc` from the relevant
program counters.
This commit adds a few odds and ends required to build wasmtime on ARM64
with the new backend. In particular, it adds:
- Support for the `Arm64Call` relocation type.
- Support for fetching the trap PC when a signal is received.
- A hook for `SIGTRAP`, which is sent by the `brk` opcode (in contrast to
x86's `SIGILL`).
With the patch sequence up to and including this patch applied,
`wasmtime` can now compile and successfully execute code on arm64. Not
all tests pass yet, but basic Wasm/WASI tests work correctly.
Preserve FPRs as required by the Windows fastcall calling convention.
This exposes an implementation limit due to Cranelift's approach to stack layout, which conflicts with expectations Windows makes in SEH layout - functions where the Cranelift user desires fastcall unwind information, that require preservation of an ABI-reserved FPR, that have a stack frame 240 bytes or larger, now produce an error when compiled. Several wasm spectests were disabled because they would trip this limit. This is a temporary constraint that should be fixed promptly.
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
* wasmtime: Pass around more contexts instead of fields
This commit refactors some wasmtime internals to pass around more
context-style structures rather than individual fields of each
structure. The intention here is to make the addition of fields to a
structure easier to plumb throughout the internals of wasmtime.
Currently you need to edit lots of functions to pass lots of parameters,
but ideally after this you'll only need to edit one or two struct fields
and then relevant locations have access to the information already.
Updates in this commit are:
* `debug_info` configuration is now folded into `Tunables`. Additionally
a `wasmtime::Config` now holds a `Tunables` directly and is passed
into an internal `Compiler`. Eventually this should allow for direct
configuration of the `Tunables` attributes from the `wasmtime` API,
but no new configuration is exposed at this time.
* `ModuleTranslation` is now passed around as a whole rather than
passing individual components to allow access to all the fields,
including `Tunables`.
This was motivated by investigating what it would take to optionally
allow loops and such to get interrupted, but that sort of codegen
setting was currently relatively difficult to plumb all the way through
and now it's hoped to be largely just an addition to `Tunables`.
* Fix lightbeam compile
* Option for host managed memory
* Rename Allocator to MemoryCreator
* Create LinearMemory and MemoryCreator traits in api
* Leave only one as_ptr function in LinearMemory trait
* Memory creator test
* Update comments/docs for LinearMemory and MemoryCreator traits
* Add guard page to the custom memory example
* Remove mut from LinearMemory trait as_ptr
* Host_memory_grow test
* Wasmtime 0.15.0 and Cranelift 0.62.0. (#1398)
* Bump more ad-hoc versions.
* Add build.rs to wasi-common's Cargo.toml.
* Update the env var name in more places.
* Remove a redundant echo.
This patch adds initial support for ittapi which is an open
source profiling api for instrumentation and tracing and profiling
of jitted code. Result files can be read by VTune for analysis
Build:
cargo build --features=vtune
Profile: // Using amplxe-cl from VTune
amplxe-cl -v -collect hostpost target/debug/wasmtime --vtune test.wasm
* Bump Wasmtime to 0.14.0.
* Update the publish script for the wiggle crate wiggle.
* More fixes.
* Fix lightbeam depenency version.
* cargo update
* Cargo update wasi-tests too.
And add cargo update to the version-bump scripts.
* Enable jitdump profiling support by default
This the result of some of the investigation I was doing for #1017. I've
done a number of refactorings here which culminated in a number of
changes that all amount to what I think should result in jitdump support being
enabled by default:
* Pass in a list of finished functions instead of just a range to
ensure that we're emitting jit dump data for a specific module rather
than a whole `CodeMemory` which may have other modules.
* Define `ProfilingStrategy` in the `wasmtime` crate to have everything
locally-defined
* Add support to the C API to enable profiling
* Documentation added for profiling with jitdump to the book
* Split out supported/unsupported files in `jitdump.rs` to avoid having
lots of `#[cfg]`.
* Make dependencies optional that are only used for `jitdump`.
* Move initialization up-front to `JitDumpAgent::new()` instead of
deferring it to the first module.
* Pass around `Arc<dyn ProfilingAgent>` instead of
`Option<Arc<Mutex<Box<dyn ProfilingAgent>>>>`
The `jitdump` Cargo feature is now enabled by default which means that
our published binaries, C API artifacts, and crates will support
profiling at runtime by default. The support I don't think is fully
fleshed out and working but I think it's probably in a good enough spot
we can get users playing around with it!
* Handle select relocations while generating trampolines
Trampoline generation for all function signatures exposed a preexisting
bug in wasmtime where trampoline generation occasionally does have
relocations, but it's asserted that trampolines don't generate
relocations, causing a panic. The relocation is currently primarily the
probestack function which happens when functions might have a huge
number of parameters, but not so huge as to blow the wasmparser limit of
how many parameters are allowed.
This commit fixes the issue by handling relocations for trampolines in
the same manner as the rest of the code. Note that dynamically-generated
trampolines via the `Func` API still panic if they have too many
arguments and generate a relocation, but it seems like we can try to fix
that later if the need truly arises.
Closes#1322
* Log trampoline relocations
Until #1306 is resolved (some spilling/regalloc issue with larger FPR register banks), this removes FPR32 support. Only Wasm's `i64x2.mul` was using this register class and that instruction is predicated on AVX512 support; for the time being, that instruction will have to make do with the 16 FPR registers.
* Store module name on `wasmtime_environ::Module`
This keeps all name information in one place so we dont' have to keep
extra structures around in `wasmtime::Module`.
* rustfmt
* Refactor wasmtime_runtime::Export
Instead of an enumeration with variants that have data fields have an
enumeration where each variant has a struct, and each struct has the
data fields. This allows us to store the structs in the `wasmtime` API
and avoid lots of `panic!` calls and various extraneous matches.
* Pre-generate trampoline functions
The `wasmtime` crate supports calling arbitrary function signatures in
wasm code, and to do this it generates "trampoline functions" which have
a known ABI that then internally convert to a particular signature's ABI
and call it. These trampoline functions are currently generated
on-the-fly and are cached in the global `Store` structure. This,
however, is suboptimal for a few reasons:
* Due to how code memory is managed each trampoline resides in its own
64kb allocation of memory. This means if you have N trampolines you're
using N * 64kb of memory, which is quite a lot of overhead!
* Trampolines are never free'd, even if the referencing module goes
away. This is similar to #925.
* Trampolines are a source of shared state which prevents `Store` from
being easily thread safe.
This commit refactors how trampolines are managed inside of the
`wasmtime` crate and jit/runtime internals. All trampolines are now
allocated in the same pass of `CodeMemory` that the main module is
allocated into. A trampoline is generated per-signature in a module as
well, instead of per-function. This cache of trampolines is stored
directly inside of an `Instance`. Trampolines are stored based on
`VMSharedSignatureIndex` so they can be looked up from the internals of
the `ExportFunction` value.
The `Func` API has been updated with various bits and pieces to ensure
the right trampolines are registered in the right places. Overall this
should ensure that all trampolines necessary are generated up-front
rather than lazily. This allows us to remove the trampoline cache from
the `Compiler` type, and move one step closer to making `Compiler`
threadsafe for usage across multiple threads.
Note that as one small caveat the `Func::wrap*` family of functions
don't need to generate a trampoline at runtime, they actually generate
the trampoline at compile time which gets passed in.
Also in addition to shuffling a lot of code around this fixes one minor
bug found in `code_memory.rs`, where `self.position` was loaded before
allocation, but the allocation may push a new chunk which would cause
`self.position` to be zero instead.
* Pass the `SignatureRegistry` as an argument to where it's needed.
This avoids the need for storing it in an `Arc`.
* Ignore tramoplines for functions with lots of arguments
Co-authored-by: Dan Gohman <sunfish@mozilla.com>