This commit updates the various tooling used by wasmtime which has new
updates to the module linking proposal. This is done primarily to sync
with WebAssembly/module-linking#26. The main change implemented here is
that wasmtime now supports creating instances from a set of values, nott
just from instantiating a module. Additionally subtyping handling of
modules with respect to imports is now properly handled by desugaring
two-level imports to imports of instances.
A number of small refactorings are included here as well, but most of
them are in accordance with the changes to `wasmparser` and the updated
binary format for module linking.
This commit removes the global variable associated with wasm traps which
stores frame information. The only purpose of this global is to help
symbolicate `Trap`s created since we support creating a `Trap` without a
`Store`. The global, however, is only used for wasm frames on the stack,
and when wasm frames are on the stack we know that our thread local for
"what was the last context" is set and configured.
The change here is to hijack this thread-local some more to effectively
store the `Store` inside of it. All frame information is then moved
directly into `Store` and no longer lives off on the side in a global.
Additionally support for registering/unregistering modules is now
simplified because once a module is registered with a store it can never
be unregistered.
This has one slight functional change where if there are two instances
of `Store` interleaving calls to wasm code on the stack we'll only be
able to symbolicate one of them instead of both. That's arguably also a
feature however because this is sort of a way to leak information across
stores right now.
Otherwise, though, this isn't intended to change any existing logic, but
instead keep everything working as-is.
This commit refactors where trampolines and signature information is
stored within a `Store`, namely moving them from
`wasmtime_runtime::Instance` instead to `Store` itself. The goal here is
to remove an allocation inside of an `Instance` and make them a bit
cheaper to create. Additionally this should open up future possibilities
like not creating duplicate trampolines for signatures already in the
`Store` when using `Func::new`.
* wasmtime: Implement `global.{get,set}` for externref globals
We use libcalls to implement these -- unlike `table.{get,set}`, for which we
create inline JIT fast paths -- because no known toolchain actually uses
externref globals.
Part of #929
* wasmtime: Enable `{extern,func}ref` globals in the API
`funcref`s are implemented as `NonNull<VMCallerCheckedAnyfunc>`.
This should be more efficient than using a `VMExternRef` that points at a
`VMCallerCheckedAnyfunc` because it gets rid of an indirection, dynamic
allocation, and some reference counting.
Note that the null function reference is *NOT* a null pointer; it is a
`VMCallerCheckedAnyfunc` that has a null `func_ptr` member.
Part of #929
This commit enables `wasmtime_runtime::Table` to internally hold elements of
either `funcref` (all that is currently supported) or `externref` (newly
introduced in this commit).
This commit updates `Table`'s API, but does NOT generally propagate those
changes outwards all the way through the Wasmtime embedding API. It only does
enough to get everything compiling and the current test suite passing. It is
expected that as we implement more of the reference types spec, we will bubble
these changes out and expose them to the embedding API.
For host VM code, we use plain reference counting, where cloning increments
the reference count, and dropping decrements it. We can avoid many of the
on-stack increment/decrement operations that typically plague the
performance of reference counting via Rust's ownership and borrowing system.
Moving a `VMExternRef` avoids mutating its reference count, and borrowing it
either avoids the reference count increment or delays it until if/when the
`VMExternRef` is cloned.
When passing a `VMExternRef` into compiled Wasm code, we don't want to do
reference count mutations for every compiled `local.{get,set}`, nor for
every function call. Therefore, we use a variation of **deferred reference
counting**, where we only mutate reference counts when storing
`VMExternRef`s somewhere that outlives the activation: into a global or
table. Simultaneously, we over-approximate the set of `VMExternRef`s that
are inside Wasm function activations. Periodically, we walk the stack at GC
safe points, and use stack map information to precisely identify the set of
`VMExternRef`s inside Wasm activations. Then we take the difference between
this precise set and our over-approximation, and decrement the reference
count for each of the `VMExternRef`s that are in our over-approximation but
not in the precise set. Finally, the over-approximation is replaced with the
precise set.
The `VMExternRefActivationsTable` implements the over-approximized set of
`VMExternRef`s referenced by Wasm activations. Calling a Wasm function and
passing it a `VMExternRef` moves the `VMExternRef` into the table, and the
compiled Wasm function logically "borrows" the `VMExternRef` from the
table. Similarly, `global.get` and `table.get` operations clone the gotten
`VMExternRef` into the `VMExternRefActivationsTable` and then "borrow" the
reference out of the table.
When a `VMExternRef` is returned to host code from a Wasm function, the host
increments the reference count (because the reference is logically
"borrowed" from the `VMExternRefActivationsTable` and the reference count
from the table will be dropped at the next GC).
For more general information on deferred reference counting, see *An
Examination of Deferred Reference Counting and Cycle Detection* by Quinane:
https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf
cc #929Fixes#1804
`VMExternRef` is a reference-counted box for any kind of data that is
external and opaque to running Wasm. Sometimes it might hold a Wasmtime
thing, other times it might hold something from a Wasmtime embedder and is
opaque even to us. It is morally equivalent to `Rc<dyn Any>` in Rust, but
additionally always fits in a pointer-sized word. `VMExternRef` is
non-nullable, but `Option<VMExternRef>` is a null pointer.
The one part of `VMExternRef` that can't ever be opaque to us is the
reference count. Even when we don't know what's inside an `VMExternRef`, we
need to be able to manipulate its reference count as we add and remove
references to it. And we need to do this from compiled Wasm code, so it must
be `repr(C)`!
`VMExternRef` itself is just a pointer to an `VMExternData`, which holds the
opaque, boxed value, its reference count, and its vtable pointer.
The `VMExternData` struct is *preceded* by the dynamically-sized value boxed
up and referenced by one or more `VMExternRef`s:
```ignore
,-------------------------------------------------------.
| |
V |
+----------------------------+-----------+-----------+ |
| dynamically-sized value... | ref_count | value_ptr |---'
+----------------------------+-----------+-----------+
| VMExternData |
+-----------------------+
^
+-------------+ |
| VMExternRef |-------------------+
+-------------+ |
|
+-------------+ |
| VMExternRef |-------------------+
+-------------+ |
|
... ===
|
+-------------+ |
| VMExternRef |-------------------'
+-------------+
```
The `value_ptr` member always points backwards to the start of the
dynamically-sized value (which is also the start of the heap allocation for
this value-and-`VMExternData` pair). Because it is a `dyn` pointer, it is
fat, and also points to the value's `Any` vtable.
The boxed value and the `VMExternRef` footer are held a single heap
allocation. The layout described above is used to make satisfying the
value's alignment easy: we just need to ensure that the heap allocation used
to hold everything satisfies its alignment. It also ensures that we don't
need a ton of excess padding between the `VMExternData` and the value for
values with large alignment.
Previously we initialized trap handling (signals/etc) once-per-instance
but that's a bit too granular since we only need to do this as
one-time per-program initialization. This moves the initialization to
`Store` instead which means that we'll call this at least once per
thread, which some platforms may need (none currently do, they all only
need per-program initialization, but Fuchsia will need per-thread
initialization).
This commit fixes an issue in Wasmtime where Wasmtime would accidentally
"handle" non-wasm segfaults while executing host imports of wasm
modules. If a host import segfaulted then Wasmtime would recognize that
wasm code is on the stack, so it'd longjmp out of the wasm code. This
papers over real bugs though in host code and erroneously classified
segfaults as wasm traps.
The fix here was to add a check to our wasm signal handler for if the
faulting address falls in JIT code itself. Actually threading through
all the right information for that check to happen is a bit tricky,
though, so this involved some refactoring:
* A closure parameter to `catch_traps` was added. This closure is
responsible for classifying addresses as whether or not they fall in
JIT code. Anything returning `false` means that the trap won't get
handled and we'll forward to the next signal handler.
* To avoid passing tons of context all over the place, the start
function is now no longer automatically invoked by `InstanceHandle`.
This avoids the need for passing all sorts of trap-handling contextual
information like the maximum stack size and "is this a jit address"
closure. Instead creators of `InstanceHandle` (like wasmtime) are now
responsible for invoking the start function.
* To avoid excessive use of `transmute` with lifetimes since the
traphandler state now has a lifetime the per-instance custom signal
handler is now replaced with a per-store custom signal handler. I'm
not entirely certain the purpose of the custom signal handler, though,
so I'd look for feedback on this part.
A new test has been added which ensures that if a host function
segfaults we don't accidentally try to handle it, and instead we
correctly report the segfault.
* Implement interrupting wasm code, reimplement stack overflow
This commit is a relatively large change for wasmtime with two main
goals:
* Primarily this enables interrupting executing wasm code with a trap,
preventing infinite loops in wasm code. Note that resumption of the
wasm code is not a goal of this commit.
* Additionally this commit reimplements how we handle stack overflow to
ensure that host functions always have a reasonable amount of stack to
run on. This fixes an issue where we might longjmp out of a host
function, skipping destructors.
Lots of various odds and ends end up falling out in this commit once the
two goals above were implemented. The strategy for implementing this was
also lifted from Spidermonkey and existing functionality inside of
Cranelift. I've tried to write up thorough documentation of how this all
works in `crates/environ/src/cranelift.rs` where gnarly-ish bits are.
A brief summary of how this works is that each function and each loop
header now checks to see if they're interrupted. Interrupts and the
stack overflow check are actually folded into one now, where function
headers check to see if they've run out of stack and the sentinel value
used to indicate an interrupt, checked in loop headers, tricks functions
into thinking they're out of stack. An interrupt is basically just
writing a value to a location which is read by JIT code.
When interrupts are delivered and what triggers them has been left up to
embedders of the `wasmtime` crate. The `wasmtime::Store` type has a
method to acquire an `InterruptHandle`, where `InterruptHandle` is a
`Send` and `Sync` type which can travel to other threads (or perhaps
even a signal handler) to get notified from. It's intended that this
provides a good degree of flexibility when interrupting wasm code. Note
though that this does have a large caveat where interrupts don't work
when you're interrupting host code, so if you've got a host import
blocking for a long time an interrupt won't actually be received until
the wasm starts running again.
Some fallout included from this change is:
* Unix signal handlers are no longer registered with `SA_ONSTACK`.
Instead they run on the native stack the thread was already using.
This is possible since stack overflow isn't handled by hitting the
guard page, but rather it's explicitly checked for in wasm now. Native
stack overflow will continue to abort the process as usual.
* Unix sigaltstack management is now no longer necessary since we don't
use it any more.
* Windows no longer has any need to reset guard pages since we no longer
try to recover from faults on guard pages.
* On all targets probestack intrinsics are disabled since we use a
different mechanism for catching stack overflow.
* The C API has been updated with interrupts handles. An example has
also been added which shows off how to interrupt a module.
Closes#139Closes#860Closes#900
* Update comment about magical interrupt value
* Store stack limit as a global value, not a closure
* Run rustfmt
* Handle review comments
* Add a comment about SA_ONSTACK
* Use `usize` for type of `INTERRUPTED`
* Parse human-readable durations
* Bring back sigaltstack handling
Allows libstd to print out stack overflow on failure still.
* Add parsing and emission of stack limit-via-preamble
* Fix new example for new apis
* Fix host segfault test in release mode
* Fix new doc example
* Consolidate trap/frame information
This commit removes `TrapRegistry` in favor of consolidating this
information in the `FRAME_INFO` we already have in the `wasmtime` crate.
This allows us to keep information generally in one place and have one
canonical location for "map this PC to some original wasm stuff". The
intent for this is to next update with enough information to go from a
program counter to a position in the original wasm file.
* Expose module offset information in `FrameInfo`
This commit implements functionality for `FrameInfo`, the wasm stack
trace of a `Trap`, to return the module/function offset. This allows
knowing the precise wasm location of each stack frame, instead of only
the main trap itself. The intention here is to provide more visibility
into the wasm source when something traps, so you know precisely where
calls were and where traps were, in order to assist in debugging.
Eventually we might use this information for mapping back to native
source languages as well (given sufficient debug information).
This change makes a previously-optional artifact of compilation always
computed on the cranelift side of things. This `ModuleAddressMap` is
then propagated to the same store of information other frame information
is stored within. This also removes the need for passing a `SourceLoc`
with wasm traps or to wasm trap creation, since the backtrace's wasm
frames will be able to infer their own `SourceLoc` from the relevant
program counters.
* Option for host managed memory
* Rename Allocator to MemoryCreator
* Create LinearMemory and MemoryCreator traits in api
* Leave only one as_ptr function in LinearMemory trait
* Memory creator test
* Update comments/docs for LinearMemory and MemoryCreator traits
* Add guard page to the custom memory example
* Remove mut from LinearMemory trait as_ptr
* Host_memory_grow test
* Remove C++ dependency from `wasmtime`
This commit removes the last wads of C++ that we have in wasmtime,
meaning that building wasmtime no longer requires a C++ compiler. It
still does require a C toolchain for some minor purposes, but hopefully
we can remove that over time too!
The motivation for doing this is to consolidate all our signal-handling
code into one location in one language so you don't have to keep
crossing back and forth when understanding what's going on. This also
allows us to remove some extra cruft that wasn't necessary from the C++
original implementation. Additionally this should also make building
wasmtime a bit more portable since it's often easier to acquire a C
toolchain than it is to acquire a C++ toolchain. (e.g. if you're
cross-compiling to a musl target)
* Typos
* Remove `WrappedCallable` indirection
At this point `Func` has evolved quite a bit since inception and the
`WrappedCallable` trait I don't believe is needed any longer. This
should help clean up a few entry points by having fewer traits in play.
* Remove the `Callable` trait
This commit removes the `wasmtime::Callable` trait, changing the
signature of `Func::new` to take an appropriately typed `Fn`.
Additionally the function now always takes `&Caller` like `Func::wrap`
optionally can, to empower `Func::new` to have the same capabilities of
`Func::wrap`.
* Add a test for an already-fixed issue
Closes#849
* rustfmt
* Update more locations for `Callable`
* rustfmt
* Remove a stray leading borrow
* Review feedback
* Remove unneeded `wasmtime_call_trampoline` shim
* Refactor wasmtime_runtime::Export
Instead of an enumeration with variants that have data fields have an
enumeration where each variant has a struct, and each struct has the
data fields. This allows us to store the structs in the `wasmtime` API
and avoid lots of `panic!` calls and various extraneous matches.
* Pre-generate trampoline functions
The `wasmtime` crate supports calling arbitrary function signatures in
wasm code, and to do this it generates "trampoline functions" which have
a known ABI that then internally convert to a particular signature's ABI
and call it. These trampoline functions are currently generated
on-the-fly and are cached in the global `Store` structure. This,
however, is suboptimal for a few reasons:
* Due to how code memory is managed each trampoline resides in its own
64kb allocation of memory. This means if you have N trampolines you're
using N * 64kb of memory, which is quite a lot of overhead!
* Trampolines are never free'd, even if the referencing module goes
away. This is similar to #925.
* Trampolines are a source of shared state which prevents `Store` from
being easily thread safe.
This commit refactors how trampolines are managed inside of the
`wasmtime` crate and jit/runtime internals. All trampolines are now
allocated in the same pass of `CodeMemory` that the main module is
allocated into. A trampoline is generated per-signature in a module as
well, instead of per-function. This cache of trampolines is stored
directly inside of an `Instance`. Trampolines are stored based on
`VMSharedSignatureIndex` so they can be looked up from the internals of
the `ExportFunction` value.
The `Func` API has been updated with various bits and pieces to ensure
the right trampolines are registered in the right places. Overall this
should ensure that all trampolines necessary are generated up-front
rather than lazily. This allows us to remove the trampoline cache from
the `Compiler` type, and move one step closer to making `Compiler`
threadsafe for usage across multiple threads.
Note that as one small caveat the `Func::wrap*` family of functions
don't need to generate a trampoline at runtime, they actually generate
the trampoline at compile time which gets passed in.
Also in addition to shuffling a lot of code around this fixes one minor
bug found in `code_memory.rs`, where `self.position` was loaded before
allocation, but the allocation may push a new chunk which would cause
`self.position` to be zero instead.
* Pass the `SignatureRegistry` as an argument to where it's needed.
This avoids the need for storing it in an `Arc`.
* Ignore tramoplines for functions with lots of arguments
Co-authored-by: Dan Gohman <sunfish@mozilla.com>
This adds support for the `table.copy` instruction from the bulk memory
proposal. It also supports multiple tables, which were introduced by the
reference types proposal.
Part of #928
* Add API to statically assert signature of a `Func`
This commit add a family of APIs to `Func` named `getN` where `N` is the
number of arguments. Each function will attempt to statically assert the
signature of a `Func` and, if matching, returns a corresponding closure
which can be used to invoke the underlying function.
The purpose of this commit is to add a highly optimized way to enter a
wasm module, performing type checks up front and avoiding all the costs
of boxing and unboxing arguments within a `Val`. In general this should
be much more optimized than the previous `call` API for entering a wasm
module, if the signature is statically known.
* rustfmt
* Remove stray debugging
* Remove the `jit_function_registry` global state
This commit removes on the final pieces of global state in wasmtime
today, the `jit_function_registry` module. The purpose of this module is
to help translate a native backtrace with native program counters into a
wasm backtrace with module names, function names, and wasm module
indices. To that end this module retained a global map of function
ranges to this metadata information for each compiled function.
It turns out that we already had a `NAMES` global in the `wasmtime`
crate for symbolicating backtrace addresses, so this commit moves that
global into its own file and restructures the internals to account for
program counter ranges as well. The general set of changes here are:
* Remove `jit_function_registry`
* Remove `NAMES`
* Create a new `frame_info` module which has a singleton global
registering compiled module's frame information.
* Update traps to use the `frame_info` module to symbolicate pcs,
directly extracting a `FrameInfo` from the module.
* Register and unregister information on a module level instead of on a
per-function level (at least in terms of locking granluarity).
This commit leaves the new `FRAME_INFO` global variable as the only
remaining "critical" global variable in `wasmtime`, which only exists
due to the API of `Trap` where it doesn't take in any extra context when
capturing a stack trace through which we could hang off frame
information. I'm thinking though that this is ok, and we can always
tweak the API of `Trap` in the future if necessary if we truly need to
accomodate this.
* Remove a lazy_static dep
* Add some comments and restructure
* Remove global state for trap registration
There's a number of changes brought about in this commit, motivated by a
few things. One motivation was to remove an instance of using
`lazy_static!` in an effort to remove global state and encapsulate it
wherever possible. A second motivation came when investigating a
slowly-compiling wasm module (a bit too slowly) where a good chunk of
time was spent in managing trap registrations.
The specific change made here is that `TrapRegistry` is now stored
inside of a `Compiler` instead of inside a global. Additionally traps
are "bulk registered" for a module rather than one-by-one. This form of
bulk-registration allows optimizing the locks used here, where a lock is
only held for a module at-a-time instead of once-per-function.
With these changes the "unregister" logic has also been tweaked a bit
here and there to continue to work. As a nice side effect the `Compiler`
type now has one fewer field that requires actual mutability and has
been updated for multi-threaded compilation, nudging us closer to a
world where we can support multi-threaded compilation. Yay!
In terms of performance improvements, a local wasm test file that
previously took 3 seconds to compile is now 10% faster to compile,
taking ~2.7 seconds now.
* Perform trap resolution after unwinding
This avoids taking locks in signal handlers which feels a bit iffy...
* Remove `TrapRegistration::dummy()`
Avoid an case where you're trying to lookup trap information from a
dummy module for something that happened in a different module.
* Tweak some comments
* Move `Func` to its own file
* Support `Func` imports with zero shims
This commit extends the `Func` type in the `wasmtime` crate with static
`wrap*` constructors. The goal of these constructors is to create a
`Func` type which has zero shims associated with it, creating as small
of a layer as possible between wasm code and calling imported Rust code.
This is achieved by creating an `extern "C"` shim function which matches
the ABI of what Cranelift will generate, and then the host function is
passed directly into an `InstanceHandle` to get called later. This also
enables enough inlining opportunities that LLVM will be able to see all
functions and inline everything to the point where your function is
called immediately from wasm, no questions asked.
* Improve panics/traps from imported functions
This commit performs a few refactorings and fixes a bug as well. The
changes here are:
* The `thread_local!` in the `wasmtime` crate for trap information is
removed. The thread local in the `wasmtime_runtime` crate is now
leveraged to transmit trap information.
* Panics in user-provided functions are now caught explicitly to be
carried across JIT code manually. Getting Rust panics unwinding
through JIT code is pretty likely to be super tricky and difficult to
do, so in the meantime we can get by with catching panics and resuming
the panic once we've resumed in Rust code.
* Various take/record trap apis have all been removed in favor of
working directly with `Trap` objects, where the internal trap object
has been expanded slightly to encompass user-provided errors as well.
This borrows a bit #839 and otherwise will...
Closes#848
* Rename `r#return` to `ret`
* Remove the unused EnsureDarwinMachPorts function
When compiling the C++ shims for longjmp/setjmp/signal handling we don't
use the `USE_APPLE_MACH_PORTS` directive, so this function was entirely
unused anyway. This looks to be a holdover from when this was originally
copied from C++, but no need for keeping around this now-legacy
initialization.
* Remove the `wasmtime_init_finish` function
This looks like it's perhaps largely historical cruft at this point now
I think? The function, with the removal of the mach ports from the
previous commit, only reads the initializtion state of the signal
handlers. If the signal handlers failed to get installed, though, it
simply returns early rather than erroring out anyway. In any case a
follow-up commit will refactor `wasmtime_init_eager` to handle this as
well.
* Pare down `wasmtime_init_eager`
Similar to previous commits it looks like this function may have accrued
some debt over time, nowadays it doesn't really do much other than
capture a backtrace and install signal handlers. The `lazy_static` state
isn't really that necessary and we can rely on the `Once` primitive in
the standard library for one-time initialization.
This also updates the code to unconditionally panic if signal handlers
fail to get installed, which I think is the behavior that we'll want for
now and we can enhance it over time if necessary, but I don't think we
have a use case where it's currently necessary.
* Preserve full native stack traces in errors
This commit builds on #759 by performing a few refactorings:
* The `backtrace` crate is updated to 0.3.42 which incorporates the
Windows-specific stack-walking code, so that's no longer needed.
* A full `backtrace::Backtrace` type is held in a trap at all times.
* The trap structures in the `wasmtime-*` internal crates were
refactored a bit to preserve more information and deal with raw
values rather than converting between various types and strings.
* The `wasmtime::Trap` type has been updated with these various changes.
Eventually I think we'll want to likely render full stack traces (and/or
partial wasm ones) into error messages, but for now that's left as-is
and we can always improve it later. I suspect the most relevant thing we
need to do is to implement function name symbolication for wasm
functions first, and then afterwards we can incorporate native function
names!
* Fix some test suite assertions
* Migrate back to `std::` stylistically
This commit moves away from idioms such as `alloc::` and `core::` as
imports of standard data structures and types. Instead it migrates all
crates to uniformly use `std::` for importing standard data structures
and types. This also removes the `std` and `core` features from all
crates to and removes any conditional checking for `feature = "std"`
All of this support was previously added in #407 in an effort to make
wasmtime/cranelift "`no_std` compatible". Unfortunately though this
change comes at a cost:
* The usage of `alloc` and `core` isn't idiomatic. Especially trying to
dual between types like `HashMap` from `std` as well as from
`hashbrown` causes imports to be surprising in some cases.
* Unfortunately there was no CI check that crates were `no_std`, so none
of them actually were. Many crates still imported from `std` or
depended on crates that used `std`.
It's important to note, however, that **this does not mean that wasmtime
will not run in embedded environments**. The style of the code today and
idioms aren't ready in Rust to support this degree of multiplexing and
makes it somewhat difficult to keep up with the style of `wasmtime`.
Instead it's intended that embedded runtime support will be added as
necessary. Currently only `std` is necessary to build `wasmtime`, and
platforms that natively need to execute `wasmtime` will need to use a
Rust target that supports `std`. Note though that not all of `std` needs
to be supported, but instead much of it could be configured off to
return errors, and `wasmtime` would be configured to gracefully handle
errors.
The goal of this PR is to move `wasmtime` back to idiomatic usage of
features/`std`/imports/etc and help development in the short-term.
Long-term when platform concerns arise (if any) they can be addressed by
moving back to `no_std` crates (but fixing the issues mentioned above)
or ensuring that the target in Rust has `std` available.
* Start filling out platform support doc