This commit removes all import resolution handling from the
`wasmtime-jit` crate, instead moving the logic to the `wasmtime` crate.
Previously `wasmtime-jit` had a generic `Resolver` trait and would do
all the import type matching itself, but with the upcoming
module-linking implementation this is going to get much trickier.
The goal of this commit is to centralize all meaty "preparation" logic
for instantiation into one location, probably the `wasmtime` crate
itself. Instantiation will soon involve recursive instantiation and
management of alias definitions as well. Having everything in one
location, especially with access to `Store` so we can persist
instances for safety, will be quite convenient.
Additionally the `Resolver` trait isn't really necessary any more since
imports are, at the lowest level, provided as a list rather than a map
of some kind. More generic resolution functionality is provided via
`Linker` or user layers on top of `Instance::new` itself. This makes
matching up provided items to expected imports much easier as well.
Overall this is largely just moving code around, but most of the code
in the previous `resolve_imports` phase can be deleted since a lot of it
is handled by surrounding pieces of `wasmtime` as well.
This commit moves all of the caching support that currently lives in
`wasmtime-environ` into a `wasmtime-cache` crate and makes it optional. The
goal here is to slim down the `wasmtime-environ` crate and clearly separate
boundaries where caching is a standalone and optional feature, not intertwined
with other crates.
This was added long ago at this point to assist with caching, but
caching has moved to a different level such that this wonky second level
of a `Module` isn't necessary. This commit removes the `ModuleLocal`
type to simplify accessors and generally make it easier to work with.
The implementation is pretty straightforward. Wasm atomic instructions fall
into 5 groups
* atomic read-modify-write
* atomic compare-and-swap
* atomic loads
* atomic stores
* fences
and the implementation mirrors that structure, at both the CLIF and AArch64
levels.
At the CLIF level, there are five new instructions, one for each group. Some
comments about these:
* for those that take addresses (all except fences), the address is contained
entirely in a single `Value`; there is no offset field as there is with
normal loads and stores. Wasm atomics require alignment checks, and
removing the offset makes implementation of those checks a bit simpler.
* atomic loads and stores get their own instructions, rather than reusing the
existing load and store instructions, for two reasons:
- per above comment, makes alignment checking simpler
- reuse of existing loads and stores would require extension of `MemFlags`
to indicate atomicity, which sounds semantically unclean. For example,
then *any* instruction carrying `MemFlags` could be marked as atomic, even
in cases where it is meaningless or ambiguous.
* I tried to specify, in comments, the behaviour of these instructions as
tightly as I could. Unfortunately there is no way (per my limited CLIF
knowledge) to enforce the constraint that they may only be used on I8, I16,
I32 and I64 types, and in particular not on floating point or vector types.
The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable.
At the AArch64 level, there are also five new instructions, one for each
group. All of them except `::Fence` contain multiple real machine
instructions. Atomic r-m-w and atomic c-a-s are emitted as the usual
load-linked store-conditional loops, guarded at both ends by memory fences.
Atomic loads and stores are emitted as a load preceded by a fence, and a store
followed by a fence, respectively. The amount of fencing may be overkill, but
it reflects exactly what the SM Wasm baseline compiler for AArch64 does.
One reason to implement r-m-w and c-a-s as a single insn which is expanded
only at emission time is that we must be very careful what instructions we
allow in between the load-linked and store-conditional. In particular, we
cannot allow *any* extra memory transactions in there, since -- particularly
on low-end hardware -- that might cause the transaction to fail, hence
deadlocking the generated code. That implies that we can't present the LL/SC
loop to the register allocator as its constituent instructions, since it might
insert spills anywhere. Hence we must present it as a single indivisible
unit, as we do here. It also has the benefit of reducing the total amount of
work the RA has to do.
The only other notable feature of the r-m-w and c-a-s translations into
AArch64 code, is that they both need a scratch register internally. Rather
than faking one up by claiming, in `get_regs` that it modifies an extra
scratch register, and having to have a dummy initialisation of it, these new
instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range
x24-x28. We rely on the RA's ability to coalesce V<-->R copies to make the
cost of the resulting extra copies zero or almost zero. x24-x28 are chosen so
as to be call-clobbered, hence their use is less likely to interfere with long
live ranges that span calls.
One subtlety regarding the use of completely fixed input and output registers
is that we must be careful how the surrounding copy from/to of the arg/result
registers is done. In particular, it is not safe to simply emit copies in
some arbitrary order if one of the arg registers is a real reg. For that
reason, the arguments are first moved into virtual regs if they are not
already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`.
Again, we rely on coalescing to turn them into no-ops in the common case.
There is also a ridealong fix for the AArch64 lowering case for
`Opcode::Trapif | Opcode::Trapff`, which removes a bug in which two trap insns
in a row were generated.
In the patch as submitted there are 6 "FIXME JRS" comments, which mark things
which I believe to be correct, but for which I would appreciate a second
opinion. Unless otherwise directed, I will remove them for the final commit
but leave the associated code/comments unchanged.
* Refactor where results of compilation are stored
This commit refactors the internals of compilation in Wasmtime to change
where results of individual function compilation are stored. Previously
compilation resulted in many maps being returned, and compilation
results generally held all these maps together. This commit instead
switches this to have all metadata stored in a `CompiledFunction`
instead of having a separate map for each item that can be stored.
The motivation for this is primarily to help out with future
module-linking-related PRs. What exactly "module level" is depends on
how we interpret modules and how many modules are in play, so it's a bit
easier for operations in wasmtime to work at the function level where
possible. This means that we don't have to pass around multiple
different maps and a function index, but instead just one map or just
one entry representing a compiled function.
Additionally this change updates where the parallelism of compilation
happens, pushing it into `wasmtime-jit` instead of `wasmtime-environ`.
This is another goal where `wasmtime-jit` will have more knowledge about
module-level pieces with module linking in play. User-facing-wise this
should be the same in terms of parallel compilation, though.
The ultimate goal of this refactoring is to make it easier for the
results of compilation to actually be a set of wasm modules. This means
we won't be able to have a map-per-metadata where the primary key is the
function index, because there will be many modules within one "object
file".
* Don't clear out fields, just don't store them
Persist a smaller set of fields in `CompilationArtifacts` instead of
trying to clear fields out and dynamically not accessing them.
* move caching to the CompilationArtifacts
* mv cache_config from Compiler to CompiledModule
* hash isa flags
* no cache for wasm2obj
* mv caching to wasmtime crate
* account each Compiler field when hash
Currently `Func::new` will panic if one of the arguments of the function
is a reference type and the `Store` doesn't have reference types
enabled. This happens because cranelift isn't configure to enable stack
maps but the register allocators expects them to exist when reference
types are seen.
The fix here is to always enable reference types in cranelift for our
trampoline generation and `Func::new`. This should hopefully ensure that
trampolines are generated correctly and they'll just not be able to get
hooked up to an `Instance` because validation will prevent reference
types from being used elsewhere.
This commit is intended to update wasmparser to 0.59.0. This primarily
includes bytecodealliance/wasm-tools#40 which is a large update to how
parsing and validation works. The impact on Wasmtime is pretty small at
this time, but over time I'd like to refactor the internals here to lean
more heavily on that upstream wasmparser refactoring.
For now, though, the intention is to get on the train of wasmparser's
latest `main` branch to ensure we get bug fixes and such.
As part of this update a few other crates and such were updated. This is
primarily to handle the new encoding of `ref.is_null` where the type is
not part of the instruction encoding any more.
- Create the ELF image from Compilation
- Create CodeMemory from the ELF image
- Link using ELF image
- Remove creation of GDB JIT images from crates/debug
- Move make_trampoline from compiler.rs
When running in embedded environments, threads creation is sometimes
undesirable. This adds a feature to toggle wasmtime's internal thread
creation for parallel compilation.
When compiling C to WASM with clang-8, __dso_handle is a global
that maybe exported but that currently is not whitelisted along
with __heap_base and _data_end to be handled as allowable depricated
exports. This PR adds the case for __dso_handle.
* wasmtime: Implement `global.{get,set}` for externref globals
We use libcalls to implement these -- unlike `table.{get,set}`, for which we
create inline JIT fast paths -- because no known toolchain actually uses
externref globals.
Part of #929
* wasmtime: Enable `{extern,func}ref` globals in the API
This is something I meant to do a long time ago but forgot to get around
to it! This commit updates the top-level documentation of the `wasmtime`
crate to have examples, more words, and generall be a bit more
up-to-date and complete.
This new fuzz target exercises sequences of `table.get`s, `table.set`s, and
GCs.
It already found a couple bugs:
* Some leaks due to ref count cycles between stores and host-defined functions
closing over those stores.
* If there are no live references for a PC, Cranelift can avoid emiting an
associated stack map. This was running afoul of a debug assertion.
Better to be loud that we don't support attaching arbitrary host info to
`externref`s than to limp along and pretend we do support it. Supporting it
properly won't reuse any of this code anyways.
`funcref`s are implemented as `NonNull<VMCallerCheckedAnyfunc>`.
This should be more efficient than using a `VMExternRef` that points at a
`VMCallerCheckedAnyfunc` because it gets rid of an indirection, dynamic
allocation, and some reference counting.
Note that the null function reference is *NOT* a null pointer; it is a
`VMCallerCheckedAnyfunc` that has a null `func_ptr` member.
Part of #929
This introduces two changes:
- first, a Cargo feature is added to make it possible to use the
Cranelift x64 backend directly from wasmtime's CLI.
- second, when passing a `cranelift-flags` parameter, and the given
parameter's name doesn't exist at the target-independent flag level, try
to set it as a target-dependent setting.
These two changes make it possible to try out the new x64 backend with:
cargo run --features experimental_x64 -- run --cranelift-flags use_new_backend=true -- /path/to/a.wasm
Right now, this will fail because most opcodes required by the
trampolines are actually not implemented yet.
This commit enables `wasmtime_runtime::Table` to internally hold elements of
either `funcref` (all that is currently supported) or `externref` (newly
introduced in this commit).
This commit updates `Table`'s API, but does NOT generally propagate those
changes outwards all the way through the Wasmtime embedding API. It only does
enough to get everything compiling and the current test suite passing. It is
expected that as we implement more of the reference types spec, we will bubble
these changes out and expose them to the embedding API.
This allows us to detect when stack walking has failed to walk the whole stack,
and we are potentially missing on-stack roots, and therefore it would be unsafe
to do a GC because we could free objects too early, leading to use-after-free.
When we detect this scenario, we skip the GC.
For host VM code, we use plain reference counting, where cloning increments
the reference count, and dropping decrements it. We can avoid many of the
on-stack increment/decrement operations that typically plague the
performance of reference counting via Rust's ownership and borrowing system.
Moving a `VMExternRef` avoids mutating its reference count, and borrowing it
either avoids the reference count increment or delays it until if/when the
`VMExternRef` is cloned.
When passing a `VMExternRef` into compiled Wasm code, we don't want to do
reference count mutations for every compiled `local.{get,set}`, nor for
every function call. Therefore, we use a variation of **deferred reference
counting**, where we only mutate reference counts when storing
`VMExternRef`s somewhere that outlives the activation: into a global or
table. Simultaneously, we over-approximate the set of `VMExternRef`s that
are inside Wasm function activations. Periodically, we walk the stack at GC
safe points, and use stack map information to precisely identify the set of
`VMExternRef`s inside Wasm activations. Then we take the difference between
this precise set and our over-approximation, and decrement the reference
count for each of the `VMExternRef`s that are in our over-approximation but
not in the precise set. Finally, the over-approximation is replaced with the
precise set.
The `VMExternRefActivationsTable` implements the over-approximized set of
`VMExternRef`s referenced by Wasm activations. Calling a Wasm function and
passing it a `VMExternRef` moves the `VMExternRef` into the table, and the
compiled Wasm function logically "borrows" the `VMExternRef` from the
table. Similarly, `global.get` and `table.get` operations clone the gotten
`VMExternRef` into the `VMExternRefActivationsTable` and then "borrow" the
reference out of the table.
When a `VMExternRef` is returned to host code from a Wasm function, the host
increments the reference count (because the reference is logically
"borrowed" from the `VMExternRefActivationsTable` and the reference count
from the table will be dropped at the next GC).
For more general information on deferred reference counting, see *An
Examination of Deferred Reference Counting and Cycle Detection* by Quinane:
https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf
cc #929Fixes#1804
* Refactor how relocs are stored and handled
* refactor CompiledModule::instantiate and link_module
* Refactor DWARF creation: split generation and serialization
* Separate DWARF data transform from instantiation
* rm LinkContext
* Moves CodeMemory, VMInterrupts and SignatureRegistry from Compiler
* CompiledModule holds CodeMemory and GdbJitImageRegistration
* Store keeps track of its JIT code
* Makes "jit_int.rs" stuff Send+Sync
* Adds the threads example.
It isn't used by anything except for the C API and all of our embedder-exposed
APIs are already internally `Rc`-based, so it doesn't make sense to use with
them.