* Add initial support for fused adapter trampolines
This commit lands a significant new piece of functionality to Wasmtime's
implementation of the component model in the form of the implementation
of fused adapter trampolines. Internally within a component core wasm
modules can communicate with each other by having their exports
`canon lift`'d to get `canon lower`'d into a different component. This
signifies that two components are communicating through a statically
known interface via the canonical ABI at this time. Previously Wasmtime
was able to identify that this communication was happening but it simply
panicked with `unimplemented!` upon seeing it. This commit is the
beginning of filling out this panic location with an actual
implementation.
The implementation route chosen here for fused adapters is to use a
WebAssembly module itself for the implementation. This means that, at
compile time of a component, Wasmtime is generating core WebAssembly
modules which then get recursively compiled within Wasmtime as well. The
choice to use WebAssembly itself as the implementation of fused adapters
stems from a few motivations:
* This does not represent a significant increase in the "trusted
compiler base" of Wasmtime. Getting the Wasm -> CLIF translation
correct once is hard enough much less for an entirely different IR to
CLIF. By generating WebAssembly no new interactions with Cranelift are
added which drastically reduces the possibilities for mistakes.
* Using WebAssembly means that component adapters are insulated from
miscompilations and mistakes. If something goes wrong it's defined
well within the WebAssembly specification how it goes wrong and what
happens as a result. This means that the "blast zone" for a wrong
adapter is the component instance but not the entire host itself.
Accesses to linear memory are guaranteed to be in-bounds and otherwise
handled via well-defined traps.
* A fully-finished fused adapter compiler is expected to be a
significant and quite complex component of Wasmtime. Functionality
along these lines is expected to be needed for Web-based polyfills of
the component model and by using core WebAssembly it provides the
opportunity to share code between Wasmtime and these polyfills for the
component model.
* Finally the runtime implementation of managing WebAssembly modules is
already implemented and quite easy to integrate with, so representing
fused adapters with WebAssembly results in very little extra support
necessary for the runtime implementation of instantiating and managing
a component.
The compiler added in this commit is dubbed Wasmtime's Fused Adapter
Compiler of Trampolines (FACT) because who doesn't like deriving a name
from an acronym. Currently the trampoline compiler is limited in its
support for interface types and only supports a few primitives. I plan
on filing future PRs to flesh out the support here for all the variants
of `InterfaceType`. For now this PR is primarily focused on all of the
other infrastructure for the addition of a trampoline compiler.
With the choice to use core WebAssembly to implement fused adapters it
means that adapters need to be inserted into a module. Unfortunately
adapters cannot all go into a single WebAssembly module because adapters
themselves have dependencies which may be provided transitively through
instances that were instantiated with other adapters. This means that a
significant chunk of this PR (`adapt.rs`) is dedicated to determining
precisely which adapters go into precisely which adapter modules. This
partitioning process attempts to make large modules wherever it can to
cut down on core wasm instantiations but is likely not optimal as
it's just a simple heuristic today.
With all of this added together it's now possible to start writing
`*.wast` tests that internally have adapted modules communicating with
one another. A `fused.wast` test suite was added as part of this PR
which is the beginning of tests for the support of the fused adapter
compiler added in this PR. Currently this is primarily testing some
various topologies of adapters along with direct/indirect modes. This
will grow many more tests over time as more types are supported.
Overall I'm not 100% satisfied with the testing story of this PR. When a
test fails it's very difficult to debug since everything is written in
the text format of WebAssembly meaning there's no "conveniences" to
print out the state of the world when things go wrong and easily debug.
I think this will become even more apparent as more tests are written
for more types in subsequent PRs. At this time though I know of no
better alternative other than leaning pretty heavily on fuzz-testing to
ensure this is all exercised.
* Fix an unused field warning
* Fix tests in `wasmtime-runtime`
* Add some more tests for compiled trampolines
* Remap exports when injecting adapters
The exports of a component were accidentally left unmapped which meant
that they indexed the instance indexes pre-adapter module insertion.
* Fix typo
* Rebase conflicts
* Add shared memories
This change adds the ability to use shared memories in Wasmtime when the
[threads proposal] is enabled. Shared memories are annotated as `shared`
in the WebAssembly syntax, e.g., `(memory 1 1 shared)`, and are
protected from concurrent access during `memory.size` and `memory.grow`.
[threads proposal]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md
In order to implement this in Wasmtime, there are two main cases to
cover:
- a program may simply create a shared memory and possibly export it;
this means that Wasmtime itself must be able to create shared
memories
- a user may create a shared memory externally and pass it in as an
import during instantiation; this is the case when the program
contains code like `(import "env" "memory" (memory 1 1
shared))`--this case is handled by a new Wasmtime API
type--`SharedMemory`
Because of the first case, this change allows any of the current
memory-creation mechanisms to work as-is. Wasmtime can still create
either static or dynamic memories in either on-demand or pooling modes,
and any of these memories can be considered shared. When shared, the
`Memory` runtime container will lock appropriately during `memory.size`
and `memory.grow` operations; since all memories use this container, it
is an ideal place for implementing the locking once and once only.
The second case is covered by the new `SharedMemory` structure. It uses
the same `Mmap` allocation under the hood as non-shared memories, but
allows the user to perform the allocation externally to Wasmtime and
share the memory across threads (via an `Arc`). The pointer address to
the actual memory is carefully wired through and owned by the
`SharedMemory` structure itself. This means that there are differing
views of where to access the pointer (i.e., `VMMemoryDefinition`): for
owned memories (the default), the `VMMemoryDefinition` is stored
directly by the `VMContext`; in the `SharedMemory` case, however, this
`VMContext` must point to this separate structure.
To ensure that the `VMContext` can always point to the correct
`VMMemoryDefinition`, this change alters the `VMContext` structure.
Since a `SharedMemory` owns its own `VMMemoryDefinition`, the
`defined_memories` table in the `VMContext` becomes a sequence of
pointers--in the shared memory case, they point to the
`VMMemoryDefinition` owned by the `SharedMemory` and in the owned memory
case (i.e., not shared) they point to `VMMemoryDefinition`s stored in a
new table, `owned_memories`.
This change adds an additional indirection (through the `*mut
VMMemoryDefinition` pointer) that could add overhead. Using an imported
memory as a proxy, we measured a 1-3% overhead of this approach on the
`pulldown-cmark` benchmark. To avoid this, Cranelift-generated code will
special-case the owned memory access (i.e., load a pointer directly to
the `owned_memories` entry) for `memory.size` so that only
shared memories (and imported memories, as before) incur the indirection
cost.
* review: remove thread feature check
* review: swap wasmtime-types dependency for existing wasmtime-environ use
* review: remove unused VMMemoryUnion
* review: reword cross-engine error message
* review: improve tests
* review: refactor to separate prevent Memory <-> SharedMemory conversion
* review: into_shared_memory -> as_shared_memory
* review: remove commented out code
* review: limit shared min/max to 32 bits
* review: skip imported memories
* review: imported memories are not owned
* review: remove TODO
* review: document unsafe send + sync
* review: add limiter assertion
* review: remove TODO
* review: improve tests
* review: fix doc test
* fix: fixes based on discussion with Alex
This changes several key parts:
- adds memory indexes to imports and exports
- makes `VMMemoryDefinition::current_length` an atomic usize
* review: add `Extern::SharedMemory`
* review: remove TODO
* review: atomically load from VMMemoryDescription in JIT-generated code
* review: add test probing the last available memory slot across threads
* fix: move assertion to new location due to rebase
* fix: doc link
* fix: add TODOs to c-api
* fix: broken doc link
* fix: modify pooling allocator messages in tests
* review: make owned_memory_index panic instead of returning an option
* review: clarify calculation of num_owned_memories
* review: move 'use' to top of file
* review: change '*const [u8]' to '*mut [u8]'
* review: remove TODO
* review: avoid hard-coding memory index
* review: remove 'preallocation' parameter from 'Memory::_new'
* fix: component model memory length
* review: check that shared memory plans are static
* review: ignore growth limits for shared memory
* review: improve atomic store comment
* review: add FIXME for memory growth failure
* review: add comment about absence of bounds-checked 'memory.size'
* review: make 'current_length()' doc comment more precise
* review: more comments related to memory.size non-determinism
* review: make 'vmmemory' unreachable for shared memory
* review: move code around
* review: thread plan through to 'wrap()'
* review: disallow shared memory allocation with the pooling allocator
* Move `CompiledFunction` into wasmtime-cranelift
This commit moves the `wasmtime_environ::CompiledFunction` type into the
`wasmtime-cranelift` crate. This type has lots of Cranelift-specific
pieces of compilation and doesn't need to be generated by all Wasmtime
compilers. This replaces the usage in the `Compiler` trait with a
`Box<Any>` type that each compiler can select. Each compiler must still
produce a `FunctionInfo`, however, which is shared information we'll
deserialize for each module.
The `wasmtime-debug` crate is also folded into the `wasmtime-cranelift`
crate as a result of this commit. One possibility was to move the
`CompiledFunction` commit into its own crate and have `wasmtime-debug`
depend on that, but since `wasmtime-debug` is Cranelift-specific at this
time it didn't seem like it was too too necessary to keep it separate.
If `wasmtime-debug` supports other backends in the future we can
recreate a new crate, perhaps with it refactored to not depend on
Cranelift.
* Move wasmtime_environ::reference_type
This now belongs in wasmtime-cranelift and nowhere else
* Remove `Type` reexport in wasmtime-environ
One less dependency on `cranelift-codegen`!
* Remove `types` reexport from `wasmtime-environ`
Less cranelift!
* Remove `SourceLoc` from wasmtime-environ
Change the `srcloc`, `start_srcloc`, and `end_srcloc` fields to a custom
`FilePos` type instead of `ir::SourceLoc`. These are only used in a few
places so there's not much to lose from an extra abstraction for these
leaf use cases outside of cranelift.
* Remove wasmtime-environ's dep on cranelift's `StackMap`
This commit "clones" the `StackMap` data structure in to
`wasmtime-environ` to have an independent representation that that
chosen by Cranelift. This allows Wasmtime to decouple this runtime
dependency of stack map information and let the two evolve
independently, if necessary.
An alternative would be to refactor cranelift's implementation into a
separate crate and have wasmtime depend on that but it seemed a bit like
overkill to do so and easier to clone just a few lines for this.
* Define code offsets in wasmtime-environ with `u32`
Don't use Cranelift's `binemit::CodeOffset` alias to define this field
type since the `wasmtime-environ` crate will be losing the
`cranelift-codegen` dependency soon.
* Commit to using `cranelift-entity` in Wasmtime
This commit removes the reexport of `cranelift-entity` from the
`wasmtime-environ` crate and instead directly depends on the
`cranelift-entity` crate in all referencing crates. The original reason
for the reexport was to make cranelift version bumps easier since it's
less versions to change, but nowadays we have a script to do that.
Otherwise this encourages crates to use whatever they want from
`cranelift-entity` since we'll always depend on the whole crate.
It's expected that the `cranelift-entity` crate will continue to be a
lean crate in dependencies and suitable for use at both runtime and
compile time. Consequently there's no need to avoid its usage in
Wasmtime at runtime, since "remove Cranelift at compile time" is
primarily about the `cranelift-codegen` crate.
* Remove most uses of `cranelift-codegen` in `wasmtime-environ`
There's only one final use remaining, which is the reexport of
`TrapCode`, which will get handled later.
* Limit the glob-reexport of `cranelift_wasm`
This commit removes the glob reexport of `cranelift-wasm` from the
`wasmtime-environ` crate. This is intended to explicitly define what
we're reexporting and is a transitionary step to curtail the amount of
dependencies taken on `cranelift-wasm` throughout the codebase. For
example some functions used by debuginfo mapping are better imported
directly from the crate since they're Cranelift-specific. Note that
this is intended to be a temporary state affairs, soon this reexport
will be gone entirely.
Additionally this commit reduces imports from `cranelift_wasm` and also
primarily imports from `crate::wasm` within `wasmtime-environ` to get a
better sense of what's imported from where and what will need to be
shared.
* Extract types from cranelift-wasm to cranelift-wasm-types
This commit creates a new crate called `cranelift-wasm-types` and
extracts type definitions from the `cranelift-wasm` crate into this new
crate. The purpose of this crate is to be a shared definition of wasm
types that can be shared both by compilers (like Cranelift) as well as
wasm runtimes (e.g. Wasmtime). This new `cranelift-wasm-types` crate
doesn't depend on `cranelift-codegen` and is the final step in severing
the unconditional dependency from Wasmtime to `cranelift-codegen`.
The final refactoring in this commit is to then reexport this crate from
`wasmtime-environ`, delete the `cranelift-codegen` dependency, and then
update all `use` paths to point to these new types.
The main change of substance here is that the `TrapCode` enum is
mirrored from Cranelift into this `cranelift-wasm-types` crate. While
this unfortunately results in three definitions (one more which is
non-exhaustive in Wasmtime itself) it's hopefully not too onerous and
ideally something we can patch up in the future.
* Get lightbeam compiling
* Remove unnecessary dependency
* Fix compile with uffd
* Update publish script
* Fix more uffd tests
* Rename cranelift-wasm-types to wasmtime-types
This reflects the purpose a bit more where it's types specifically
intended for Wasmtime and its support.
* Fix publish script
Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread.
This commit updates the various tooling used by wasmtime which has new
updates to the module linking proposal. This is done primarily to sync
with WebAssembly/module-linking#26. The main change implemented here is
that wasmtime now supports creating instances from a set of values, nott
just from instantiating a module. Additionally subtyping handling of
modules with respect to imports is now properly handled by desugaring
two-level imports to imports of instances.
A number of small refactorings are included here as well, but most of
them are in accordance with the changes to `wasmparser` and the updated
binary format for module linking.
* Implement imported/exported modules/instances
This commit implements the final piece of the module linking proposal
which is to flesh out the support for importing/exporting instances and
modules. This ended up having a few changes:
* Two more `PrimaryMap` instances are now stored in an `Instance`. The value
for instances is `InstanceHandle` (pretty easy) and for modules it's
`Box<dyn Any>` (less easy).
* The custom host state for `InstanceHandle` for `wasmtime` is now
`Arc<TypeTables` to be able to fully reconstruct an instance's types
just from its instance.
* Type matching for imports now has been updated to take
instances/modules into account.
One of the main downsides of this implementation is that type matching
of imports is duplicated between wasmparser and wasmtime, leading to
posssible bugs especially in the subtelties of module linking. I'm not
sure how best to unify these two pieces of validation, however, and it
may be more trouble than it's worth.
cc #2094
* Update wat/wast/wasmparser
* Review comments
* Fix a bug in publish script to vendor the right witx
Currently there's two witx binaries in our repository given the two wasi
spec submodules, so this updates the publication script to vendor the
right one.
`funcref`s are implemented as `NonNull<VMCallerCheckedAnyfunc>`.
This should be more efficient than using a `VMExternRef` that points at a
`VMCallerCheckedAnyfunc` because it gets rid of an indirection, dynamic
allocation, and some reference counting.
Note that the null function reference is *NOT* a null pointer; it is a
`VMCallerCheckedAnyfunc` that has a null `func_ptr` member.
Part of #929
* Refactor wasmtime_runtime::Export
Instead of an enumeration with variants that have data fields have an
enumeration where each variant has a struct, and each struct has the
data fields. This allows us to store the structs in the `wasmtime` API
and avoid lots of `panic!` calls and various extraneous matches.
* Pre-generate trampoline functions
The `wasmtime` crate supports calling arbitrary function signatures in
wasm code, and to do this it generates "trampoline functions" which have
a known ABI that then internally convert to a particular signature's ABI
and call it. These trampoline functions are currently generated
on-the-fly and are cached in the global `Store` structure. This,
however, is suboptimal for a few reasons:
* Due to how code memory is managed each trampoline resides in its own
64kb allocation of memory. This means if you have N trampolines you're
using N * 64kb of memory, which is quite a lot of overhead!
* Trampolines are never free'd, even if the referencing module goes
away. This is similar to #925.
* Trampolines are a source of shared state which prevents `Store` from
being easily thread safe.
This commit refactors how trampolines are managed inside of the
`wasmtime` crate and jit/runtime internals. All trampolines are now
allocated in the same pass of `CodeMemory` that the main module is
allocated into. A trampoline is generated per-signature in a module as
well, instead of per-function. This cache of trampolines is stored
directly inside of an `Instance`. Trampolines are stored based on
`VMSharedSignatureIndex` so they can be looked up from the internals of
the `ExportFunction` value.
The `Func` API has been updated with various bits and pieces to ensure
the right trampolines are registered in the right places. Overall this
should ensure that all trampolines necessary are generated up-front
rather than lazily. This allows us to remove the trampoline cache from
the `Compiler` type, and move one step closer to making `Compiler`
threadsafe for usage across multiple threads.
Note that as one small caveat the `Func::wrap*` family of functions
don't need to generate a trampoline at runtime, they actually generate
the trampoline at compile time which gets passed in.
Also in addition to shuffling a lot of code around this fixes one minor
bug found in `code_memory.rs`, where `self.position` was loaded before
allocation, but the allocation may push a new chunk which would cause
`self.position` to be zero instead.
* Pass the `SignatureRegistry` as an argument to where it's needed.
This avoids the need for storing it in an `Arc`.
* Ignore tramoplines for functions with lots of arguments
Co-authored-by: Dan Gohman <sunfish@mozilla.com>