This commit removes the global variable associated with wasm traps which
stores frame information. The only purpose of this global is to help
symbolicate `Trap`s created since we support creating a `Trap` without a
`Store`. The global, however, is only used for wasm frames on the stack,
and when wasm frames are on the stack we know that our thread local for
"what was the last context" is set and configured.
The change here is to hijack this thread-local some more to effectively
store the `Store` inside of it. All frame information is then moved
directly into `Store` and no longer lives off on the side in a global.
Additionally support for registering/unregistering modules is now
simplified because once a module is registered with a store it can never
be unregistered.
This has one slight functional change where if there are two instances
of `Store` interleaving calls to wasm code on the stack we'll only be
able to symbolicate one of them instead of both. That's arguably also a
feature however because this is sort of a way to leak information across
stores right now.
Otherwise, though, this isn't intended to change any existing logic, but
instead keep everything working as-is.
This file has grown quite a lot with `Store` over time so this splits it
up into three separate files, one for each of the main types defined in
it: `Config`, `Engine`, and `Store`.
* this requires upgrading to wasmparser 0.67.0.
* There are no CLIF side changes because the CLIF `select` instruction is
polymorphic enough.
* on aarch64, there is unfortunately no conditional-move (csel) instruction on
vectors. This patch adds a synthetic instruction `VecCSel` which *does*
behave like that. At emit time, this is emitted as an if-then-else diamond
(4 insns).
* aarch64 implementation is otherwise straightforwards.
When invoking a WebAssembly routine from Rust code, arguments
are stored into an array of u128, and read from a piece of
generated trampoline code before calling the compiled target
function using the platform ABI calling convention.
The WasmTy/WasmRet routines handle the conversion between Rust
data types and those u128 buffers. This currently works by
in effect converting the Rust object to a u128 and then storing
this u128 into the buffer. The generated trampoline code will
then read an object of appropriate type from the beginning of
that buffer.
This does not work on big-endian platforms, since the above
approach causes the value to be stored into the rightmost
bytes of the u128 buffer, while the trampoline code reads
the leftmost bytes.
This patch fixes the problem by changing WasmTy/WasmRet to
use the leftmost bytes as well, by casting the u128 pointer
to a pointer of the correct type before storing to it (or
reading from it).
(Note that it is not necessary to actually byte-swap the
values since the trampoline code will not treat them like
WebAssembly little-endian memory, but simply access them
in native byte order.)
We've generally moved to a model where `InstanceHandle` doesn't hold
ownership of its internals, instead relying on the caller to manage
that. This removes an allocation on the `Func::wrap` path but otherwise
shouldn't have much impact.
This commit adds lots of plumbing to get the type section from the
module linking proposal plumbed all the way through to the `wasmtime`
crate and the `wasmtime-c-api` crate. This isn't all that useful right
now because Wasmtime doesn't support imported/exported
modules/instances, but this is all necessary groundwork to getting that
exported at some point. I've added some light tests but I suspect the
bulk of the testing will come in a future commit.
One major change in this commit is that `SignatureIndex` no longer
follows type type index space in a wasm module. Instead a new
`TypeIndex` type is used to track that. Function signatures, still
indexed by `SignatureIndex`, are then packed together tightly.
This commit is intended to be the first of many in implementing the
module linking proposal. At this time this builds on #2059 so it
shouldn't land yet. The goal of this commit is to compile bare-bones
modules which use module linking, e.g. those with nested modules.
My hope with module linking is that almost everything in wasmtime only
needs mild refactorings to handle it. The goal is that all per-module
structures are still per-module and at the top level there's just a
`Vec` containing a bunch of modules. That's implemented currently where
`wasmtime::Module` contains `Arc<[CompiledModule]>` and an index of
which one it's pointing to. This should enable
serialization/deserialization of any module in a nested modules
scenario, no matter how you got it.
Tons of features of the module linking proposal are missing from this
commit. For example instantiation flat out doesn't work, nor does
import/export of modules or instances. That'll be coming as future
commits, but the purpose here is to start laying groundwork in Wasmtime
for handling lots of modules in lots of places.
This commit updates `wasmtime::FuncType` to exactly store an internal
`WasmFuncType` from the cranelift crates. This allows us to remove a
translation layer when we are given a `FuncType` and want to get an
internal cranelift type out as a result.
The other major change from this commit was changing the constructor and
accessors of `FuncType` to be iterator-based instead of exposing
implementation details.
After compilation there's actually no need to hold onto the native
signature for a wasm function type, so this commit moves out the
`ir::Signature` value from a `Module` into a separate field that's
deallocated when compilation is finished. This simplifies the
`SignatureRegistry` because it only needs to track wasm functino types
and it also means less work is done for `Func::wrap`.
The ModuleFrameInfo and FunctionInfo data structures maintain
a list of ranges via a BTreeMap. The key to that map is one
past the end of the module/function in question. This causes
a problem in the case of immediately adjacent ranges. For
example, if we have two functions occupying adjacent ranges:
A: 0-100
B: 100-200
function A is stored with a key of 100 and B with a key of 200.
Now, when looking up the function associated with address 100,
we'd expect to find B. However the current code:
let (end, func) = info.functions.range(pc..).next()?;
if pc < func.start || *end < pc {
will look up the value 100 in the map and return function A,
which will then fail the pc < func.start check in the next
line, so the result will be failure.
To fix this problem, make sure that the key used when
registering functions or modules is the address of the
last byte, not one past the end.
This patch implements, for aarch64, the following wasm SIMD extensions
i32x4.dot_i16x8_s instruction
https://github.com/WebAssembly/simd/pull/127
It also updates dependencies as follows, in order that the new instruction can
be parsed, decoded, etc:
wat to 1.0.27
wast to 26.0.1
wasmparser to 0.65.0
wasmprinter to 0.2.12
The changes are straightforward:
* new CLIF instruction `widening_pairwise_dot_product_s`
* translation from wasm into `widening_pairwise_dot_product_s`
* new AArch64 instructions `smull`, `smull2` (part of the `VecRRR` group)
* translation from `widening_pairwise_dot_product_s` to `smull ; smull2 ; addv`
There is no testcase in this commit, because that is a separate repo. The
implementation has been tested, nevertheless.
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.
This isn't as big of an improvement to serialized module size as #2321
or #2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from #2318
This commit refactors where trampolines and signature information is
stored within a `Store`, namely moving them from
`wasmtime_runtime::Instance` instead to `Store` itself. The goal here is
to remove an allocation inside of an `Instance` and make them a bit
cheaper to create. Additionally this should open up future possibilities
like not creating duplicate trampolines for signatures already in the
`Store` when using `Func::new`.
Added in c4e10227de I think the original
reason (which I'm not entirely knowledgeable of) may no longer be
applicable? In any case this is a significant difference on Windows from
other platforms because it makes loads/stores of wasm code have manual
checks instead of relying on the guard page, causing runtime and
compile-time slowdowns on Windows-only.
I originally rediscovered this when investigating #2318 and saw that
both the compile time of the module in question and trap information
tables were much larger than they were on Linux. Removing this
Windows-specific configuration fixed the discrepancies and afterwards
Linux and Windows were basically the same.
Update `Module::{serialize,deserialize}` to use variable-length integers
with `bincode` to make the output artifacts smaller. Locally this
reduces the size of #2318 from 160 to 110 MB, a 30% decrease in size!
Deserialization performance is slightly slower, but seemingly within the
range of noise locally for me.
Similar to other data structures owned by the `Store` there's no need
for `Instance` to have a strong `Arc` reference, instead it's sufficient
for `Store` to have the owning reference.
There's no need to name each export since each synthetic instance we're
creating only has one export, so let's use the empty string which
doesn't require any allocations.
This commit adds initial (gated) support for the multi-memory wasm
proposal. This was actually quite easy since almost all of wasmtime
already expected multi-memory to be implemented one day. The only real
substantive change is the `memory.copy` intrinsic changes, which now
accounts for the source/destination memories possibly being different.
* Validate modules while translating
This commit is a change to cranelift-wasm to validate each function body
as it is translated. Additionally top-level module translation functions
will perform module validation. This commit builds on changes in
wasmparser to perform module validation interwtwined with parsing and
translation. This will be necessary for future wasm features such as
module linking where the type behind a function index, for example, can
be far away in another module. Additionally this also brings a nice
benefit where parsing the binary only happens once (instead of having an
up-front serial validation step) and validation can happen in parallel
for each function.
Most of the changes in this commit are plumbing to make sure everything
lines up right. The major functional change here is that module
compilation should be faster by validating in parallel (or skipping
function validation entirely in the case of a cache hit). Otherwise from
a user-facing perspective nothing should be that different.
This commit does mean that cranelift's translation now inherently
validates the input wasm module. This means that the Spidermonkey
integration of cranelift-wasm will also be validating the function as
it's being translated with cranelift. The associated PR for wasmparser
(bytecodealliance/wasmparser#62) provides the necessary tools to create
a `FuncValidator` for Gecko, but this is something I'll want careful
review for before landing!
* Read function operators until EOF
This way we can let the validator take care of any issues with
mismatched `end` instructions and/or trailing operators/bytes.
This commit removes all import resolution handling from the
`wasmtime-jit` crate, instead moving the logic to the `wasmtime` crate.
Previously `wasmtime-jit` had a generic `Resolver` trait and would do
all the import type matching itself, but with the upcoming
module-linking implementation this is going to get much trickier.
The goal of this commit is to centralize all meaty "preparation" logic
for instantiation into one location, probably the `wasmtime` crate
itself. Instantiation will soon involve recursive instantiation and
management of alias definitions as well. Having everything in one
location, especially with access to `Store` so we can persist
instances for safety, will be quite convenient.
Additionally the `Resolver` trait isn't really necessary any more since
imports are, at the lowest level, provided as a list rather than a map
of some kind. More generic resolution functionality is provided via
`Linker` or user layers on top of `Instance::new` itself. This makes
matching up provided items to expected imports much easier as well.
Overall this is largely just moving code around, but most of the code
in the previous `resolve_imports` phase can be deleted since a lot of it
is handled by surrounding pieces of `wasmtime` as well.
This commit moves all of the caching support that currently lives in
`wasmtime-environ` into a `wasmtime-cache` crate and makes it optional. The
goal here is to slim down the `wasmtime-environ` crate and clearly separate
boundaries where caching is a standalone and optional feature, not intertwined
with other crates.
This was added long ago at this point to assist with caching, but
caching has moved to a different level such that this wonky second level
of a `Module` isn't necessary. This commit removes the `ModuleLocal`
type to simplify accessors and generally make it easier to work with.
The implementation is pretty straightforward. Wasm atomic instructions fall
into 5 groups
* atomic read-modify-write
* atomic compare-and-swap
* atomic loads
* atomic stores
* fences
and the implementation mirrors that structure, at both the CLIF and AArch64
levels.
At the CLIF level, there are five new instructions, one for each group. Some
comments about these:
* for those that take addresses (all except fences), the address is contained
entirely in a single `Value`; there is no offset field as there is with
normal loads and stores. Wasm atomics require alignment checks, and
removing the offset makes implementation of those checks a bit simpler.
* atomic loads and stores get their own instructions, rather than reusing the
existing load and store instructions, for two reasons:
- per above comment, makes alignment checking simpler
- reuse of existing loads and stores would require extension of `MemFlags`
to indicate atomicity, which sounds semantically unclean. For example,
then *any* instruction carrying `MemFlags` could be marked as atomic, even
in cases where it is meaningless or ambiguous.
* I tried to specify, in comments, the behaviour of these instructions as
tightly as I could. Unfortunately there is no way (per my limited CLIF
knowledge) to enforce the constraint that they may only be used on I8, I16,
I32 and I64 types, and in particular not on floating point or vector types.
The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable.
At the AArch64 level, there are also five new instructions, one for each
group. All of them except `::Fence` contain multiple real machine
instructions. Atomic r-m-w and atomic c-a-s are emitted as the usual
load-linked store-conditional loops, guarded at both ends by memory fences.
Atomic loads and stores are emitted as a load preceded by a fence, and a store
followed by a fence, respectively. The amount of fencing may be overkill, but
it reflects exactly what the SM Wasm baseline compiler for AArch64 does.
One reason to implement r-m-w and c-a-s as a single insn which is expanded
only at emission time is that we must be very careful what instructions we
allow in between the load-linked and store-conditional. In particular, we
cannot allow *any* extra memory transactions in there, since -- particularly
on low-end hardware -- that might cause the transaction to fail, hence
deadlocking the generated code. That implies that we can't present the LL/SC
loop to the register allocator as its constituent instructions, since it might
insert spills anywhere. Hence we must present it as a single indivisible
unit, as we do here. It also has the benefit of reducing the total amount of
work the RA has to do.
The only other notable feature of the r-m-w and c-a-s translations into
AArch64 code, is that they both need a scratch register internally. Rather
than faking one up by claiming, in `get_regs` that it modifies an extra
scratch register, and having to have a dummy initialisation of it, these new
instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range
x24-x28. We rely on the RA's ability to coalesce V<-->R copies to make the
cost of the resulting extra copies zero or almost zero. x24-x28 are chosen so
as to be call-clobbered, hence their use is less likely to interfere with long
live ranges that span calls.
One subtlety regarding the use of completely fixed input and output registers
is that we must be careful how the surrounding copy from/to of the arg/result
registers is done. In particular, it is not safe to simply emit copies in
some arbitrary order if one of the arg registers is a real reg. For that
reason, the arguments are first moved into virtual regs if they are not
already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`.
Again, we rely on coalescing to turn them into no-ops in the common case.
There is also a ridealong fix for the AArch64 lowering case for
`Opcode::Trapif | Opcode::Trapff`, which removes a bug in which two trap insns
in a row were generated.
In the patch as submitted there are 6 "FIXME JRS" comments, which mark things
which I believe to be correct, but for which I would appreciate a second
opinion. Unless otherwise directed, I will remove them for the final commit
but leave the associated code/comments unchanged.
* Refactor where results of compilation are stored
This commit refactors the internals of compilation in Wasmtime to change
where results of individual function compilation are stored. Previously
compilation resulted in many maps being returned, and compilation
results generally held all these maps together. This commit instead
switches this to have all metadata stored in a `CompiledFunction`
instead of having a separate map for each item that can be stored.
The motivation for this is primarily to help out with future
module-linking-related PRs. What exactly "module level" is depends on
how we interpret modules and how many modules are in play, so it's a bit
easier for operations in wasmtime to work at the function level where
possible. This means that we don't have to pass around multiple
different maps and a function index, but instead just one map or just
one entry representing a compiled function.
Additionally this change updates where the parallelism of compilation
happens, pushing it into `wasmtime-jit` instead of `wasmtime-environ`.
This is another goal where `wasmtime-jit` will have more knowledge about
module-level pieces with module linking in play. User-facing-wise this
should be the same in terms of parallel compilation, though.
The ultimate goal of this refactoring is to make it easier for the
results of compilation to actually be a set of wasm modules. This means
we won't be able to have a map-per-metadata where the primary key is the
function index, because there will be many modules within one "object
file".
* Don't clear out fields, just don't store them
Persist a smaller set of fields in `CompilationArtifacts` instead of
trying to clear fields out and dynamically not accessing them.
* move caching to the CompilationArtifacts
* mv cache_config from Compiler to CompiledModule
* hash isa flags
* no cache for wasm2obj
* mv caching to wasmtime crate
* account each Compiler field when hash