* Implement strings in adapter modules
This commit is a hefty addition to Wasmtime's support for the component
model. This implements the final remaining type (in the current type
hierarchy) unimplemented in adapter module trampolines: strings. Strings
are the most complicated type to implement in adapter trampolines
because they are highly structured chunks of data in memory (according
to specific encodings). Additionally each lift/lower operation can
choose its own encoding for strings meaning that Wasmtime, the host, may
have to convert between any pairwise ordering of string encodings.
The `CanonicalABI.md` in the component-model repo in general specifies
all the fiddly bits of string encoding so there's not a ton of wiggle
room for Wasmtime to get creative. This PR largely "just" implements
that. The high-level architecture of this implementation is:
* Fused adapters are first identified to determine src/dst string
encodings. This statically fixes what transcoding operation is being
performed.
* The generated adapter will be responsible for managing calls to
`realloc` and performing bounds checks. The adapter itself does not
perform memory copies or validation of string contents, however.
Instead each transcoding operation is modeled as an imported function
into the adapter module. This means that the adapter module
dynamically, during compile time, determines what string transcoders
are needed. Note that an imported transcoder is not only parameterized
over the transcoding operation but additionally which memory is the
source and which is the destination.
* The imported core wasm functions are modeled as a new
`CoreDef::Transcoder` structure. These transcoders end up being small
Cranelift-compiled trampolines. The Cranelift-compiled trampoline will
load the actual base pointer of memory and add it to the relative
pointers passed as function arguments. This trampoline then calls a
transcoder "libcall" which enters Rust-defined functions for actual
transcoding operations.
* Each possible transcoding operation is implemented in Rust with a
unique name and a unique signature depending on the needs of the
transcoder. I've tried to document inline what each transcoder does.
This means that the `Module::translate_string` in adapter modules is by
far the largest translation method. The main reason for this is due to
the management around calling the imported transcoder functions in the
face of validating string pointer/lengths and performing the dance of
`realloc`-vs-transcode at the right time. I've tried to ensure that each
individual case in transcoding is documented well enough to understand
what's going on as well.
Additionally in this PR is a full implementation in the host for the
`latin1+utf16` encoding which means that both lifting and lowering host
strings now works with this encoding.
Currently the implementation of each transcoder function is likely far
from optimal. Where possible I've leaned on the standard library itself
and for latin1-related things I'm leaning on the `encoding_rs` crate. I
initially tried to implement everything with `encoding_rs` but was
unable to uniformly do so easily. For now I settled on trying to get a
known-correct (even in the face of endianness) implementation for all of
these transcoders. If an when performance becomes an issue it should be
possible to implement more optimized versions of each of these
transcoding operations.
Testing this commit has been somewhat difficult and my general plan,
like with the `(list T)` type, is to rely heavily on fuzzing to cover
the various cases here. In this PR though I've added a simple test that
pushes some statically known strings through all the pairs of encodings
between source and destination. I've attempted to pick "interesting"
strings that one way or another stress the various paths in each
transcoding operation to ideally get full branch coverage there.
Additionally a suite of "negative" tests have also been added to ensure
that validity of encoding is actually checked.
* Fix a temporarily commented out case
* Fix wasmtime-runtime tests
* Update deny.toml configuration
* Add `BSD-3-Clause` for the `encoding_rs` crate
* Remove some unused licenses
* Add an exemption for `encoding_rs` for now
* Split up the `translate_string` method
Move out all the closures and package up captured state into smaller
lists of arguments.
* Test out-of-bounds for zero-length strings
We assert after emitting each instruction that its size was less than
the "worst-case size", which is used to determine when we need to
proactively emit an island so pending branch fixups don't go out of
bounds. However, the `EmitIsland` pseudo-inst itself can cause an
arbitrarily large island to be emitted; this should not have to fit
within the worst-case size (because island size is explicitly accounted
for by the threshold computation). This PR fixes the assert accordingly.
Fixes#4626.
* Cranelift: Don't print "skipped TEST can't run aarch64" on x64, etc
It's way too noisy. Move it to the logs.
* Cranelift: Enable Cranelift trace logs in `clif-util` by default
* cranelift-filetest: use `log::warn!` for warnings
Instead of `println!`
* rustfmt
* Convert `fma`, `valltrue` & `vanytrue` to ISLE (AArch64)
Ported the existing implementations of the following opcodes to ISLE on
AArch64:
- `fma`
- Introduced missing support for `fma` on vector values, as per the
docs.
- `valltrue`
- `vanytrue`
Also fixed `fcmp` on scalar values in the interpreter, and enabled
interpreter tests in `simd-fma.clif`.
This introduces the `FMLA` machine instruction.
Copyright (c) 2022 Arm Limited
* Add comments for `Fmla` and `Bsl`
Copyright (c) 2022 Arm Limited
And the `ABICallerImpl::ir_sig` field that was used to implement that
method. This removes 56 bytes from the size of `ABICallerImpl` and gives us
speed ups to compilation of about 7% on all benchmarks.
```
compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm
Δ = 8205119.48 ± 4069474.25 (confidence = 99%)
main.so is 0.91x to 0.97x faster than feature.so!
feature.so is 1.03x to 1.10x faster than main.so!
[117729152 132258110.36 167484097] main.so
[107486500 124052990.88 138008797] feature.so
compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm
Δ = 4645258.32 ± 1981104.59 (confidence = 99%)
main.so is 0.92x to 0.97x faster than feature.so!
feature.so is 1.03x to 1.08x faster than main.so!
[76562171 85504479.28 93116863] main.so
[75180650 80859220.96 90591978] feature.so
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm
Δ = 150575617.54 ± 65021102.57 (confidence = 99%)
main.so is 0.92x to 0.97x faster than feature.so!
feature.so is 1.03x to 1.08x faster than main.so!
[2573089039 2843117485.10 3175982602] main.so
[2559784932 2692541867.56 3143529008] feature.so
```
When an adapter module depends on a particular core wasm instance this
means that it actually depends on not only that instance but all prior
core wasm instances as well. This is because core wasm instances must be
instantiated in the specified order within a component and that cannot
change depending on the dataflow between adapters. This commit fixes a
possible panic from linearizing the component dfg where an adapter
module tried to depend on an instance that hadn't been instantiated yet
because the ordering dependency between core wasm instances hadn't been
modeled.
* components: ignore export aliases to types in translation.
Currently, translation is ignoring type exports from components during
translation by skipping over them before adding them to the exports map.
If a component instantiates an inner component and aliases a type export of
that instance, it will cause wasmtime to panic with a failure to find the
export in the exports map.
The fix is to add a representation for exported types to the map that is simply
ignored when encountered. This also makes it easier to track places where we
would have to support type exports in translation in the future.
* Keep type information for type exports.
This commit keeps the type information for type exports so that types can be
properly aliased from an instance export and thereby adjusting the type index
space accordingly.
* Add a simple test case for type exports for the component model.
* Add a dataflow-based representation of components
This commit updates the inlining phase of compiling a component to
creating a dataflow-based representation of a component instead of
creating a final `Component` with a linear list of initializers. This
dataflow graph is then linearized in a final step to create the actual
final `Component`.
The motivation for this commit stems primarily from my work implementing
strings in fused adapters. In doing this my plan is to defer most
low-level transcoding to the host itself rather than implementing that
in the core wasm adapter modules. This means that small
cranelift-generated trampolines will be used for adapter modules to call
which then call "transcoding libcalls". The cranelift-generated
trampolines will get raw pointers into linear memory and pass those to
the libcall which core wasm doesn't have access to when passing
arguments to an import.
Implementing this with the previous representation of a `Component` was
becoming too tricky to bear. The initialization of a transcoder needed
to happen at just the right time: before the adapter module which needed
it was instantiated but after the linear memories referenced had been
extracted into the `VMComponentContext`. The difficulty here is further
compounded by the current adapter module injection pass already being
quite complicated. Adapter modules are already renumbering the index
space of runtime instances and shuffling items around in the
`GlobalInitializer` list. Perhaps the worst part of this was that
memories could already be referenced by host function imports or exports
to the host, and if adapters referenced the same memory it shouldn't be
referenced twice in the component. This meant that `ExtractMemory`
initializers ideally needed to be shuffled around in the initializer
list to happen as early as possible instead of wherever they happened to
show up during translation.
Overall I did my best to implement the transcoders but everything always
came up short. I have decided to throw my hands up in the air and try a
completely different approach to this, namely the dataflow-based
representation in this commit. This makes it much easier to edit the
component after initial translation for injection of adapters, injection
of transcoders, adding dependencies on possibly-already-existing items,
etc. The adapter module partitioning pass in this commit was greatly
simplified to something which I believe is functionally equivalent but
is probably an order of magnitude easier to understand.
The biggest downside of this representation I believe is having a
duplicate representation of a component. The `component::info` was
largely duplicated into the `component::dfg` module in this commit.
Personally though I think this is a more appropriate tradeoff than
before because it's very easy to reason about "convert representation A
to B" code whereas it was very difficult to reason about shuffling
around `GlobalInitializer` items in optimal fashions. This may also have
a cost at compile-time in terms of shuffling data around, but my hope is
that we have lots of other low-hanging fruit to optimize if it ever
comes to that which allows keeping this easier-to-understand
representation.
Finally, to reiterate, the final representation of components is not
changed by this PR. To the runtime internals everything is still the
same.
* Fix compile of factc
This adds full i128 support to the s390x target, including new filetests
and enabling the existing i128 runtest on s390x.
The ABI requires that i128 is passed and returned via implicit pointer,
but the front end still generates direct i128 types in call. This means
we have to implement ABI support to implicitly convert i128 types to
pointers when passing arguments.
To do so, we add a new variant ABIArg::ImplicitArg. This acts like
StructArg, except that the value type is the actual target type,
not a pointer type. The required conversions have to be inserted
in the prologue and at function call sites.
Note that when dereferencing the implicit pointer in the prologue,
we may require a temp register: the pointer may be passed on the
stack so it needs to be loaded first, but the value register may
be in the wrong class for pointer values. In this case, we use
the "stack limit" register, which should be available at this
point in the prologue.
For return values, we use a mechanism similar to the one used for
supporting multiple return values in the Wasmtime ABI. The only
difference is that the hidden pointer to the return buffer must
be the *first*, not last, argument in this case.
(This implements the second half of issue #4565.)
This addresses #4307.
For the static API we generate 100 arbitrary test cases at build time, each of
which includes 0-5 parameter types, a result type, and a WAT fragment containing
an imported function and an exported function. The exported function calls the
imported function, which is implemented by the host. At runtime, the fuzz test
selects a test case at random and feeds it zero or more sets of arbitrary
parameters and results, checking that values which flow host-to-guest and
guest-to-host make the transition unchanged.
The fuzz test for the dynamic API follows a similar pattern, the only difference
being that test cases are generated at runtime.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* Add `try_use_var` method to `cranelift-frontend`.
- Unlike `use_var`, this method does not panic if the variable has not been defined
before use
* Add `try_declare_var` and `try_def_var`.
- Also implement Error for error enums.
* Use `write!` macro.
* Add `write!` use I missed.
* Port `Shuffle` to ISLE (AArch64)
Ported the existing implementation of `Shuffle` for AArch64 to ISLE.
Copyright (c) 2022 Arm Limited
* Cleanup by shadowing `rn`, `rn2`, and `_`
Copyright (c) 2022 Arm Limited
* Wasmtime: Add a pointer to `VMRuntimeLimits` in component contexts
* Save exit Wasm FP and PC in component-to-host trampolines
Fixes#4535
* Add comment about why we deref the trampoline's FP
* Update some tests to use new `vmruntime_limits_*` methods
This adds support for StructArgument on s390x. The ABI for this
platform requires that the address of the buffer holding the copy
of the struct argument is passed from caller to callee as hidden
pointer, using a register or overflow stack slot.
To implement this, I've added an optional "pointer" filed to
ABIArg::StructArg, and code to handle the pointer both in common
abi_impl code and the s390x back-end.
One notable change necessary to make this work involved the
"copy_to_arg_order" mechanism. Currently, for struct args
we only need to copy the data (and that need to happen before
setting up any other args), while for non-struct args we only
need to set up the appropriate registers or stack slots.
This order is ensured by sorting the arguments appropriately
into a "copy_to_arg_order" list.
However, for struct args with explicit pointers we need to *both*
copy the data (again, before everything else), *and* set up a
register or stack slot. Since we now need to touch the argument
twice, we cannot solve the ordering problem by a simple sort.
Instead, the abi_impl common code now provided *two* callbacks,
emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects
the back end to first call copy..to_buffer for all args, and
then call copy.._to_arg for all args. This required updates
to all back ends.
In the s390x back end, in addition to the new ABI code, I'm now
adding code to actually copy the struct data, using the MVC
instruction (for small buffers) or a memcpy libcall (for larger
buffers). This also requires a bit of new infrastructure:
- MVC is the first memory-to-memory instruction we use, which
needed a bit of memory argument tweaking
- We also need to set up the infrastructure to emit libcalls.
(This implements the first half of issue #4565.)
Give the user the option to sign and to authenticate function
return addresses with the operations introduced by the Pointer
Authentication extension to the Arm instruction set architecture.
Copyright (c) 2021, Arm Limited.
This commit aims to improve the readability of supporting the memory64
proposal in the `fact` adapter trampoline compiler. Previously there
were a few sprinkled blocks that used `if` to generate different
instructions inline, but as I've worked on support for strings this has
become pretty unwieldy as strings do far more memory manipulation than
other type conversions. A pattern that's easier to read is to have
small instruction helpers that take the pointer width as an argument and
internally dispatch to the correct instruction. This keeps the main
translation code branch-free and a bit easier to follow. Additionally
for more complicated branching logic it allows for deduplicating the
main translation path by having lots of little branches instead of one
large branch with everything duplicated on both halves.
This is a collection of some minor renamings, refactorings, sharing of
code, etc. This was all discovered during my addition of string support
to adapter functions and I figured it'd be best to frontload this and
land it ahead of the full patch since it's getting complex.
On oss-fuzz a test case has been found that executes 30k iterations of a
wasm trap which with a 60s timeout leaves 2ms for each invocation which
under fuzzing instrumentation is a bit of a stretch with a ~20x
slowdown. This commit places a limit on the number of inputs to the
fuzzer at 200 to keep it reasonably sized.
I essentially add these same logs back in every time I'm debugging something
related to this fuzz target or `externref`s in general. Probably like 5 times
I've added roughly these logs. We should just make them available whenever we
need them via `RUST_LOG=wasmtime_runtime=trace`.
This also changes a couple `if let`s to `unwrap`s that are now infallible after
* Cranelift: Add instructions for getting the current stack/frame pointers and return address
This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535
* x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead
We just special case getting operands from `Amode`s now.
* Fix s390x `get_return_address`; require `preserve_frame_pointers=true`
* Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp
* Handle non-allocatable registers in Amode::with_allocs
* Use "stack" instead of "r15" on s390x
* r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`
The gen_copy_arg_to_regs routine currently ignores argument extension
flags when loading incoming arguments. This causes a problem with
stack arguments on big-endian systems, since the argument address
points to the word on the stack as extended by the caller, but the
generated code only loads the inner type from the address, causing
it to receive an incorrect value. (This happens to work on little-
endian systems.)
Fixed by loading extended arguments as full words.
* Re-enable component model `*.wast` tests
These accidentally stopped running as part of #4556 on CI since I forgot
one more location to touch a feature gate.
* Enable logging in component tests
This is a small convenience to get log messages during testing for
components by default.
This commit fixes trampoline compilation for lists where the loop condition
would only branch if the amount remaining was 0 instead of not 0.
It resulted in only the first element of the list being copied.
* Cranellift: remove Baldrdash support and related features.
As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.
In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.
This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425
* Review feedback.
* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.
The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.
* Review feedback.
* Move `emit_to_memory` to `MachCompileResult`
This small refactoring makes it clearer to me that emitting to memory
doesn't require anything else from the compilation `Context`. While it's
a trivial change, it's a small public API change that shouldn't cause
too much trouble, and doesn't seem RFC-worthy. Happy to hear different
opinions about this, though!
* hide the MachCompileResult behind a method
* Add a `CompileError` wrapper type that references a `Function`
* Rename MachCompileResult to CompiledCode
* Additionally remove the last unsafe API in cranelift-codegen
* Cranelift: Don't `emit` inside lowering rules in aarch64
The lowering rules should be "pure" and side-effect free, using helpers defined
in `inst.isle` to perform actual side effects like emitting instructions.
* Cranelift: use 80 width for section separators in aarch64 lowering rules
* Implement fused adapters for `(list T)` types
This commit implements one of the two remaining types for adapter
fusion, lists. This implementation is particularly tricky for a number
of reasons:
* Lists have a number of validity checks which need to be carefully
implemented. For example the byte length of the list passed to
allocation in the destination module could overflow the 32-bit index
space. Additionally lists in 32-bit memories need a check that their
final address is in-bounds in the address space.
* In the effort to go ahead and support memory64 at the lowest layers
this is where much of the magic happens. Lists are naturally always
stored in memory and shifting between 64/32-bit address spaces
is done here. This notably required plumbing an `Options` around
during flattening/size/alignment calculations due to the size/types of
lists changing depending on the memory configuration.
I've also added a small `factc` program in this commit which should
hopefully assist in exploring and debugging adapter modules. This takes
as input a component (text or binary format) and then generates an
adapter module for all component function signatures found internally.
This commit notably does not include tests for lists. I tried to figure
out a good way to add these but I felt like there were too many cases to
test and the tests would otherwise be extremely verbose. Instead I think
the best testing strategy for this commit will be through #4537 which
should be relatively extensible to testing adapters between modules in
addition to host-based lifting/lowering.
* Improve handling of lists of 0-size types
* Skip overflow checks on byte sizes for 0-size types
* Skip the copy loop entirely when src/dst are both 0
* Skip the increments of src/dst pointers if either is 0-size
* Update semantics for zero-sized lists/strings
When a list/string has a 0-byte-size the base pointer is no longer
verified to be in-bounds to match the supposedly desired adapter
semantics where no trap happens because no turn of the loop happens.
* Support shadowing in isle
* Re-run the isle build.rs if the examples change
* Print error messages when isle tests fail
* Move run tests
* Refactor `let` uses that don't need to introduce unique names
This commit goes through and updates support in the various argument
passing routines to support 0-sized flags. A bit of a degenerate case
but clarified in WebAssembly/component-model#76 as intentional.
When the `stacks` fuzzer hits a stack overflow the trace generated by
Wasmtime will have one more frame than the trace generated by the wasm
itself. This comes about due to the wasm not actually pushing the final
frame when it stack overflows. The host, however, will still see the
final frame that triggered the stack overflow.
In this situation the fuzzer asserts that the host has one extra frame
and then discards the frame.
* Unconditionally enable component-model tests
* Remove an outdated test that wasn't previously being compiled
* Fix a component model doc test
* Try to decrease memory usage in qemu
The ISLE language's lexer previously used a very primitive
`i64::from_str_radix` call to parse integer constants, allowing values
in the range -2^63..2^63 only. Also, underscores to separate digits (as
is allwoed in Rust) were not supported. Finally, 128-bit constants were
not supported at all.
This PR addresses all issues above:
- Integer constants are internally stored as 128-bit values.
- Parsing supports either signed (-2^127..2^127) or unsigned (0..2^128)
range. Negation works independently of that, so one can write
`-0xffff..ffff` (128 bits wide, i.e., -(2^128-1)) to get a `1`.
- Underscores are supported to separate groups of digits, so one can
write `0xffff_ffff`.
- A minor oversight was fixed: hex constants can start with `0X`
(uppercase) as well as `0x`, for consistency with Rust and C.
This PR also adds a new kind of ISLE test that actually runs a driver
linked to compiled ISLE code; we previously didn't have any such tests,
but it is now quite useful to assert correct interpretation of constant
values.