c3e31c99461f7f109637528d805dbaa349cf4a22
2289 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c3e31c9946 |
[fuzz] Document Wasm-JS conversions (#4683)
During differential execution against V8, Wasm values need to be converted back and forth from JS values. This change documents the location in the specification where this is defined. |
||
|
|
597eb6f4ce |
Limit the type hierarchies in component fuzzing (#4668)
* Limit the type hierarchies in component fuzzing For now `wasmparser` has a hard limit on the size of tuples and such at 1000 recursive types within the tuple itself. Respect this limit by limiting the width of recursive types generated for the `component_api` fuzzer. This commit unifies this new requirement with the preexisting `TupleArray` and `NonEmptyArray` types into one `VecInRange<T, L, H>` which allow expressing all of these various requirements in one type. * Fix a compile error on `main` * Review comments |
||
|
|
918debfe59 |
Stop returning NOTCAPABLE errors from WASI calls. (#4666)
* Stop returning `NOTCAPABLE` errors from WASI calls. `ENOTCAPABLE` was an error code that is used as part of the rights system, from CloudABI. There is a set of flags associated with each file descriptor listing which operations can be performed with the file descriptor, and if an attempt is made to perform an operation with a file descriptor that isn't permitted by its rights flags, it fails with `ENOTCAPABLE`. WASI is removing the rights system. For example, WebAssembly/wasi-libc#294 removed support for translating `ENOTCAPABLE` into POSIX error codes, on the assumption that engines should stop using it. So as another step to migrating away from the rights system, remove uses of the `ENOTCAPABLE` error. * Update crates/wasi-common/src/file.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * Update crates/wasi-common/src/dir.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> Co-authored-by: Jamey Sharp <jamey@minilop.net> |
||
|
|
354daf5b93 |
[fuzz] Fix issues with single-inst module generator (#4674)
* [fuzz] Fix signature of `i64.extend32_s` single-instruction test This single-instruction test incorrectly attempted to convert an `i32` to an `i64`; the correct signature is `i64 -> i64`. See the [WebAssembly specification](https://webassembly.github.io/spec/core/bikeshed/#a7-index-of-instructions). * [fuzz] Fix typo in single-instruction function generator Previously, the `unary!` macro created functions that used two operands instead of the expected one. |
||
|
|
fd28d94352 |
Shield compiled modules from their appended metadata (#4609)
This commit fixes #4600 in a somewhat roundabout fashion. Currently the `main` branch of Wasmtime exhibits unusual behavior: * If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level` test does not fail. * If `cargo test -p wasmtime --lib` is run, however, then the test fails. This test is indeed being run as part of `./ci/run-tests.sh` and it's also passing in CI. The exact failure is that part of the debuginfo support we have takes an existing ELF image, copies it, and then appends some information to inform profilers/gdb about the image. This code is all quite old at this point and not 100% optimal, but that's at least where we're at. The problem is that the appended `ProgramHeader64` is not aligned correctly during `cargo test -p wasmtime --lib`, which is the panic that happens causing the test to fail. The reason, however, that this test passes with `./ci/run-tests.sh` is that the alignment of `ProgramHeader64` is 1 instead of 8. The reason for that is that the `object` crate has an `unaligned` feature which forcibly unaligns all primitives to 1 byte instead of their natural alignment. During `cargo test -p wasmtime --lib` this feature is not enabled but during `./ci/run-tests.sh` this feature is enabled. The feature is currently enabled through inclusion of the `backtrace` crate which only happens for some tests in some crates. The alignment issue explains why the test fails on a single crate test but fails on the whole workspace tests. The next issue I investigated was if this test ever passed. It turns out that on v0.39.0 this test passed, and the regression to main was introduced during #4571. That PR, however, has nothing to do with any of this! The reason that this showed up as causing a "regression" however is because it changed cranelift settings which changed the size of serialized metadata at the end of a Wasmtime cache object. Wasmtime compiled artifacts are ELF images with Wasmtime-specific metadata appended after them. This appended metadata was making its way all the way through to the gdbjit image itself which mean that while the end of the ELF file itself was properly aligned the space after the Wasmtime metadata was not aligned. This metadata changes in size over time as Cranelift settings change which explains why #4571 was the "source" of the regression. The fix in this commit is to discard the extra Wasmtime metadata when creating an `MmapVec` representing the underlying ELF image. This is already supported with `MmapVec::drain` so it was relatively easy to insert that. This means that the gdbjit image starts with just the ELF file itself which is always aligned at the end, which gets the test passing with/without the `unaligned` feature in the `object` crate. |
||
|
|
7fa89c4a4f |
[fuzz] Fix order of operands passed in to wasm-spec-interpreter (#4672)
In #4671, the meta-differential fuzz target was finding errors when running certain Wasm modules (specifically `shr_s` in that case). @conrad-watt diagnosed the issue as a missing reversal in the operands passed to the spec interpreter. This change fixes #4671 and adds an additional unit test to keep it fixed. |
||
|
|
b17a734a57 |
Fix unused result that is #[must_use] (#4663)
Fixes this compiler warning:
```
warning: unused return value of `Box::<T>::from_raw` that must be used
--> crates/bench-api/src/lib.rs:351:9
|
351 | Box::from_raw(state as *mut BenchState);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
```
|
||
|
|
bd70dbebbd |
Deduplicate some size/align calculations (#4658)
This commit is an effort to reduce the amount of complexity around managing the size/alignment calculations of types in the canonical ABI. Previously the logic for the size/alignment of a type was spread out across a number of locations. While each individual calculation is not really the most complicated thing in the world having the duplication in so many places was constantly worrying me. I've opted in this commit to centralize all of this within the runtime at least, and now there's only one "duplicate" of this information in the fuzzing infrastructure which is to some degree less important to deduplicate. This commit introduces a new `CanonicalAbiInfo` type to house all abi size/align information for both memory32 and memory64. This new type is then used pervasively throughout fused adapter compilation, dynamic `Val` management, and typed functions. This type was also able to reduce the complexity of the macro-generated code meaning that even `wasmtime-component-macro` is performing less math than it was before. One other major feature of this commit is that this ABI information is now saved within a `ComponentTypes` structure. This avoids recursive querying of size/align information frequently and instead effectively caching it. This was a worry I had for the fused adapter compiler which frequently sought out size/align information and would recursively descend each type tree each time. The `fact-valid-module` fuzzer is now nearly 10x faster in terms of iterations/s which I suspect is due to this caching. |
||
|
|
66025636fd |
Remove a layer of recursion in adapter compilation (#4657)
In #4640 a feature was added to adapter modules that whenever translation goes through memory it instead goes through a helper function as opposed to inlining it directly. The generation of the helper function happened recursively at compile time, however, and sure enough oss-fuzz has found an input which blows the host stack at compile time. This commit removes the compile-time recursion from the adapter compiler when translating these helper functions by deferring the translation to a worklist which is processed after the original function is translated. This makes the stack-based recursion instead heap-based, removing the stack overflow. |
||
|
|
867f5c1244 |
Update behavior of zero-length lists/strings (#4648)
The spec was expected to change to not bounds-check 0-byte lists/strings but has since been updated to match `memory.copy` which does indeed check the pointer for 0-byte copies. |
||
|
|
0b1f51f804 |
Remove unnecessary parens around expression (#4647)
Fixes a compiler warning. |
||
|
|
e81ad3c7eb |
cli-flags: Don't ignore the first flag in CommonOptions::parse_from_str (#4642)
|
||
|
|
c816a52746 |
Reuse locals in adapter trampolines (#4646)
This commit implements a scheme I've been meaning to work on in the
adapter compiler where instead of always generating a fresh local for
all operations locals may now be reused. Locals generated are explicitly
free'd when their lexical scope has ended, allowing reuse in translation
of later types in the adapter.
This also implements a new scheme for initializing locals where
previously a local could simply be generated, but now the local must be
fused with its initializer where a `local.{tee,set}` instruction is
always generated. This should help prevent a bug I ran into with strings
where one usage of a local was forgotten to be initialized which meant
that when it was used during a loop it may have had a stale value from
before.
Modeling this in Rust isn't possible at compile time unfortunately so I
opted for the next best thing, runtime panics. If a local is
accidentally not released back to the pool of free locals then it will
panic. The fuzzer for simply generating and validating adapter modules
should be good at exercising this and it weeded out a few forgotten
free's and should be good now.
|
||
|
|
ec47335b9c |
wasmtime: Add a Config::native_unwind_info method (#4643)
This method configures whether native unwind information (e.g. `.eh_frame` on Linux) is generated or not. This helps integrate with third-party stack capturing tools, such as the system unwinder or the `backtrace` crate. It does not affect whether Wasmtime can capture stack traces in Wasm code that it is running or not. Unwind info is always enabled on Windows, since the Windows ABI requires it. This configuration option defaults to true. Additionally, we deprecate `Config::wasm_backtrace` since we can always cheaply capture stack traces ever since https://github.com/bytecodealliance/wasmtime/pull/4431. Fixes https://github.com/bytecodealliance/wasmtime/issues/4554 |
||
|
|
866ec46613 |
Implement roundtrip fuzzing of component adapters (#4640)
* Improve the `component_api` fuzzer on a few dimensions * Update the generated component to use an adapter module. This involves two core wasm instances communicating with each other to test that data flows through everything correctly. The intention here is to fuzz the fused adapter compiler. String encoding options have been plumbed here to exercise differences in string encodings. * Use `Cow<'static, ...>` and `static` declarations for each static test case to try to cut down on rustc codegen time. * Add `Copy` to derivation of fuzzed enums to make `derive(Clone)` smaller. * Use `Store<Box<dyn Any>>` to try to cut down on codegen by monomorphizing fewer `Store<T>` implementation. * Add debug logging to print out what's flowing in and what's flowing out for debugging failures. * Improve `Debug` representation of dynamic value types to more closely match their Rust counterparts. * Fix a variant issue with adapter trampolines Previously the offset of the payload was calculated as the discriminant aligned up to the alignment of a singular case, but instead this needs to be aligned up to the alignment of all cases to ensure all cases start at the same location. * Fix a copy/paste error when copying masked integers A 32-bit load was actually doing a 16-bit load by accident since it was copied from the 16-bit load-and-mask case. * Fix f32/i64 conversions in adapter modules The adapter previously erroneously converted the f32 to f64 and then to i64, where instead it should go from f32 to i32 to i64. * Fix zero-sized flags in adapter modules This commit corrects the size calculation for zero-sized flags in adapter modules. cc #4592 * Fix a variant size calculation bug in adapters This fixes the same issue found with variants during normal host-side fuzzing earlier where the size of a variant needs to align up the summation of the discriminant and the maximum case size. * Implement memory growth in libc bump realloc Some fuzz-generated test cases are copying lists large enough to exceed one page of memory so bake in a `memory.grow` to the bump allocator as well. * Avoid adapters of exponential size This commit is an attempt to avoid adapters being exponentially sized with respect to the type hierarchy of the input. Previously all adaptation was done inline within each adapter which meant that if something was structured as `tuple<T, T, T, T, ...>` the translation of `T` would be inlined N times. For very deeply nested types this can quickly create an exponentially sized adapter with types of the form: (type $t0 (list u8)) (type $t1 (tuple $t0 $t0)) (type $t2 (tuple $t1 $t1)) (type $t3 (tuple $t2 $t2)) ;; ... where the translation of `t4` has 8 different copies of translating `t0`. This commit changes the translation of types through memory to almost always go through a helper function. The hope here is that it doesn't lose too much performance because types already reside in memory. This can still lead to exponentially sized adapter modules to a lesser degree where if the translation all happens on the "stack", e.g. via `variant`s and their flat representation then many copies of one translation could still be made. For now this commit at least gets the problem under control for fuzzing where fuzzing doesn't trivially find type hierarchies that take over a minute to codegen the adapter module. One of the main tricky parts of this implementation is that when a function is generated the index that it will be placed at in the final module is not known at that time. To solve this the encoded form of the `Call` instruction is saved in a relocation-style format where the `Call` isn't encoded but instead saved into a different area for encoding later. When the entire adapter module is encoded to wasm these pseudo-`Call` instructions are encoded as real instructions at that time. * Fix some memory64 issues with string encodings Introduced just before #4623 I had a few mistakes related to 64-bit memories and mixing 32/64-bit memories. * Actually insert into the `translate_mem_funcs` map This... was the whole point of having the map! * Assert memory growth succeeds in bump allocator |
||
|
|
650979ae40 |
Implement strings in adapter modules (#4623)
* Implement strings in adapter modules This commit is a hefty addition to Wasmtime's support for the component model. This implements the final remaining type (in the current type hierarchy) unimplemented in adapter module trampolines: strings. Strings are the most complicated type to implement in adapter trampolines because they are highly structured chunks of data in memory (according to specific encodings). Additionally each lift/lower operation can choose its own encoding for strings meaning that Wasmtime, the host, may have to convert between any pairwise ordering of string encodings. The `CanonicalABI.md` in the component-model repo in general specifies all the fiddly bits of string encoding so there's not a ton of wiggle room for Wasmtime to get creative. This PR largely "just" implements that. The high-level architecture of this implementation is: * Fused adapters are first identified to determine src/dst string encodings. This statically fixes what transcoding operation is being performed. * The generated adapter will be responsible for managing calls to `realloc` and performing bounds checks. The adapter itself does not perform memory copies or validation of string contents, however. Instead each transcoding operation is modeled as an imported function into the adapter module. This means that the adapter module dynamically, during compile time, determines what string transcoders are needed. Note that an imported transcoder is not only parameterized over the transcoding operation but additionally which memory is the source and which is the destination. * The imported core wasm functions are modeled as a new `CoreDef::Transcoder` structure. These transcoders end up being small Cranelift-compiled trampolines. The Cranelift-compiled trampoline will load the actual base pointer of memory and add it to the relative pointers passed as function arguments. This trampoline then calls a transcoder "libcall" which enters Rust-defined functions for actual transcoding operations. * Each possible transcoding operation is implemented in Rust with a unique name and a unique signature depending on the needs of the transcoder. I've tried to document inline what each transcoder does. This means that the `Module::translate_string` in adapter modules is by far the largest translation method. The main reason for this is due to the management around calling the imported transcoder functions in the face of validating string pointer/lengths and performing the dance of `realloc`-vs-transcode at the right time. I've tried to ensure that each individual case in transcoding is documented well enough to understand what's going on as well. Additionally in this PR is a full implementation in the host for the `latin1+utf16` encoding which means that both lifting and lowering host strings now works with this encoding. Currently the implementation of each transcoder function is likely far from optimal. Where possible I've leaned on the standard library itself and for latin1-related things I'm leaning on the `encoding_rs` crate. I initially tried to implement everything with `encoding_rs` but was unable to uniformly do so easily. For now I settled on trying to get a known-correct (even in the face of endianness) implementation for all of these transcoders. If an when performance becomes an issue it should be possible to implement more optimized versions of each of these transcoding operations. Testing this commit has been somewhat difficult and my general plan, like with the `(list T)` type, is to rely heavily on fuzzing to cover the various cases here. In this PR though I've added a simple test that pushes some statically known strings through all the pairs of encodings between source and destination. I've attempted to pick "interesting" strings that one way or another stress the various paths in each transcoding operation to ideally get full branch coverage there. Additionally a suite of "negative" tests have also been added to ensure that validity of encoding is actually checked. * Fix a temporarily commented out case * Fix wasmtime-runtime tests * Update deny.toml configuration * Add `BSD-3-Clause` for the `encoding_rs` crate * Remove some unused licenses * Add an exemption for `encoding_rs` for now * Split up the `translate_string` method Move out all the closures and package up captured state into smaller lists of arguments. * Test out-of-bounds for zero-length strings |
||
|
|
1ce9e8aa5f |
Fix an issue in adapter module partitioning (#4622)
When an adapter module depends on a particular core wasm instance this means that it actually depends on not only that instance but all prior core wasm instances as well. This is because core wasm instances must be instantiated in the specified order within a component and that cannot change depending on the dataflow between adapters. This commit fixes a possible panic from linearizing the component dfg where an adapter module tried to depend on an instance that hadn't been instantiated yet because the ordering dependency between core wasm instances hadn't been modeled. |
||
|
|
412fa04911 |
Bump Wasmtime to 0.41.0 (#4620)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> |
||
|
|
42233e8eda |
components: ignore export aliases to types in translation. (#4604)
* components: ignore export aliases to types in translation. Currently, translation is ignoring type exports from components during translation by skipping over them before adding them to the exports map. If a component instantiates an inner component and aliases a type export of that instance, it will cause wasmtime to panic with a failure to find the export in the exports map. The fix is to add a representation for exported types to the map that is simply ignored when encountered. This also makes it easier to track places where we would have to support type exports in translation in the future. * Keep type information for type exports. This commit keeps the type information for type exports so that types can be properly aliased from an instance export and thereby adjusting the type index space accordingly. * Add a simple test case for type exports for the component model. |
||
|
|
b4d7ab36f9 |
Add a dataflow-based representation of components (#4597)
* Add a dataflow-based representation of components This commit updates the inlining phase of compiling a component to creating a dataflow-based representation of a component instead of creating a final `Component` with a linear list of initializers. This dataflow graph is then linearized in a final step to create the actual final `Component`. The motivation for this commit stems primarily from my work implementing strings in fused adapters. In doing this my plan is to defer most low-level transcoding to the host itself rather than implementing that in the core wasm adapter modules. This means that small cranelift-generated trampolines will be used for adapter modules to call which then call "transcoding libcalls". The cranelift-generated trampolines will get raw pointers into linear memory and pass those to the libcall which core wasm doesn't have access to when passing arguments to an import. Implementing this with the previous representation of a `Component` was becoming too tricky to bear. The initialization of a transcoder needed to happen at just the right time: before the adapter module which needed it was instantiated but after the linear memories referenced had been extracted into the `VMComponentContext`. The difficulty here is further compounded by the current adapter module injection pass already being quite complicated. Adapter modules are already renumbering the index space of runtime instances and shuffling items around in the `GlobalInitializer` list. Perhaps the worst part of this was that memories could already be referenced by host function imports or exports to the host, and if adapters referenced the same memory it shouldn't be referenced twice in the component. This meant that `ExtractMemory` initializers ideally needed to be shuffled around in the initializer list to happen as early as possible instead of wherever they happened to show up during translation. Overall I did my best to implement the transcoders but everything always came up short. I have decided to throw my hands up in the air and try a completely different approach to this, namely the dataflow-based representation in this commit. This makes it much easier to edit the component after initial translation for injection of adapters, injection of transcoders, adding dependencies on possibly-already-existing items, etc. The adapter module partitioning pass in this commit was greatly simplified to something which I believe is functionally equivalent but is probably an order of magnitude easier to understand. The biggest downside of this representation I believe is having a duplicate representation of a component. The `component::info` was largely duplicated into the `component::dfg` module in this commit. Personally though I think this is a more appropriate tradeoff than before because it's very easy to reason about "convert representation A to B" code whereas it was very difficult to reason about shuffling around `GlobalInitializer` items in optimal fashions. This may also have a cost at compile-time in terms of shuffling data around, but my hope is that we have lots of other low-hanging fruit to optimize if it ever comes to that which allows keeping this easier-to-understand representation. Finally, to reiterate, the final representation of components is not changed by this PR. To the runtime internals everything is still the same. * Fix compile of factc |
||
|
|
ed8908efcf |
implement fuzzing for component types (#4537)
This addresses #4307. For the static API we generate 100 arbitrary test cases at build time, each of which includes 0-5 parameter types, a result type, and a WAT fragment containing an imported function and an exported function. The exported function calls the imported function, which is implemented by the host. At runtime, the fuzz test selects a test case at random and feeds it zero or more sets of arbitrary parameters and results, checking that values which flow host-to-guest and guest-to-host make the transition unchanged. The fuzz test for the dynamic API follows a similar pattern, the only difference being that test cases are generated at runtime. Signed-off-by: Joel Dice <joel.dice@fermyon.com> |
||
|
|
70ce288dc7 |
Save exit Wasm FP and PC in component-to-host trampolines (#4601)
* Wasmtime: Add a pointer to `VMRuntimeLimits` in component contexts * Save exit Wasm FP and PC in component-to-host trampolines Fixes #4535 * Add comment about why we deref the trampoline's FP * Update some tests to use new `vmruntime_limits_*` methods |
||
|
|
a897742593 |
Initial back-edge CFI implementation (#3606)
Give the user the option to sign and to authenticate function return addresses with the operations introduced by the Pointer Authentication extension to the Arm instruction set architecture. Copyright (c) 2021, Arm Limited. |
||
|
|
ff6082c0af |
Improve readability of memory64 compat in fact (#4581)
This commit aims to improve the readability of supporting the memory64 proposal in the `fact` adapter trampoline compiler. Previously there were a few sprinkled blocks that used `if` to generate different instructions inline, but as I've worked on support for strings this has become pretty unwieldy as strings do far more memory manipulation than other type conversions. A pattern that's easier to read is to have small instruction helpers that take the pointer width as an argument and internally dispatch to the correct instruction. This keeps the main translation code branch-free and a bit easier to follow. Additionally for more complicated branching logic it allows for deduplicating the main translation path by having lots of little branches instead of one large branch with everything duplicated on both halves. |
||
|
|
9f82644cc3 |
Some minor cleanups/refactorings in components (#4582)
This is a collection of some minor renamings, refactorings, sharing of code, etc. This was all discovered during my addition of string support to adapter functions and I figured it'd be best to frontload this and land it ahead of the full patch since it's getting complex. |
||
|
|
0a6baeddf4 |
Improve some support in factc: (#4580)
* Support CLI parameters for string encoding * Fix `--skip-validate` * Fix printing binary to stdout |
||
|
|
f587b10eb9 |
Reduce wasm invocations in the stacks fuzzer (#4595)
On oss-fuzz a test case has been found that executes 30k iterations of a wasm trap which with a 60s timeout leaves 2ms for each invocation which under fuzzing instrumentation is a bit of a stretch with a ~20x slowdown. This commit places a limit on the number of inputs to the fuzzer at 200 to keep it reasonably sized. |
||
|
|
edf7f9f2bb |
wasmtime: Add lots of logging for externrefs and table_ops fuzz target (#4583)
I essentially add these same logs back in every time I'm debugging something related to this fuzz target or `externref`s in general. Probably like 5 times I've added roughly these logs. We should just make them available whenever we need them via `RUST_LOG=wasmtime_runtime=trace`. This also changes a couple `if let`s to `unwrap`s that are now infallible after |
||
|
|
43125aa994 |
components: fix trampoline compilation for lists. (#4579)
This commit fixes trampoline compilation for lists where the loop condition would only branch if the amount remaining was 0 instead of not 0. It resulted in only the first element of the list being copied. |
||
|
|
43f1765272 |
Cranellift: remove Baldrdash support and related features. (#4571)
* Cranellift: remove Baldrdash support and related features.
As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.
In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.
This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425
* Review feedback.
* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.
The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.
* Review feedback.
|
||
|
|
ff37c9d8a4 |
[cranelift] Rejigger the compile API (#4540)
* Move `emit_to_memory` to `MachCompileResult` This small refactoring makes it clearer to me that emitting to memory doesn't require anything else from the compilation `Context`. While it's a trivial change, it's a small public API change that shouldn't cause too much trouble, and doesn't seem RFC-worthy. Happy to hear different opinions about this, though! * hide the MachCompileResult behind a method * Add a `CompileError` wrapper type that references a `Function` * Rename MachCompileResult to CompiledCode * Additionally remove the last unsafe API in cranelift-codegen |
||
|
|
fb59de15af |
Implement fused adapters for (list T) types (#4558)
* Implement fused adapters for `(list T)` types This commit implements one of the two remaining types for adapter fusion, lists. This implementation is particularly tricky for a number of reasons: * Lists have a number of validity checks which need to be carefully implemented. For example the byte length of the list passed to allocation in the destination module could overflow the 32-bit index space. Additionally lists in 32-bit memories need a check that their final address is in-bounds in the address space. * In the effort to go ahead and support memory64 at the lowest layers this is where much of the magic happens. Lists are naturally always stored in memory and shifting between 64/32-bit address spaces is done here. This notably required plumbing an `Options` around during flattening/size/alignment calculations due to the size/types of lists changing depending on the memory configuration. I've also added a small `factc` program in this commit which should hopefully assist in exploring and debugging adapter modules. This takes as input a component (text or binary format) and then generates an adapter module for all component function signatures found internally. This commit notably does not include tests for lists. I tried to figure out a good way to add these but I felt like there were too many cases to test and the tests would otherwise be extremely verbose. Instead I think the best testing strategy for this commit will be through #4537 which should be relatively extensible to testing adapters between modules in addition to host-based lifting/lowering. * Improve handling of lists of 0-size types * Skip overflow checks on byte sizes for 0-size types * Skip the copy loop entirely when src/dst are both 0 * Skip the increments of src/dst pointers if either is 0-size * Update semantics for zero-sized lists/strings When a list/string has a 0-byte-size the base pointer is no longer verified to be in-bounds to match the supposedly desired adapter semantics where no trap happens because no turn of the loop happens. |
||
|
|
893fadb485 |
components: Fix support for 0-sized flags (#4560)
This commit goes through and updates support in the various argument passing routines to support 0-sized flags. A bit of a degenerate case but clarified in WebAssembly/component-model#76 as intentional. |
||
|
|
05e6abf2f6 |
Fix the stacks fuzzer in the face of stack overflow (#4557)
When the `stacks` fuzzer hits a stack overflow the trace generated by Wasmtime will have one more frame than the trace generated by the wasm itself. This comes about due to the wasm not actually pushing the final frame when it stack overflows. The host, however, will still see the final frame that triggered the stack overflow. In this situation the fuzzer asserts that the host has one extra frame and then discards the frame. |
||
|
|
04631ad0af |
Unconditionally enable component-model tests (#4556)
* Unconditionally enable component-model tests * Remove an outdated test that wasn't previously being compiled * Fix a component model doc test * Try to decrease memory usage in qemu |
||
|
|
46782b18c2 |
wasmtime: Implement fast Wasm stack walking (#4431)
* Always preserve frame pointers in Wasmtime
This allows us to efficiently and simply capture Wasm stacks without maintaining
and synchronizing any safety-critical side tables between the compiler and the
runtime.
* wasmtime: Implement fast Wasm stack walking
Why do we want Wasm stack walking to be fast? Because we capture stacks whenever
there is a trap and traps actually happen fairly frequently with short-lived
programs and WASI's `exit`.
Previously, we would rely on generating the system unwind info (e.g.
`.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk
the full stack and filter out any non-Wasm stack frames. This can,
unfortunately, be slow for two primary reasons:
1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than
`O(wasm-frames)` work.
2. System unwind info and the system unwinder need to be much more general than
a purpose-built stack walker for Wasm needs to be. It has to handle any kind of
stack frame that any compiler might emit where as our Wasm frames are emitted by
Cranelift and always have frame pointers. This translates into implementation
complexity and general overhead. There can also be unnecessary-for-our-use-cases
global synchronization and locks involved, further slowing down stack walking in
the presence of multiple threads trying to capture stacks in parallel.
This commit introduces a purpose-built stack walker for traversing just our Wasm
frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore
non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit
frame pointer)` pairs. This linked list is maintained via Wasm-to-host and
host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use
frame pointers (which Cranelift preserves) to find the next older Wasm frame on
the stack, and we keep doing this until we reach the entry stack pointer,
meaning that the next older frame will be a host frame.
The trampolines need to avoid a couple stumbling blocks. First, they need to be
compiled ahead of time, since we may not have access to a compiler at
runtime (e.g. if the `cranelift` feature is disabled) but still want to be able
to call functions that have already been compiled and get stack traces for those
functions. Usually this means we would compile the appropriate trampolines
inside `Module::new` and the compiled module object would hold the
trampolines. However, we *also* need to support calling host functions that are
wrapped into `wasmtime::Func`s and there doesn't exist *any* ahead-of-time
compiled module object to hold the appropriate trampolines:
```rust
// Define a host function.
let func_type = wasmtime::FuncType::new(
vec![wasmtime::ValType::I32],
vec![wasmtime::ValType::I32],
);
let func = Func::new(&mut store, func_type, |_, params, results| {
// ...
Ok(())
});
// Call that host function.
let mut results = vec![wasmtime::Val::I32(0)];
func.call(&[wasmtime::Val::I32(0)], &mut results)?;
```
Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline
in assembly that work for all Wasm and host function signatures. These
trampolines are careful to only use volatile registers, avoid touching any
register that is an argument in the calling convention ABI, and tail call to the
target callee function. This allows forwarding any set of arguments and any
returns to and from the callee, while also allowing us to maintain our linked
list of Wasm stack and frame pointers before transferring control to the
callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing
the host-Wasm boundary, so they do not impose overhead on regular calls. (And if
using one trampoline for all host-Wasm boundary crossing ever breaks branch
prediction enough in the CPU to become any kind of bottleneck, we can do fun
things like have multiple copies of the same trampoline and choose a random copy
for each function, sharding the functions across branch predictor entries.)
Finally, this commit also ends the use of a synthetic `Module` and allocating a
stubbed out `VMContext` for host functions. Instead, we define a
`VMHostFuncContext` with its own magic value, similar to `VMComponentContext`,
specifically for host functions.
<h2>Benchmarks</h2>
<h3>Traps and Stack Traces</h3>
Large improvements to taking stack traces on traps, ranging from shaving off 64%
to 99.95% of the time it used to take.
<details>
```
multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us]
thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s]
change:
time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05)
thrpt: [+560.90% +573.56% +585.84%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us]
thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s]
change:
time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05)
thrpt: [+1023.1% +1048.6% +1070.3%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us]
thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s]
change:
time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05)
thrpt: [+1503.1% +1542.0% +1578.0%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severe
multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us]
thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s]
change:
time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05)
thrpt: [+1339.2% +1353.6% +1369.1%]
Performance has improved.
multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us]
thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s]
change:
time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05)
thrpt: [+1624.7% +1709.2% +1806.1%]
Performance has improved.
multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us]
thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s]
change:
time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05)
thrpt: [+2821.5% +3048.1% +3281.1%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
many-modules-registered-traps/1
time: [2.6278 us 2.6361 us 2.6447 us]
thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s]
change:
time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05)
thrpt: [+562.65% +571.51% +580.76%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
many-modules-registered-traps/8
time: [2.6294 us 2.6460 us 2.6623 us]
thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s]
change:
time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05)
thrpt: [+567.63% +588.95% +608.95%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
many-modules-registered-traps/64
time: [2.6218 us 2.6329 us 2.6452 us]
thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s]
change:
time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05)
thrpt: [+1431.4% +1450.6% +1469.5%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
many-modules-registered-traps/512
time: [2.6569 us 2.6737 us 2.6923 us]
thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s]
change:
time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05)
thrpt: [+13417% +13566% +13731%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
many-modules-registered-traps/4096
time: [2.7258 us 2.7390 us 2.7535 us]
thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s]
change:
time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
thrpt: [+221417% +223380% +224881%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
many-stack-frames-traps/1
time: [1.4658 us 1.4719 us 1.4784 us]
thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s]
change:
time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05)
thrpt: [+860.23% +894.72% +938.21%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
many-stack-frames-traps/8
time: [2.4772 us 2.4870 us 2.4973 us]
thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s]
change:
time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05)
thrpt: [+575.65% +583.51% +592.03%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
many-stack-frames-traps/64
time: [10.109 us 10.171 us 10.236 us]
thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s]
change:
time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05)
thrpt: [+341.22% +350.38% +357.55%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
many-stack-frames-traps/512
time: [126.16 us 126.54 us 126.96 us]
thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s]
change:
time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05)
thrpt: [+181.32% +185.17% +188.71%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
```
</details>
<h3>Calls</h3>
There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call
performance due the new trampolines. It seems to be on the order of about 2-10
nanoseconds per call, depending on the benchmark.
I believe this regression is ultimately acceptable because
1. this overhead will be vastly dominated by whatever work a non-nop callee
actually does,
2. we will need these trampolines, or something like them, when implementing the
Wasm exceptions proposal to do things like translate Wasm's exceptions into
Rust's `Result`s,
3. and because the performance improvements to trapping and capturing stack
traces are of such a larger magnitude than this call regressions.
<details>
```
sync/no-hook/host-to-wasm - typed - nop
time: [28.683 ns 28.757 ns 28.844 ns]
change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop
time: [42.515 ns 42.652 ns 42.841 ns]
change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop
time: [33.936 ns 34.052 ns 34.179 ns]
change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
sync/no-hook/host-to-wasm - typed - nop-params-and-results
time: [34.290 ns 34.388 ns 34.502 ns]
change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [62.546 ns 62.721 ns 62.919 ns]
change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop-params-and-results
time: [42.609 ns 42.710 ns 42.831 ns]
change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop
time: [29.546 ns 29.675 ns 29.818 ns]
change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop
time: [45.448 ns 45.699 ns 45.961 ns]
change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop
time: [34.334 ns 34.437 ns 34.558 ns]
change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [36.594 ns 36.763 ns 36.974 ns]
change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [63.541 ns 63.831 ns 64.194 ns]
change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results
time: [43.968 ns 44.169 ns 44.437 ns]
change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
async/no-hook/host-to-wasm - typed - nop
time: [4.9612 us 4.9743 us 4.9889 us]
change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async/no-hook/host-to-wasm - untyped - nop
time: [5.0030 us 5.0211 us 5.0439 us]
change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
async/no-hook/host-to-wasm - typed - nop-params-and-results
time: [4.9273 us 4.9468 us 4.9700 us]
change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [5.1151 us 5.1338 us 5.1555 us]
change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/hook-sync/host-to-wasm - typed - nop
time: [4.9330 us 4.9394 us 4.9467 us]
change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop
time: [5.0073 us 5.0183 us 5.0310 us]
change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
async/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [4.9610 us 4.9839 us 5.0097 us]
change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [5.0995 us 5.1272 us 5.1617 us]
change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop
time: [2.4242 us 2.4316 us 2.4396 us]
change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop
time: [2.5102 us 2.5155 us 2.5210 us]
change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop-params-and-results
time: [2.4203 us 2.4310 us 2.4440 us]
change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [2.5501 us 2.5593 us 2.5700 us]
change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop
time: [2.4135 us 2.4190 us 2.4254 us]
change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop
time: [2.5172 us 2.5248 us 2.5357 us]
change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [2.4214 us 2.4353 us 2.4532 us]
change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [2.5499 us 2.5607 us 2.5748 us]
change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
sync/no-hook/wasm-to-host - nop - typed
time: [6.6135 ns 6.6288 ns 6.6452 ns]
change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.930 ns 15.993 ns 16.067 ns]
change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
sync/no-hook/wasm-to-host - nop - untyped
time: [20.596 ns 20.640 ns 20.690 ns]
change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [42.659 ns 42.882 ns 43.159 ns]
change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
sync/no-hook/wasm-to-host - nop - unchecked
time: [10.671 ns 10.691 ns 10.713 ns]
change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.136 ns 11.190 ns 11.263 ns]
change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop - typed
time: [6.7964 ns 6.8087 ns 6.8226 ns]
change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.865 ns 15.921 ns 15.985 ns]
change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
sync/hook-sync/wasm-to-host - nop - untyped
time: [21.505 ns 21.587 ns 21.677 ns]
change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [44.018 ns 44.128 ns 44.261 ns]
change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05)
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
sync/hook-sync/wasm-to-host - nop - unchecked
time: [11.264 ns 11.326 ns 11.387 ns]
change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.816 ns 11.865 ns 11.920 ns]
change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop - typed
time: [6.6221 ns 6.6385 ns 6.6569 ns]
change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.884 ns 15.929 ns 15.983 ns]
change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/no-hook/wasm-to-host - nop - untyped
time: [20.615 ns 20.702 ns 20.821 ns]
change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.956 ns 42.207 ns 42.521 ns]
change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async/no-hook/wasm-to-host - nop - unchecked
time: [10.440 ns 10.474 ns 10.513 ns]
change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.476 ns 11.512 ns 11.554 ns]
change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
async/no-hook/wasm-to-host - nop - async-typed
time: [26.427 ns 26.478 ns 26.532 ns]
change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [28.557 ns 28.693 ns 28.880 ns]
change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
async/hook-sync/wasm-to-host - nop - typed
time: [6.7488 ns 6.7630 ns 6.7784 ns]
change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.928 ns 16.031 ns 16.149 ns]
change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
async/hook-sync/wasm-to-host - nop - untyped
time: [21.930 ns 22.114 ns 22.296 ns]
change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.684 ns 42.858 ns 43.081 ns]
change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) high mild
12 (12.00%) high severe
async/hook-sync/wasm-to-host - nop - unchecked
time: [11.026 ns 11.053 ns 11.086 ns]
change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.840 ns 11.900 ns 11.982 ns]
change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async/hook-sync/wasm-to-host - nop - async-typed
time: [27.601 ns 27.709 ns 27.882 ns]
change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [28.955 ns 29.174 ns 29.413 ns]
change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async-pool/no-hook/wasm-to-host - nop - typed
time: [6.5626 ns 6.5733 ns 6.5851 ns]
change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.820 ns 15.886 ns 15.969 ns]
change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe
async-pool/no-hook/wasm-to-host - nop - untyped
time: [20.481 ns 20.521 ns 20.566 ns]
change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.834 ns 41.998 ns 42.189 ns]
change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe
async-pool/no-hook/wasm-to-host - nop - unchecked
time: [10.353 ns 10.380 ns 10.414 ns]
change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.123 ns 11.168 ns 11.228 ns]
change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
async-pool/no-hook/wasm-to-host - nop - async-typed
time: [27.442 ns 27.528 ns 27.638 ns]
change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [29.014 ns 29.148 ns 29.312 ns]
change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - typed
time: [6.7916 ns 6.8116 ns 6.8325 ns]
change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.917 ns 15.975 ns 16.051 ns]
change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - untyped
time: [21.558 ns 21.612 ns 21.679 ns]
change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.475 ns 42.614 ns 42.775 ns]
change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - unchecked
time: [11.150 ns 11.195 ns 11.247 ns]
change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.639 ns 11.695 ns 11.760 ns]
change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) high mild
8 (8.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - async-typed
time: [27.480 ns 27.712 ns 27.984 ns]
change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [29.218 ns 29.380 ns 29.600 ns]
change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
```
</details>
* Add s390x support for frame pointer-based stack walking
* wasmtime: Allow `Caller::get_export` to get all exports
* fuzzing: Add a fuzz target to check that our stack traces are correct
We generate Wasm modules that keep track of their own stack as they call and
return between functions, and then we periodically check that if the host
captures a backtrace, it matches what the Wasm module has recorded.
* Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code
* Add doc comment with stack walking implementation notes
* Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods
* Add extensive comments for stack walking function
* Factor architecture-specific bits of stack walking out into modules
* Initialize store-related fields in a vmctx to null when there is no store yet
Rather than leaving them as uninitialized data.
* Use `set_callee` instead of manually setting the vmctx field
* Use a more informative compile error message for unsupported architectures
* Document unsafety of `prepare_host_to_wasm_trampoline`
* Use `bti c` instead of `hint #34` in inline aarch64 assembly
* Remove outdated TODO comment
* Remove setting of `last_wasm_exit_fp` in `set_jit_trap`
This is no longer needed as the value is plumbed through to the backtrace code
directly now.
* Only set the stack limit once, in the face of re-entrancy into Wasm
* Add comments for s390x-specific stack walking bits
* Use the helper macro for all libcalls
If we forget to use it, and then trigger a GC from the libcall, that means we
could miss stack frames when walking the stack, fail to find live GC refs, and
then get use after free bugs. Much less risky to always use the helper macro
that takes care of all of that for us.
* Use the `asm_sym!` macro in Wasm-to-libcall trampolines
This macro handles the macOS-specific underscore prefix stuff for us.
* wasmtime: add size and align to `externref` assertion error message
* Extend the `stacks` fuzzer to have host frames in between Wasm frames
This way we get one or more contiguous sequences of Wasm frames on the stack,
instead of exactly one.
* Add documentation for aarch64-specific backtrace helpers
* Clarify that we only support little-endian aarch64 in trampoline comment
* Use `.machine z13` in s390x assembly file
Since apparently our CI machines have pretty old assemblers that don't have
`.machine z14`. This should be fine though since these trampolines don't make
use of anything that is introduced in z14.
* Fix aarch64 build
* Fix macOS build
* Document the `asm_sym!` macro
* Add windows support to the `wasmtime-asm-macros` crate
* Add windows support to host<--->Wasm trampolines
* Fix trap handler build on windows
* Run `rustfmt` on s390x trampoline source file
* Temporarily disable some assertions about a trap's backtrace in the component model tests
Follow up to re-enable this and fix the associated issue:
https://github.com/bytecodealliance/wasmtime/issues/4535
* Refactor libcall definitions with less macros
This refactors the `libcall!` macro to use the
`foreach_builtin_function!` macro to define all of the trampolines.
Additionally the macro surrounding each libcall itself is no longer
necessary and helps avoid too many macros.
* Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new`
* Move `backtrace` module to be submodule of `traphandlers`
This avoids making some things `pub(crate)` in `traphandlers` that really
shouldn't be.
* Fix macOS aarch64 build
* Use "i64" instead of "word" in aarch64-specific file
* Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions
Also clean up assertions surrounding our saved entry/exit registers.
* Put "typed" vs "untyped" in the same position of call benchmark names
Regardless if we are doing wasm-to-host or host-to-wasm
* Fix stacks test case generator build for new `wasm-encoder`
* Fix build for s390x
* Expand libcalls in s390x asm
* Disable more parts of component tests now that backtrace assertions are a bit tighter
* Remove assertion that can maybe fail on s390x
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
|
||
|
|
32979b2714 |
Implement flags in fused adapters (#4549)
This implements the `flags` type for fused adapters and converting between modules. The main logic here is handling the variable size of flags in addition to the masking which happens to ignore unrelated bits when the values pass through the canonical ABI. |
||
|
|
fb7d51033c |
doc: fix c set strategy doc (#4550)
In #4252 I changed the signature and doc of the rust code, but forgot to change the doc for c. Set strategy no longer returns error. This commit fixes that. |
||
|
|
ce7bbef24d |
Implement other variant-like types in adapter fusion (#4547)
This commit fills out the adapter fusion compiler for the `union`, `enum`, `option,` and `result` types. The preexisting support for `variant` types was refactored slightly to be extensible to all of these other types and they all now call into the same common translation code. |
||
|
|
e1148e43be |
Implement char type in adapter fusion (#4544)
This commit implements the translation of `char` which validates that it's in the valid range of unicode scalar values. The precise validation here is lifted from LLVM in the hopes that it's probably better than whatever I would concoct by hand. |
||
|
|
8137432e67 |
x64: only enable VTune dependencies on x86_64 targets (#4533)
As described in #4523, the `ittapi` dependency necessary for Wasmtime's VTune support does not compile on `aarch64-linux-android`. There are several incompatible parts here: though `ittapi` supports all OS combinations that Wasmtime does and builds on all CPU targets, VTune is not primarily intended for aarch64 profiling and `linux-android` is not a high priority platform for the library. We could conditionally depend on `ittapi` for Wasmtime's supported OS combinations, but I think a better answer is to limit it to x86_64 targets, since this more clearly shows why the `ittapi`/VTune support is present and also fixes #4523. |
||
|
|
174b60dcf7 |
Add *.wast support for invoking components (#4526)
This commit builds on bytecodealliance/wasm-tools#690 to add support to testing of the component model to execute functions when running `*.wast` files. This support is all built on #4442 as functions are invoked through a "dynamic" API. Right now the testing and integration is fairly crude but I'm hoping that we can try to improve it over time as necessary. For now this should provide a hopefully more convenient syntax for unit tests and the like. |
||
|
|
285bc5ce24 |
Implement variant translation in fused adapters (#4534)
* Implement variant translation in fused adapters This commit implements the most general case of variants for fused adapter trampolines. Additionally a number of other primitive types are filled out here to assist with testing variants. The implementation internally was relatively straightforward given the shape of variants, but there's room for future optimization as necessary especially around converting locals to various types. This commit also introduces a "one off" fuzzer for adapters to ensure that the generated adapter is valid. I hope to extend this fuzz generator as more types are implemented to assist in various corner cases that might arise. For now the fuzzer simply tests that the output wasm module is valid, not that it actually executes correctly. I hope to integrate with a fuzzer along the lines of #4307 one day to test the run-time-correctness of the generated adapters as well, at which point this fuzzer would become obsolete. Finally this commit also fixes an issue with `u8` translation where upper bits weren't zero'd out and were passed raw across modules. Instead smaller-than-32 types now all mask out their upper bits and do sign-extension as appropriate for unsigned/signed variants. * Fuzz memory64 in the new trampoline fuzzer Currently memory64 isn't supported elsewhere in the component model implementation of Wasmtime but the trampoline compiler seems as good a place as any to ensure that it at least works in isolation. This plumbs through fuzz input into a `memory64` boolean which gets fed into compilation. Some miscellaneous bugs were fixed as a result to ensure that memory64 trampolines all validate correctly. * Tweak manifest for doc build |
||
|
|
0e6ffd0243 |
Don't try to report file size or timestamps for stdio streams. (#4531)
* Don't try to report file size or timestamps for stdio streams. Calling `File::metadata()` on a stdio stream handle fails on Windows, where the stdio streams are not files. This `File::metadata()` call was effectively only being used to add file size and timestamps to the result of `filestat_get`. It's common for users to redirect stdio streams to interesting places, and applications generally shouldn't change their behavior depending on the size or timestamps of the file, if the streams are redirected to a file, so just leave these fields to 0, which is commonly understood to represent "unknown". Fixes #4497. |
||
|
|
1321c234e5 |
Remove dependency on more-asserts (#4408)
* Remove dependency on `more-asserts` In my recent adventures to do a bit of gardening on our dependencies I noticed that there's a new major version for the `more-asserts` crate. Instead of updating to this though I've opted to instead remove the dependency since I don't think we heavily lean on this crate and otherwise one-off prints are probably sufficient to avoid the need for pulling in a whole crate for this. * Remove exemption for `more-asserts` |
||
|
|
81bff71d11 |
wasmtime: let serialization mismatch tests pass with RUST_BACKTRACE=1 set (#4525)
|
||
|
|
97894bc65e |
Add initial support for fused adapter trampolines (#4501)
* Add initial support for fused adapter trampolines This commit lands a significant new piece of functionality to Wasmtime's implementation of the component model in the form of the implementation of fused adapter trampolines. Internally within a component core wasm modules can communicate with each other by having their exports `canon lift`'d to get `canon lower`'d into a different component. This signifies that two components are communicating through a statically known interface via the canonical ABI at this time. Previously Wasmtime was able to identify that this communication was happening but it simply panicked with `unimplemented!` upon seeing it. This commit is the beginning of filling out this panic location with an actual implementation. The implementation route chosen here for fused adapters is to use a WebAssembly module itself for the implementation. This means that, at compile time of a component, Wasmtime is generating core WebAssembly modules which then get recursively compiled within Wasmtime as well. The choice to use WebAssembly itself as the implementation of fused adapters stems from a few motivations: * This does not represent a significant increase in the "trusted compiler base" of Wasmtime. Getting the Wasm -> CLIF translation correct once is hard enough much less for an entirely different IR to CLIF. By generating WebAssembly no new interactions with Cranelift are added which drastically reduces the possibilities for mistakes. * Using WebAssembly means that component adapters are insulated from miscompilations and mistakes. If something goes wrong it's defined well within the WebAssembly specification how it goes wrong and what happens as a result. This means that the "blast zone" for a wrong adapter is the component instance but not the entire host itself. Accesses to linear memory are guaranteed to be in-bounds and otherwise handled via well-defined traps. * A fully-finished fused adapter compiler is expected to be a significant and quite complex component of Wasmtime. Functionality along these lines is expected to be needed for Web-based polyfills of the component model and by using core WebAssembly it provides the opportunity to share code between Wasmtime and these polyfills for the component model. * Finally the runtime implementation of managing WebAssembly modules is already implemented and quite easy to integrate with, so representing fused adapters with WebAssembly results in very little extra support necessary for the runtime implementation of instantiating and managing a component. The compiler added in this commit is dubbed Wasmtime's Fused Adapter Compiler of Trampolines (FACT) because who doesn't like deriving a name from an acronym. Currently the trampoline compiler is limited in its support for interface types and only supports a few primitives. I plan on filing future PRs to flesh out the support here for all the variants of `InterfaceType`. For now this PR is primarily focused on all of the other infrastructure for the addition of a trampoline compiler. With the choice to use core WebAssembly to implement fused adapters it means that adapters need to be inserted into a module. Unfortunately adapters cannot all go into a single WebAssembly module because adapters themselves have dependencies which may be provided transitively through instances that were instantiated with other adapters. This means that a significant chunk of this PR (`adapt.rs`) is dedicated to determining precisely which adapters go into precisely which adapter modules. This partitioning process attempts to make large modules wherever it can to cut down on core wasm instantiations but is likely not optimal as it's just a simple heuristic today. With all of this added together it's now possible to start writing `*.wast` tests that internally have adapted modules communicating with one another. A `fused.wast` test suite was added as part of this PR which is the beginning of tests for the support of the fused adapter compiler added in this PR. Currently this is primarily testing some various topologies of adapters along with direct/indirect modes. This will grow many more tests over time as more types are supported. Overall I'm not 100% satisfied with the testing story of this PR. When a test fails it's very difficult to debug since everything is written in the text format of WebAssembly meaning there's no "conveniences" to print out the state of the world when things go wrong and easily debug. I think this will become even more apparent as more tests are written for more types in subsequent PRs. At this time though I know of no better alternative other than leaning pretty heavily on fuzz-testing to ensure this is all exercised. * Fix an unused field warning * Fix tests in `wasmtime-runtime` * Add some more tests for compiled trampolines * Remap exports when injecting adapters The exports of a component were accidentally left unmapped which meant that they indexed the instance indexes pre-adapter module insertion. * Fix typo * Rebase conflicts |
||
|
|
02c3b47db2 |
x64: Implement SIMD fma (#4474)
* x64: Add VEX Instruction Encoder This uses a similar builder pattern to the EVEX Encoder. Does not yet support memory accesses. * x64: Add FMA Flag * x64: Implement SIMD `fma` * x64: Use 4 register Vex Inst * x64: Reorder VEX pretty print args |
||
|
|
4aaf7ff8d9 |
Bump wat/wast crates (#4524)
* Bump wat/wast crates Pull in upstream updates, nothing major, just keeping up-to-date. * Record audit log for new crates |
||
|
|
7c67e620c4 |
support dynamic function calls in component model (#4442)
* support dynamic function calls in component model This addresses #4310, introducing a new `component::values::Val` type for representing component values dynamically, as well as `component::types::Type` for representing the corresponding interface types. It also adds a `call` method to `component::func::Func`, which takes a slice of `Val`s as parameters and returns a `Result<Val>` representing the result. Note that I've moved `post_return` and `call_raw` from `TypedFunc` to `Func` since there was nothing specific to `TypedFunc` about them, and I wanted to reuse them. The code in both is unchanged beyond the trivial tweaks to make them fit in their new home. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * order variants and match cases more consistently Signed-off-by: Joel Dice <joel.dice@fermyon.com> * implement lift for String, Box<str>, etc. This also removes the redundant `store` parameter from `Type::load`. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * implement code review feedback This fixes a few issues: - Bad offset calculation when lowering - Missing variant padding - Style issues regarding `types::Handle` - Missed opportunities to reuse `Lift` and `Lower` impls It also adds forwarding `Lift` impls for `Box<[T]>`, `Vec<T>`, etc. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * move `new_*` methods to specific `types` structs Per review feedback, I've moved `Type::new_record` to `Record::new_val` and added a `Type::unwrap_record` method; likewise for the other kinds of types. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * make tuple, option, and expected type comparisons recursive These types should compare as equal across component boundaries as long as their type parameters are equal. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * improve error diagnostic in `Type::check` We now distinguish between more failure cases to provide an informative error message. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * address review feedback - Remove `WasmStr::to_str_from_memory` and `WasmList::get_from_memory` - add `try_new` methods to various `values` types - avoid using `ExactSizeIterator::len` where we can't trust it - fix over-constrained bounds on forwarded `ComponentType` impls Signed-off-by: Joel Dice <joel.dice@fermyon.com> * rearrange code per review feedback - Move functions from `types` to `values` module so we can make certain struct fields private - Rename `try_new` to just `new` Signed-off-by: Joel Dice <joel.dice@fermyon.com> * remove special-case equality test for tuples, options, and expecteds Instead, I've added a FIXME comment and will open an issue to do recursive structural equality testing. Signed-off-by: Joel Dice <joel.dice@fermyon.com> |