d3a6181939015eabd30d0c154b1cf2922e9dc4ba
535 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
d3a6181939 |
Add support for keeping pooling allocator pages resident (#5207)
When new wasm instances are created repeatedly in high-concurrency environments one of the largest bottlenecks is the contention on kernel-level locks having to do with the virtual memory. It's expected that usage in this environment is leveraging the pooling instance allocator with the `memory-init-cow` feature enabled which means that the kernel level VM lock is acquired in operations such as: 1. Growing a heap with `mprotect` (write lock) 2. Faulting in memory during usage (read lock) 3. Resetting a heap's contents with `madvise` (read lock) 4. Shrinking a heap with `mprotect` when reusing a slot (write lock) Rapid usage of these operations can lead to detrimental performance especially on otherwise heavily loaded systems, worsening the more frequent the above operations are. This commit is aimed at addressing the (2) case above, reducing the number of page faults that are fulfilled by the kernel. Currently these page faults happen for three reasons: * When memory is first accessed after the heap is grown. * When the initial linear memory image is accessed for the first time. * When the initial zero'd heap contents, not part of the linear memory image, are accessed. This PR is attempting to address the latter of these cases, and to a lesser extent the first case as well. Specifically this PR provides the ability to partially reset a pooled linear memory with `memset` rather than `madvise`. This is done to have the same effect of resetting contents to zero but namely has a different effect on paging, notably keeping the pages resident in memory rather than returning them to the kernel. This means that reuse of a linear memory slot on a page that was previously `memset` will not trigger a page fault since everything remains paged into the process. The end result is that any access to linear memory which has been touched by `memset` will no longer page fault on reuse. On more recent kernels (6.0+) this also means pages which were zero'd by `memset`, made inaccessible with `PROT_NONE`, and then made accessible again with `PROT_READ | PROT_WRITE` will not page fault. This can be common when a wasm instances grows its heap slightly, uses that memory, but then it's shrunk when the memory is reused for the next instance. Note that this kernel optimization requires a 6.0+ kernel. This same optimization is furthermore applied to both async stacks with the pooling memory allocator in addition to table elements. The defaults of Wasmtime are not changing with this PR, instead knobs are being exposed for embedders to turn if they so desire. This is currently being experimented with at Fastly and I may come back and alter the defaults of Wasmtime if it seems suitable after our measurements. |
||
|
|
b14551d7ca |
Refactor configuration for the pooling allocator (#5205)
This commit changes the APIs in the `wasmtime` crate for configuring the pooling allocator. I plan on adding a few more configuration options in the near future and the current structure was feeling unwieldy for adding these new abstractions. The previous `struct`-based API has been replaced with a builder-style API in a similar shape as to `Config`. This is done to help make it easier to add more configuration options in the future through adding more methods as opposed to adding more field which could break prior initializations. |
||
|
|
e0c8a7f477 |
Don't fail documentation without the default feature (#5180)
This commit fixes `cargo doc -p wasmtime --no-default-features` where previously it would fail with many broken doc links because the crate is missing many items that links refer to. Instead they're emitted as warnings now which while noisy should prevent the build from being entirely usable at least. |
||
|
|
2afaac5181 |
Return anyhow::Error from host functions instead of Trap, redesign Trap (#5149)
* Return `anyhow::Error` from host functions instead of `Trap` This commit refactors how errors are modeled when returned from host functions and additionally refactors how custom errors work with `Trap`. At a high level functions in Wasmtime that previously worked with `Result<T, Trap>` now work with `Result<T>` instead where the error is `anyhow::Error`. This includes functions such as: * Host-defined functions in a `Linker<T>` * `TypedFunc::call` * Host-related callbacks like call hooks Errors are now modeled primarily as `anyhow::Error` throughout Wasmtime. This subsequently removes the need for `Trap` to have the ability to represent all host-defined errors as it previously did. Consequently the `From` implementations for any error into a `Trap` have been removed here and the only embedder-defined way to create a `Trap` is to use `Trap::new` with a custom string. After this commit the distinction between a `Trap` and a host error is the wasm backtrace that it contains. Previously all errors in host functions would flow through a `Trap` and get a wasm backtrace attached to them, but now this only happens if a `Trap` itself is created meaning that arbitrary host-defined errors flowing from a host import to the other side won't get backtraces attached. Some internals of Wasmtime itself were updated or preserved to use `Trap::new` to capture a backtrace where it seemed useful, such as when fuel runs out. The main motivation for this commit is that it now enables hosts to thread a concrete error type from a host function all the way through to where a wasm function was invoked. Previously this could not be done since the host error was wrapped in a `Trap` that didn't provide the ability to get at the internals. A consequence of this commit is that when a host error is returned that isn't a `Trap` we'll capture a backtrace and then won't have a `Trap` to attach it to. To avoid losing the contextual information this commit uses the `Error::context` method to attach the backtrace as contextual information to ensure that the backtrace is itself not lost. This is a breaking change for likely all users of Wasmtime, but it's hoped to be a relatively minor change to workaround. Most use cases can likely change `-> Result<T, Trap>` to `-> Result<T>` and otherwise explicit creation of a `Trap` is largely no longer necessary. * Fix some doc links * add some tests and make a backtrace type public (#55) * Trap: avoid a trailing newline in the Display impl which in turn ends up with three newlines between the end of the backtrace and the `Caused by` in the anyhow Debug impl * make BacktraceContext pub, and add tests showing downcasting behavior of anyhow::Error to traps or backtraces * Remove now-unnecesary `Trap` downcasts in `Linker::module` * Fix test output expectations * Remove `Trap::i32_exit` This commit removes special-handling in the `wasmtime::Trap` type for the i32 exit code required by WASI. This is now instead modeled as a specific `I32Exit` error type in the `wasmtime-wasi` crate which is returned by the `proc_exit` hostcall. Embedders which previously tested for i32 exits now downcast to the `I32Exit` value. * Remove the `Trap::new` constructor This commit removes the ability to create a trap with an arbitrary error message. The purpose of this commit is to continue the prior trend of leaning into the `anyhow::Error` type instead of trying to recreate it with `Trap`. A subsequent simplification to `Trap` after this commit is that `Trap` will simply be an `enum` of trap codes with no extra information. This commit is doubly-motivated by the desire to always use the new `BacktraceContext` type instead of sometimes using that and sometimes using `Trap`. Most of the changes here were around updating `Trap::new` calls to `bail!` calls instead. Tests which assert particular error messages additionally often needed to use the `:?` formatter instead of the `{}` formatter because the prior formats the whole `anyhow::Error` and the latter only formats the top-most error, which now contains the backtrace. * Merge `Trap` and `TrapCode` With prior refactorings there's no more need for `Trap` to be opaque or otherwise contain a backtrace. This commit parse down `Trap` to simply an `enum` which was the old `TrapCode`. All various tests and such were updated to handle this. The main consequence of this commit is that all errors have a `BacktraceContext` context attached to them. This unfortunately means that the backtrace is printed first before the error message or trap code, but given all the prior simplifications that seems worth it at this time. * Rename `BacktraceContext` to `WasmBacktrace` This feels like a better name given how this has turned out, and additionally this commit removes having both `WasmBacktrace` and `BacktraceContext`. * Soup up documentation for errors and traps * Fix build of the C API Co-authored-by: Pat Hickey <pat@moreproductive.org> |
||
|
|
cd53bed898 |
Implement AOT compilation for components (#5160)
* Pull `Module` out of `ModuleTextBuilder` This commit is the first in what will likely be a number towards preparing for serializing a compiled component to bytes, a precompiled artifact. To that end my rough plan is to merge all of the compiled artifacts for a component into one large object file instead of having lots of separate object files and lots of separate mmaps to manage. To that end I plan on eventually using `ModuleTextBuilder` to build one large text section for all core wasm modules and trampolines, meaning that `ModuleTextBuilder` is no longer specific to one module. I've extracted out functionality such as function name calculation as well as relocation resolving (now a closure passed in) in preparation for this. For now this just keeps tests passing, and the trajectory for this should become more clear over the following commits. * Remove component-specific object emission This commit removes the `ComponentCompiler::emit_obj` function in favor of `Compiler::emit_obj`, now renamed `append_code`. This involved significantly refactoring code emission to take a flat list of functions into `append_code` and the caller is responsible for weaving together various "families" of functions and un-weaving them afterwards. * Consolidate ELF parsing in `CodeMemory` This commit moves the ELF file parsing and section iteration from `CompiledModule` into `CodeMemory` so one location keeps track of section ranges and such. This is in preparation for sharing much of this code with components which needs all the same sections to get tracked but won't be using `CompiledModule`. A small side benefit from this is that the section parsing done in `CodeMemory` and `CompiledModule` is no longer duplicated. * Remove separately tracked traps in components Previously components would generate an "always trapping" function and the metadata around which pc was allowed to trap was handled manually for components. With recent refactorings the Wasmtime-standard trap section in object files is now being generated for components as well which means that can be reused instead of custom-tracking this metadata. This commit removes the manual tracking for the `always_trap` functions and plumbs the necessary bits around to make components look more like modules. * Remove a now-unnecessary `Arc` in `Module` Not expected to have any measurable impact on performance, but complexity-wise this should make it a bit easier to understand the internals since there's no longer any need to store this somewhere else than its owner's location. * Merge compilation artifacts of components This commit is a large refactoring of the component compilation process to produce a single artifact instead of multiple binary artifacts. The core wasm compilation process is refactored as well to share as much code as necessary with the component compilation process. This method of representing a compiled component necessitated a few medium-sized changes internally within Wasmtime: * A new data structure was created, `CodeObject`, which represents metadata about a single compiled artifact. This is then stored as an `Arc` within a component and a module. For `Module` this is always uniquely owned and represents a shuffling around of data from one owner to another. For a `Component`, however, this is shared amongst all loaded modules and the top-level component. * The "module registry" which is used for symbolicating backtraces and for trap information has been updated to account for a single region of loaded code holding possibly multiple modules. This involved adding a second-level `BTreeMap` for now. This will likely slow down instantiation slightly but if it poses an issue in the future this should be able to be represented with a more clever data structure. This commit additionally solves a number of longstanding issues with components such as compiling only one host-to-wasm trampoline per signature instead of possibly once-per-module. Additionally the `SignatureCollection` registration now happens once-per-component instead of once-per-module-within-a-component. * Fix compile errors from prior commits * Support AOT-compiling components This commit adds support for AOT-compiled components in the same manner as `Module`, specifically adding: * `Engine::precompile_component` * `Component::serialize` * `Component::deserialize` * `Component::deserialize_file` Internally the support for components looks quite similar to `Module`. All the prior commits to this made adding the support here (unsurprisingly) easy. Components are represented as a single object file as are modules, and the functions for each module are all piled into the same object file next to each other (as are areas such as data sections). Support was also added here to quickly differentiate compiled components vs compiled modules via the `e_flags` field in the ELF header. * Prevent serializing exported modules on components The current representation of a module within a component means that the implementation of `Module::serialize` will not work if the module is exported from a component. The reason for this is that `serialize` doesn't actually do anything and simply returns the underlying mmap as a list of bytes. The mmap, however, has `.wasmtime.info` describing component metadata as opposed to this module's metadata. While rewriting this section could be implemented it's not so easy to do so and is otherwise seen as not super important of a feature right now anyway. * Fix windows build * Fix an unused function warning * Update crates/environ/src/compilation.rs Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> |
||
|
|
434fbf2b27 |
Refactor metadata storage in AOT artifacts (#5153)
* Refactor metadata storage in AOT artifacts This commit is a reorganization of how metadata is stored in Wasmtime's compiled artifacts. Currently Wasmtime's ELF artifacts have data appended after them to contain metadata about the `Engine` as well as type information for the module itself. This extra data at the end of the file is ignored by ELF-related utilities generally and is assembled during the module serialization process. In working on AOT-compiling components, though, I've discovered a number of issues with this: * Primarily it's possible to mistakenly change an artifact if it's deserialized and then serialized again. This issue is probably theoretical but the deserialized artifact records the `Engine` configuration at time of creation but when re-serializing that it serializes the current `Engine` state, not the original `Engine` state. * Additionally the serialization strategy here is tightly coupled to `Module` and its serialization format. While this makes sense it is not conducive for future refactorings to use a similar serialization format for components. The engine metadata, for example, does not necessarily need to be tied up with type information. * The storage for this extra metadata is a bit wonky by shoving it at the end of the ELF file. The original reason for this was to have a compiled artifact be multiple objects concatenated with each other to support serializing module-linking-using modules. Module linking is no longer a thing and I have since decided that for the component model all compilation artifacts will go into one object file to assist debugability. This means that the extra stick-it-at-the-end is no longer necessary. To solve these issues this commit splits up the `module/serialization.rs` file in two, mostly moving the logic to `engine/serialization.rs`. The engine serialization logic now handles everything related to `Engine` compatibility such as targets, compiler flags, wasm features, etc. The module serialization logic is now exclusively interested in type information. The engine metadata and serialized type information additionally live in sections of the final file now instead of at the end. This means that there are three primary `bincode`-encoded sections that are parsed on deserializing a file: 1. The `Engine`-specific metadata. This will be the same for both modules and components. 2. The `CompiledModuleInfo` structure. For core wasm there's just one of these but for the component model there will be multiple, one per core wasm module. 3. The type information. For core wasm this is a `ModuleTypes` but for a component this will be a `ComponentTypes`. No true functional change is expected from this commit. Binary artifacts might get inflated by a small handful of bytes due to using ELF sections to represent this now. A related change I made during this commit as well was the plumbing of the `is_branch_protection_enabled` flag. This is technically `Engine`-level metadata but I didn't want to plumb it all over the place as was done now, so instead a new section was added to the final binary just for this bti information. This means that it no longer needs to be a parameter to `CodeMemory::publish` and additionally is more amenable to a `Component`-is-just-one-object world where no single module owns this piece of metadata. * Exclude some functions in a cranelift-less build |
||
|
|
12e4a1ba18 |
component model: async host function & embedding support (#5055)
* func_wrap_async typechecks
* func call async
* instantiate_async
* fixes
* async engine creation for tests
* start adding a component model test for async
* fix wrong check for async support, factor out Instance::new_started to an unchecked impl
* tests: wibbles
* component::Linker::func_wrap: replace IntoComponentFunc with directly accepting a closure
We find that this makes the Linker::func_wrap type signature much easier
to read. The IntoComponentFunc abstraction was adding a lot of weight to
"splat" a set of arguments from a tuple of types into individual
arguments to the closure. Additionally, making the StoreContextMut
argument optional, or the Result<return> optional, wasn't very
worthwhile.
* Fixes for the new style of closure required by component::Linker::func_wrap
* future of result of return
* add Linker::instantiate_async and {Typed}Func::post_return_async
* fix fuzzing generator
* note optimisation opportunity
* simplify test
|
||
|
|
25bc12ec82 |
Add egraphs option to Wasmtime config, and add it to fuzzing config generation. (#5067)
* Add egraphs option to Wasmtime config, and add it to fuzzing config generation. This PR adds a wrapper method for Cranelift's `use_egraphs` setting to Wasmtime's `Config`, named `cranelift_use_egraphs` analogously to its existing `cranelift_opt_level`. Eventually this should become a no-op as egraph-based optimization becomes the default, but until then it makes sense to expose this as another kind of optimization option. This PR then adds the option to the `Arbitrary`-based config generation for fuzzing, so compilation with egraphs will be fuzzed (on its own and against other configurations and oracles). * Don't use `NamedTempFile` on Windows It looks like this prevents mmap-ing since the named temporary file holds a `File` open which conflicts with the rights we're trying to open the file for mmap-ing. Instead use a temporary directory to try to fix this issue. Co-authored-by: Alex Crichton <alex@alexcrichton.com> |
||
|
|
78ecc17d0f |
unsplat component::Linker::func_wrap args (#5065)
* component::Linker::func_wrap: replace IntoComponentFunc with directly accepting a closure We find that this makes the Linker::func_wrap type signature much easier to read. The IntoComponentFunc abstraction was adding a lot of weight to "splat" a set of arguments from a tuple of types into individual arguments to the closure. Additionally, making the StoreContextMut argument optional, or the Result<return> optional, wasn't very worthwhile. * Fixes for the new style of closure required by component::Linker::func_wrap * fix fuzzing generator |
||
|
|
ff0c45b4a0 |
Minor changes for components related to wit-bindgen support (#5053)
* Plumb type exports in components around more This commit adds some more plumbing for type exports to ensure that they show up in the final compiled representation of a component. For now they continued to be ignored for all purposes in the embedding API itself but I found this useful to explore in `wit-bindgen` based tooling which is leveraging the component parsing in Wasmtime. * Add a field to `ModuleTranslation` to store the original wasm This commit adds a field to be able to refer back to the original wasm binary for a `ModuleTranslation`. This field is used in the upcoming support for host generation in `wit-component` to "decompile" a component into core wasm modules to get instantiated. This is used to extract a core wasm module from the original component. * FIx a build warning |
||
|
|
a2f846f124 |
Don't re-capture backtraces when propagating traps through host frames (#5049)
* Add a benchmark for traps with many Wasm<-->host calls on the stack * Add a test for expected Wasm stack traces with Wasm<--host calls on the stack when we trap * Don't re-capture backtraces when propagating traps through host frames This fixes some accidentally quadratic code where we would re-capture a Wasm stack trace (takes `O(n)` time) every time we propagated a trap through a host frame back to Wasm (can happen `O(n)` times). And `O(n) * O(n) = O(n^2)`, of course. Whoops. After this commit, it trapping with a call stack that is `n` frames deep of Wasm-to-host-to-Wasm calls just captures a single backtrace and is therefore just a proper `O(n)` time operation, as it is intended to be. Now we explicitly track whether we need to capture a Wasm backtrace or not when raising a trap. This unfortunately isn't as straightforward as one might hope, however, because of the split between `wasmtime::Trap` and `wasmtime_runtime::Trap`. We need to decide whether or not to capture a Wasm backtrace inside `wasmtime_runtime` but in order to determine whether to do that or not we need to reflect on the `anyhow::Error` and see if it is a `wasmtime::Trap` that already has a backtrace or not. This can't be done the straightforward way because it would introduce a cyclic dependency between the `wasmtime` and `wasmtime-runtime` crates. We can't merge those two `Trap` types-- at least not without effectively merging the whole `wasmtime` and `wasmtime-runtime` crates together, which would be a good idea in a perfect world but would be a *ton* of ocean boiling from where we currently are -- because `wasmtime::Trap` does symbolication of stack traces which relies on module registration information data that resides inside the `wasmtime` crate and therefore can't be moved into `wasmtime-runtime`. We resolve this problem by adding a boolean to `wasmtime_runtime::raise_user_trap` that controls whether we should capture a Wasm backtrace or not, and then determine whether we need a backtrace or not at each of that function's call sites, which are in `wasmtime` and therefore can do the reflection to determine whether the user trap already has a backtrace or not. Phew! Fixes #5037 * debug assert that we don't record unnecessary backtraces for traps * Add assertions around `needs_backtrace` Unfortunately we can't do debug_assert_eq!(needs_backtrace, trap.inner.backtrace.get().is_some()); because `needs_backtrace` doesn't consider whether Wasm backtraces have been disabled via config. * Consolidate `needs_backtrace` calculation followed by calling `raise_user_trap` into one place |
||
|
|
f96491f333 |
Ignore when components export type definitions (#5051)
* allow a ComponentTypeRef::Type to point to a component TypeDef * component matching: don't assert exported Interface type definitions are "defined" types may be exported by their name for consumption by some component runtimes, but in wasmtime this doesn't matter (we lift and lower to types, not define them) so we should ignore these. * component-model instance tests: show that an import can export a type definition this is meaningless, but it should be accepted. (previously rejected) |
||
|
|
2be12a5167 |
egraph-based midend: draw the rest of the owl (productionized). (#4953)
* egraph-based midend: draw the rest of the owl. * Rename `egg` submodule of cranelift-codegen to `egraph`. * Apply some feedback from @jsharp during code walkthrough. * Remove recursion from find_best_node by doing a single pass. Rather than recursively computing the lowest-cost node for a given eclass and memoizing the answer at each eclass node, we can do a single forward pass; because every eclass node refers only to earlier nodes, this is sufficient. The behavior may slightly differ from the earlier behavior because we cannot short-circuit costs to zero once a node is elaborated; but in practice this should not matter. * Make elaboration non-recursive. Use an explicit stack instead (with `ElabStackEntry` entries, alongside a result stack). * Make elaboration traversal of the domtree non-recursive/stack-safe. * Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph. * Apply static recursion limit to rule application. * Fix aarch64 wrt dynamic-vector support -- broken rebase. * Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to! * Fix multi-result call testcase. * Include `cranelift-egraph` in `PUBLISHED_CRATES`. * Fix atomic_rmw: not really a load. * Remove now-unnecessary PartialOrd/Ord derivations. * Address some code-review comments. * Review feedback. * Review feedback. * No overlap in mid-end rules, because we are defining a multi-constructor. * rustfmt * Review feedback. * Review feedback. * Review feedback. * Review feedback. * Remove redundant `mut`. * Add comment noting what rules can do. * Review feedback. * Clarify comment wording. * Update `has_memory_fence_semantics`. * Apply @jameysharp's improved loop-level computation. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion commit. * Fix off-by-one in new loop-nest analysis. * Review feedback. * Review feedback. * Review feedback. * Use `Default`, not `std::default::Default`, as per @fitzgen Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Apply @fitzgen's comment elaboration to a doc-comment. Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Add stat for hitting the rewrite-depth limit. * Some code motion in split prelude to make the diff a little clearer wrt `main`. * Take @jameysharp's suggested `try_into()` usage for blockparam indices. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Take @jameysharp's suggestion to avoid double-match on load op. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion (add import). * Review feedback. * Fix stack_load handling. * Remove redundant can_store case. * Take @jameysharp's suggested improvement to FuncEGraph::build() logic Co-authored-by: Jamey Sharp <jamey@minilop.net> * Tweaks to FuncEGraph::build() on top of suggestion. * Take @jameysharp's suggested clarified condition Co-authored-by: Jamey Sharp <jamey@minilop.net> * Clean up after suggestion (unused variable). * Fix loop analysis. * loop level asserts * Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now. * Take @jameysharp's suggestion re: result_tys Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix up after suggestion * Take @jameysharp's suggestion to use fold rather than reduce Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fixup after suggestion * Take @jameysharp's suggestion to remove elaborate_eclass_use's return value. * Clarifying comment in terminator insts. Co-authored-by: Jamey Sharp <jamey@minilop.net> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> |
||
|
|
b454110ac7 |
Fix broken WASI ABI link (#5024)
The file has been moved. |
||
|
|
6d1bce9c64 |
Adjust fuel consumption to be empty when fuel is 0 (#5013)
Co-authored-by: Jamey Sharp <jsharp@fastly.com> Co-authored-by: Jamey Sharp <jsharp@fastly.com> |
||
|
|
cdecc858b4 |
add riscv64 backend for cranelift. (#4271)
Add a RISC-V 64 (`riscv64`, RV64GC) backend. Co-authored-by: yuyang <756445638@qq.com> Co-authored-by: Chris Fallin <chris@cfallin.org> Co-authored-by: Afonso Bordado <afonsobordado@az8.co> |
||
|
|
29c7de7340 |
Update wasm-tools dependencies (#4970)
* Update wasm-tools dependencies This update brings in a number of features such as: * The component model binary format and AST has been slightly adjusted in a few locations. Names are dropped from parameters/results now in the internal representation since they were not used anyway. At this time the ability to bind a multi-return function has not been exposed. * The `wasmparser` validator pass will now share allocations with prior functions, providing what's probably a very minor speedup for Wasmtime itself. * The text format for many component-related tests now requires named parameters. * Some new relaxed-simd instructions are updated to be ignored. I hope to have a follow-up to expose the multi-return ability to the embedding API of components. * Update audit information for new crates |
||
|
|
7b311004b5 |
Leverage Cargo's workspace inheritance feature (#4905)
* Leverage Cargo's workspace inheritance feature This commit is an attempt to reduce the complexity of the Cargo manifests in this repository with Cargo's workspace-inheritance feature becoming stable in Rust 1.64.0. This feature allows specifying fields in the root workspace `Cargo.toml` which are then reused throughout the workspace. For example this PR shares definitions such as: * All of the Wasmtime-family of crates now use `version.workspace = true` to have a single location which defines the version number. * All crates use `edition.workspace = true` to have one default edition for the entire workspace. * Common dependencies are listed in `[workspace.dependencies]` to avoid typing the same version number in a lot of different places (e.g. the `wasmparser = "0.89.0"` is now in just one spot. Currently the workspace-inheritance feature doesn't allow having two different versions to inherit, so all of the Cranelift-family of crates still manually specify their version. The inter-crate dependencies, however, are shared amongst the root workspace. This feature can be seen as a method of "preprocessing" of sorts for Cargo manifests. This will help us develop Wasmtime but shouldn't have any actual impact on the published artifacts -- everything's dependency lists are still the same. * Fix wasi-crypto tests |
||
|
|
d8b290898c |
Initial forward-edge CFI implementation (#3693)
* Initial forward-edge CFI implementation Give the user the option to start all basic blocks that are targets of indirect branches with the BTI instruction introduced by the Branch Target Identification extension to the Arm instruction set architecture. Copyright (c) 2022, Arm Limited. * Refactor `from_artifacts` to avoid second `make_executable` (#1) This involves "parsing" twice but this is parsing just the header of an ELF file so it's not a very intensive operation and should be ok to do twice. * Address the code review feedback Copyright (c) 2022, Arm Limited. Co-authored-by: Alex Crichton <alex@alexcrichton.com> |
||
|
|
65930640f8 |
Bump Wasmtime to 2.0.0 (#4874)
This commit replaces #4869 and represents the actual version bump that should have happened had I remembered to bump the in-tree version of Wasmtime to 1.0.0 prior to the branch-cut date. Alas! |
||
|
|
08e7a7f1a0 |
cranelift: Add inline stack probing for x64 (#4747)
* cranelift: Add inline stack probe for x64 * cranelift: Cleanups comments Thanks @jameysharp! |
||
|
|
bca4dae8b0 |
feat: add a knob for reset stack (#4813)
* feat: add a knob for reset stack * Touch up documentation of `async_stack_zeroing` Co-authored-by: Alex Crichton <alex@alexcrichton.com> |
||
|
|
62c5af68b5 |
components: Limit the recursive size of types in Wasmtime (#4825)
* components: Limit the recursive size of types in Wasmtime This commit is aimed at fixing #4814 by placing a hard limit on the maximal recursive depth a type may have in the component model. The component model theoretically allows for infinite recursion but many various types of operations within the component model are naturally written as recursion over the structure of a type which can lead to stack overflow with deeply recursive types. Some examples of recursive operations are: * Lifting and lowering a type - currently the recursion here is modeled in Rust directly with `#[derive]` implementations as well as the implementations for the `Val` type. * Compilation of adapter trampolines which iterates over the type structure recursively. * Historically many various calculations like the size of a type, the flattened representation of a type, etc, were all done recursively. Many of these are more efficiently done via other means but it was still natural to implement these recursively initially. By placing a hard limit on type recursion Wasmtime won't be able to load some otherwise-valid modules. The hope, though, is that no human-written program is likely to ever reach this limit. This limit can be revised and/or the locations with recursion revised if it's ever reached. The implementation of this feature is done by generalizing the current flattened-representation calculation which now keeps track of a type's depth and size. The size calculation isn't used just yet but I plan to use it in fixing #4816 and it was natural enough to write here as well. The depth is checked after a type is translated and if it exceeds the maximum then an error is returned. Additionally the `Arbitrary for Type` implementation was updated to prevent generation of a type that's too-recursive. Closes #4814 * Remove unused size calculation * Bump up just under the limit |
||
|
|
ff0e84ecf4 |
Wasmtime: fix stack walking across frames from different stores (#4779)
We were previously implicitly assuming that all Wasm frames in a stack used the
same `VMRuntimeLimits` as the previous frame we walked, but this is not true
when Wasm in store A calls into the host which then calls into Wasm in store B:
| ... |
| Host | |
+-----------------+ | stack
| Wasm in store A | | grows
+-----------------+ | down
| Host | |
+-----------------+ |
| Wasm in store B | V
+-----------------+
Trying to walk this stack would previously result in a runtime panic.
The solution is to push the maintenance of our list of saved Wasm FP/SP/PC
registers that allow us to identify contiguous regions of Wasm frames on the
stack deeper into `CallThreadState`. The saved registers list is now maintained
whenever updating the `CallThreadState` linked list by making the
`CallThreadState::prev` field private and only accessible via a getter and
setter, where the setter always maintains our invariants.
|
||
|
|
57dca934ad |
Upgrade wasm-tools crates, namely the component model (#4715)
* Upgrade wasm-tools crates, namely the component model This commit pulls in the latest versions of all of the `wasm-tools` family of crates. There were two major changes that happened in `wasm-tools` in the meantime: * bytecodealliance/wasm-tools#697 - this commit introduced a new API for more efficiently reading binary operators from a wasm binary. The old `Operator`-based reading was left in place, however, and continues to be what Wasmtime uses. I hope to update Wasmtime in a future PR to use this new API, but for now the biggest change is... * bytecodealliance/wasm-tools#703 - this commit was a major update to the component model AST. This commit almost entirely deals with the fallout of this change. The changes made to the component model were: 1. The `unit` type no longer exists. This was generally a simple change where the `Unit` case in a few different locations were all removed. 2. The `expected` type was renamed to `result`. This similarly was relatively lightweight and mostly just a renaming on the surface. I took this opportunity to rename `val::Result` to `val::ResultVal` and `types::Result` to `types::ResultType` to avoid clashing with the standard library types. The `Option`-based types were handled with this as well. 3. The payload type of `variant` and `result` types are now optional. This affected many locations that calculate flat type representations, ABI information, etc. The `#[derive(ComponentType)]` macro now specifically handles Rust-defined `enum` types which have no payload to the equivalent in the component model. 4. Functions can now return multiple parameters. This changed the signature of invoking component functions because the return value is now bound by `ComponentNamedList` (renamed from `ComponentParams`). This had a large effect in the tests, fuzz test case generation, etc. 5. Function types with 2-or-more parameters/results must uniquely name all parameters/results. This mostly affected the text format used throughout the tests. I haven't added specifically new tests for multi-return but I changed a number of tests to use it. Additionally I've updated the fuzzers to all exercise multi-return as well so I think we should get some good coverage with that. * Update version numbers * Use crates.io |
||
|
|
5add267b87 |
Fix a soundness issue with lowering variants (#4723)
* Fix a compile error on nightly Rust It looks like Rust nightly has gotten a bit more strict about attributes-on-expressions and previously accepted code is no longer accepted. This commit updates the generated code for a macro to a form which is accepted by rustc. * Fix a soundness issue with lowering variants This commit fixes a soundness issue lowering variants in the component model where host memory could be leaked to the guest module by accident. In reviewing code recently for `Val::lower` I noticed that the variant lowering was extending the payload with `ValRaw::u32(0)` to appropriately fit the size of the variant. In reading this it appeared incorrect to me due to the fact that it should be `ValRaw::u64(0)` since up to 64-bits can be read. Additionally this implementation was also incorrect because the lowered representation of the payload itself was not possibly zero-extended to 64-bits to accommodate other variants. It turned out these issues were benign because with the dynamic surface area to the component model the arguments were all initialized to 0 anyway. The static version of the API, however, does not initialize arguments to 0 and I wanted to initially align these two implementations so I updated the variant implementation of lowering for dynamic values and removed the zero-ing of arguments. To test this change I updated the `debug` mode of adapter module generation to assert that the upper bits of values in wasm are always zero when the value is casted down (during `stack_get` which only happens with variants). I then threaded through the `debug` boolean configuration parameter into the dynamic and static fuzzers. To my surprise this new assertion tripped even after the fix was applied. It turns out, though, that there was other leakage of bits through other means that I was previously unaware of. At the primitive level lowerings of types like `u32` will have a `Lower` representation of `ValRaw` and the lowering is simply `dst.write(ValRaw::i32(self))`, or the equivalent thereof. The problem, that the fuzzers detected, with this pattern is that the `ValRaw` type is 16-bytes, and `ValRaw::i32(X)` only initializes the first 4. This meant that all the lowerings for all primitives were writing up to 12 bytes of garbage from the host for the wasm module to read. It turned out that this write of a `ValRaw` was sometimes 16 bytes and sometimes the appropriate size depending on the number of optimizations in play. With enough inlining for example `dst.write(ValRaw::i32(self))` would only write 4 bytes, as expected. In debug mode though without inlining 16 bytes would be written, including the garbage from the upper bits. To solve this issue I ended up taking a somewhat different approach. I primarily updated the `ValRaw` constructors to simply always extend the values internally to 64-bits, meaning that the low 8 bytes of a `ValRaw` is always initialized. This prevents any undefined data from leaking from the host into a wasm module, and means that values are also zero-extended even if they're only used in 32-bit contexts outside of a variant. This felt like the best fix for now, though, in terms of not really having a performance impact while additionally not requiring a rewrite of all lowerings. This solution ended up also neatly removing the "zero out the entire payload" logic that was previously require. Now after a payload is lowered only the tail end of the payload, up to the size of the variant, is zeroed out. This means that each lowered argument is written to at most once which should hopefully be a small performance boost for calling into functions as well. |
||
|
|
83e37f9334 |
Fix a compile error on nightly Rust (#4724)
It looks like Rust nightly has gotten a bit more strict about attributes-on-expressions and previously accepted code is no longer accepted. This commit updates the generated code for a macro to a form which is accepted by rustc. |
||
|
|
bc8e36a6af |
Refactor and optimize the flat type calculations (#4708)
* Optimize flat type representation calculations Previously calculating the flat type representation would be done recursively for an entire type tree every time it was visited. Additionally the flat type representation was entirely built only to be thrown away if it was too large at the end. This chiefly presented a source of recursion based on the type structure in the component model which fuzzing does not like as it reports stack overflows. This commit overhauls the representation of flat types in Wasmtime by caching the representation for each type in the compile-time `ComponentTypesBuilder` structure. This avoids recalculating each time the flat representation is queried and additionally allows opportunity to have more short-circuiting to avoid building overly-large vectors. * Remove duplicate flat count calculation in wasmtime Roughly share the infrastructure in the `wasmtime-environ` crate, namely the non-recursive and memoizing nature of the calculation. * Fix component fuzz build * Fix example compile |
||
|
|
1e12645ab1 |
Fix a bad bounds check in component trampolines (#4716)
A `GtU` condition needed to actually be `GeU`, as the comment right above it stated but apparently I forgot to translate the comment to actual code. This fixes a fuzz bug that arose from oss-fuzz over the weekend. |
||
|
|
cc955e4e7e |
Rename MmapVec::drain to split_off (#4673)
* Rename `MmapVec::drain` to `split_off` As suggested on #4609 * Fix tests * Make MmapVec::split_off work like Vec::split_off Co-authored-by: Jamey Sharp <jsharp@fastly.com> |
||
|
|
8a9b1a9025 |
Implement an incremental compilation cache for Cranelift (#4551)
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. After the suggestion of Chris, `Function` has been split into mostly two parts: - on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`. - on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on. Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache: - most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set. - user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`. - some refactorings have been made for function names: - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name. - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with. The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions. A basic fuzz target has been introduced that tries to do the bare minimum: - check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function. - check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache. - This last check is less efficient and less likely to happen, so probably should be rethought a bit. Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip. Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. Fixes #4155. |
||
|
|
fd28d94352 |
Shield compiled modules from their appended metadata (#4609)
This commit fixes #4600 in a somewhat roundabout fashion. Currently the `main` branch of Wasmtime exhibits unusual behavior: * If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level` test does not fail. * If `cargo test -p wasmtime --lib` is run, however, then the test fails. This test is indeed being run as part of `./ci/run-tests.sh` and it's also passing in CI. The exact failure is that part of the debuginfo support we have takes an existing ELF image, copies it, and then appends some information to inform profilers/gdb about the image. This code is all quite old at this point and not 100% optimal, but that's at least where we're at. The problem is that the appended `ProgramHeader64` is not aligned correctly during `cargo test -p wasmtime --lib`, which is the panic that happens causing the test to fail. The reason, however, that this test passes with `./ci/run-tests.sh` is that the alignment of `ProgramHeader64` is 1 instead of 8. The reason for that is that the `object` crate has an `unaligned` feature which forcibly unaligns all primitives to 1 byte instead of their natural alignment. During `cargo test -p wasmtime --lib` this feature is not enabled but during `./ci/run-tests.sh` this feature is enabled. The feature is currently enabled through inclusion of the `backtrace` crate which only happens for some tests in some crates. The alignment issue explains why the test fails on a single crate test but fails on the whole workspace tests. The next issue I investigated was if this test ever passed. It turns out that on v0.39.0 this test passed, and the regression to main was introduced during #4571. That PR, however, has nothing to do with any of this! The reason that this showed up as causing a "regression" however is because it changed cranelift settings which changed the size of serialized metadata at the end of a Wasmtime cache object. Wasmtime compiled artifacts are ELF images with Wasmtime-specific metadata appended after them. This appended metadata was making its way all the way through to the gdbjit image itself which mean that while the end of the ELF file itself was properly aligned the space after the Wasmtime metadata was not aligned. This metadata changes in size over time as Cranelift settings change which explains why #4571 was the "source" of the regression. The fix in this commit is to discard the extra Wasmtime metadata when creating an `MmapVec` representing the underlying ELF image. This is already supported with `MmapVec::drain` so it was relatively easy to insert that. This means that the gdbjit image starts with just the ELF file itself which is always aligned at the end, which gets the test passing with/without the `unaligned` feature in the `object` crate. |
||
|
|
bd70dbebbd |
Deduplicate some size/align calculations (#4658)
This commit is an effort to reduce the amount of complexity around managing the size/alignment calculations of types in the canonical ABI. Previously the logic for the size/alignment of a type was spread out across a number of locations. While each individual calculation is not really the most complicated thing in the world having the duplication in so many places was constantly worrying me. I've opted in this commit to centralize all of this within the runtime at least, and now there's only one "duplicate" of this information in the fuzzing infrastructure which is to some degree less important to deduplicate. This commit introduces a new `CanonicalAbiInfo` type to house all abi size/align information for both memory32 and memory64. This new type is then used pervasively throughout fused adapter compilation, dynamic `Val` management, and typed functions. This type was also able to reduce the complexity of the macro-generated code meaning that even `wasmtime-component-macro` is performing less math than it was before. One other major feature of this commit is that this ABI information is now saved within a `ComponentTypes` structure. This avoids recursive querying of size/align information frequently and instead effectively caching it. This was a worry I had for the fused adapter compiler which frequently sought out size/align information and would recursively descend each type tree each time. The `fact-valid-module` fuzzer is now nearly 10x faster in terms of iterations/s which I suspect is due to this caching. |
||
|
|
867f5c1244 |
Update behavior of zero-length lists/strings (#4648)
The spec was expected to change to not bounds-check 0-byte lists/strings but has since been updated to match `memory.copy` which does indeed check the pointer for 0-byte copies. |
||
|
|
ec47335b9c |
wasmtime: Add a Config::native_unwind_info method (#4643)
This method configures whether native unwind information (e.g. `.eh_frame` on Linux) is generated or not. This helps integrate with third-party stack capturing tools, such as the system unwinder or the `backtrace` crate. It does not affect whether Wasmtime can capture stack traces in Wasm code that it is running or not. Unwind info is always enabled on Windows, since the Windows ABI requires it. This configuration option defaults to true. Additionally, we deprecate `Config::wasm_backtrace` since we can always cheaply capture stack traces ever since https://github.com/bytecodealliance/wasmtime/pull/4431. Fixes https://github.com/bytecodealliance/wasmtime/issues/4554 |
||
|
|
866ec46613 |
Implement roundtrip fuzzing of component adapters (#4640)
* Improve the `component_api` fuzzer on a few dimensions * Update the generated component to use an adapter module. This involves two core wasm instances communicating with each other to test that data flows through everything correctly. The intention here is to fuzz the fused adapter compiler. String encoding options have been plumbed here to exercise differences in string encodings. * Use `Cow<'static, ...>` and `static` declarations for each static test case to try to cut down on rustc codegen time. * Add `Copy` to derivation of fuzzed enums to make `derive(Clone)` smaller. * Use `Store<Box<dyn Any>>` to try to cut down on codegen by monomorphizing fewer `Store<T>` implementation. * Add debug logging to print out what's flowing in and what's flowing out for debugging failures. * Improve `Debug` representation of dynamic value types to more closely match their Rust counterparts. * Fix a variant issue with adapter trampolines Previously the offset of the payload was calculated as the discriminant aligned up to the alignment of a singular case, but instead this needs to be aligned up to the alignment of all cases to ensure all cases start at the same location. * Fix a copy/paste error when copying masked integers A 32-bit load was actually doing a 16-bit load by accident since it was copied from the 16-bit load-and-mask case. * Fix f32/i64 conversions in adapter modules The adapter previously erroneously converted the f32 to f64 and then to i64, where instead it should go from f32 to i32 to i64. * Fix zero-sized flags in adapter modules This commit corrects the size calculation for zero-sized flags in adapter modules. cc #4592 * Fix a variant size calculation bug in adapters This fixes the same issue found with variants during normal host-side fuzzing earlier where the size of a variant needs to align up the summation of the discriminant and the maximum case size. * Implement memory growth in libc bump realloc Some fuzz-generated test cases are copying lists large enough to exceed one page of memory so bake in a `memory.grow` to the bump allocator as well. * Avoid adapters of exponential size This commit is an attempt to avoid adapters being exponentially sized with respect to the type hierarchy of the input. Previously all adaptation was done inline within each adapter which meant that if something was structured as `tuple<T, T, T, T, ...>` the translation of `T` would be inlined N times. For very deeply nested types this can quickly create an exponentially sized adapter with types of the form: (type $t0 (list u8)) (type $t1 (tuple $t0 $t0)) (type $t2 (tuple $t1 $t1)) (type $t3 (tuple $t2 $t2)) ;; ... where the translation of `t4` has 8 different copies of translating `t0`. This commit changes the translation of types through memory to almost always go through a helper function. The hope here is that it doesn't lose too much performance because types already reside in memory. This can still lead to exponentially sized adapter modules to a lesser degree where if the translation all happens on the "stack", e.g. via `variant`s and their flat representation then many copies of one translation could still be made. For now this commit at least gets the problem under control for fuzzing where fuzzing doesn't trivially find type hierarchies that take over a minute to codegen the adapter module. One of the main tricky parts of this implementation is that when a function is generated the index that it will be placed at in the final module is not known at that time. To solve this the encoded form of the `Call` instruction is saved in a relocation-style format where the `Call` isn't encoded but instead saved into a different area for encoding later. When the entire adapter module is encoded to wasm these pseudo-`Call` instructions are encoded as real instructions at that time. * Fix some memory64 issues with string encodings Introduced just before #4623 I had a few mistakes related to 64-bit memories and mixing 32/64-bit memories. * Actually insert into the `translate_mem_funcs` map This... was the whole point of having the map! * Assert memory growth succeeds in bump allocator |
||
|
|
650979ae40 |
Implement strings in adapter modules (#4623)
* Implement strings in adapter modules This commit is a hefty addition to Wasmtime's support for the component model. This implements the final remaining type (in the current type hierarchy) unimplemented in adapter module trampolines: strings. Strings are the most complicated type to implement in adapter trampolines because they are highly structured chunks of data in memory (according to specific encodings). Additionally each lift/lower operation can choose its own encoding for strings meaning that Wasmtime, the host, may have to convert between any pairwise ordering of string encodings. The `CanonicalABI.md` in the component-model repo in general specifies all the fiddly bits of string encoding so there's not a ton of wiggle room for Wasmtime to get creative. This PR largely "just" implements that. The high-level architecture of this implementation is: * Fused adapters are first identified to determine src/dst string encodings. This statically fixes what transcoding operation is being performed. * The generated adapter will be responsible for managing calls to `realloc` and performing bounds checks. The adapter itself does not perform memory copies or validation of string contents, however. Instead each transcoding operation is modeled as an imported function into the adapter module. This means that the adapter module dynamically, during compile time, determines what string transcoders are needed. Note that an imported transcoder is not only parameterized over the transcoding operation but additionally which memory is the source and which is the destination. * The imported core wasm functions are modeled as a new `CoreDef::Transcoder` structure. These transcoders end up being small Cranelift-compiled trampolines. The Cranelift-compiled trampoline will load the actual base pointer of memory and add it to the relative pointers passed as function arguments. This trampoline then calls a transcoder "libcall" which enters Rust-defined functions for actual transcoding operations. * Each possible transcoding operation is implemented in Rust with a unique name and a unique signature depending on the needs of the transcoder. I've tried to document inline what each transcoder does. This means that the `Module::translate_string` in adapter modules is by far the largest translation method. The main reason for this is due to the management around calling the imported transcoder functions in the face of validating string pointer/lengths and performing the dance of `realloc`-vs-transcode at the right time. I've tried to ensure that each individual case in transcoding is documented well enough to understand what's going on as well. Additionally in this PR is a full implementation in the host for the `latin1+utf16` encoding which means that both lifting and lowering host strings now works with this encoding. Currently the implementation of each transcoder function is likely far from optimal. Where possible I've leaned on the standard library itself and for latin1-related things I'm leaning on the `encoding_rs` crate. I initially tried to implement everything with `encoding_rs` but was unable to uniformly do so easily. For now I settled on trying to get a known-correct (even in the face of endianness) implementation for all of these transcoders. If an when performance becomes an issue it should be possible to implement more optimized versions of each of these transcoding operations. Testing this commit has been somewhat difficult and my general plan, like with the `(list T)` type, is to rely heavily on fuzzing to cover the various cases here. In this PR though I've added a simple test that pushes some statically known strings through all the pairs of encodings between source and destination. I've attempted to pick "interesting" strings that one way or another stress the various paths in each transcoding operation to ideally get full branch coverage there. Additionally a suite of "negative" tests have also been added to ensure that validity of encoding is actually checked. * Fix a temporarily commented out case * Fix wasmtime-runtime tests * Update deny.toml configuration * Add `BSD-3-Clause` for the `encoding_rs` crate * Remove some unused licenses * Add an exemption for `encoding_rs` for now * Split up the `translate_string` method Move out all the closures and package up captured state into smaller lists of arguments. * Test out-of-bounds for zero-length strings |
||
|
|
412fa04911 |
Bump Wasmtime to 0.41.0 (#4620)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> |
||
|
|
b4d7ab36f9 |
Add a dataflow-based representation of components (#4597)
* Add a dataflow-based representation of components This commit updates the inlining phase of compiling a component to creating a dataflow-based representation of a component instead of creating a final `Component` with a linear list of initializers. This dataflow graph is then linearized in a final step to create the actual final `Component`. The motivation for this commit stems primarily from my work implementing strings in fused adapters. In doing this my plan is to defer most low-level transcoding to the host itself rather than implementing that in the core wasm adapter modules. This means that small cranelift-generated trampolines will be used for adapter modules to call which then call "transcoding libcalls". The cranelift-generated trampolines will get raw pointers into linear memory and pass those to the libcall which core wasm doesn't have access to when passing arguments to an import. Implementing this with the previous representation of a `Component` was becoming too tricky to bear. The initialization of a transcoder needed to happen at just the right time: before the adapter module which needed it was instantiated but after the linear memories referenced had been extracted into the `VMComponentContext`. The difficulty here is further compounded by the current adapter module injection pass already being quite complicated. Adapter modules are already renumbering the index space of runtime instances and shuffling items around in the `GlobalInitializer` list. Perhaps the worst part of this was that memories could already be referenced by host function imports or exports to the host, and if adapters referenced the same memory it shouldn't be referenced twice in the component. This meant that `ExtractMemory` initializers ideally needed to be shuffled around in the initializer list to happen as early as possible instead of wherever they happened to show up during translation. Overall I did my best to implement the transcoders but everything always came up short. I have decided to throw my hands up in the air and try a completely different approach to this, namely the dataflow-based representation in this commit. This makes it much easier to edit the component after initial translation for injection of adapters, injection of transcoders, adding dependencies on possibly-already-existing items, etc. The adapter module partitioning pass in this commit was greatly simplified to something which I believe is functionally equivalent but is probably an order of magnitude easier to understand. The biggest downside of this representation I believe is having a duplicate representation of a component. The `component::info` was largely duplicated into the `component::dfg` module in this commit. Personally though I think this is a more appropriate tradeoff than before because it's very easy to reason about "convert representation A to B" code whereas it was very difficult to reason about shuffling around `GlobalInitializer` items in optimal fashions. This may also have a cost at compile-time in terms of shuffling data around, but my hope is that we have lots of other low-hanging fruit to optimize if it ever comes to that which allows keeping this easier-to-understand representation. Finally, to reiterate, the final representation of components is not changed by this PR. To the runtime internals everything is still the same. * Fix compile of factc |
||
|
|
ed8908efcf |
implement fuzzing for component types (#4537)
This addresses #4307. For the static API we generate 100 arbitrary test cases at build time, each of which includes 0-5 parameter types, a result type, and a WAT fragment containing an imported function and an exported function. The exported function calls the imported function, which is implemented by the host. At runtime, the fuzz test selects a test case at random and feeds it zero or more sets of arbitrary parameters and results, checking that values which flow host-to-guest and guest-to-host make the transition unchanged. The fuzz test for the dynamic API follows a similar pattern, the only difference being that test cases are generated at runtime. Signed-off-by: Joel Dice <joel.dice@fermyon.com> |
||
|
|
a897742593 |
Initial back-edge CFI implementation (#3606)
Give the user the option to sign and to authenticate function return addresses with the operations introduced by the Pointer Authentication extension to the Arm instruction set architecture. Copyright (c) 2021, Arm Limited. |
||
|
|
9f82644cc3 |
Some minor cleanups/refactorings in components (#4582)
This is a collection of some minor renamings, refactorings, sharing of code, etc. This was all discovered during my addition of string support to adapter functions and I figured it'd be best to frontload this and land it ahead of the full patch since it's getting complex. |
||
|
|
43f1765272 |
Cranellift: remove Baldrdash support and related features. (#4571)
* Cranellift: remove Baldrdash support and related features.
As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.
In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.
This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425
* Review feedback.
* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.
The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.
* Review feedback.
|
||
|
|
fb59de15af |
Implement fused adapters for (list T) types (#4558)
* Implement fused adapters for `(list T)` types This commit implements one of the two remaining types for adapter fusion, lists. This implementation is particularly tricky for a number of reasons: * Lists have a number of validity checks which need to be carefully implemented. For example the byte length of the list passed to allocation in the destination module could overflow the 32-bit index space. Additionally lists in 32-bit memories need a check that their final address is in-bounds in the address space. * In the effort to go ahead and support memory64 at the lowest layers this is where much of the magic happens. Lists are naturally always stored in memory and shifting between 64/32-bit address spaces is done here. This notably required plumbing an `Options` around during flattening/size/alignment calculations due to the size/types of lists changing depending on the memory configuration. I've also added a small `factc` program in this commit which should hopefully assist in exploring and debugging adapter modules. This takes as input a component (text or binary format) and then generates an adapter module for all component function signatures found internally. This commit notably does not include tests for lists. I tried to figure out a good way to add these but I felt like there were too many cases to test and the tests would otherwise be extremely verbose. Instead I think the best testing strategy for this commit will be through #4537 which should be relatively extensible to testing adapters between modules in addition to host-based lifting/lowering. * Improve handling of lists of 0-size types * Skip overflow checks on byte sizes for 0-size types * Skip the copy loop entirely when src/dst are both 0 * Skip the increments of src/dst pointers if either is 0-size * Update semantics for zero-sized lists/strings When a list/string has a 0-byte-size the base pointer is no longer verified to be in-bounds to match the supposedly desired adapter semantics where no trap happens because no turn of the loop happens. |
||
|
|
893fadb485 |
components: Fix support for 0-sized flags (#4560)
This commit goes through and updates support in the various argument passing routines to support 0-sized flags. A bit of a degenerate case but clarified in WebAssembly/component-model#76 as intentional. |
||
|
|
04631ad0af |
Unconditionally enable component-model tests (#4556)
* Unconditionally enable component-model tests * Remove an outdated test that wasn't previously being compiled * Fix a component model doc test * Try to decrease memory usage in qemu |
||
|
|
46782b18c2 |
wasmtime: Implement fast Wasm stack walking (#4431)
* Always preserve frame pointers in Wasmtime
This allows us to efficiently and simply capture Wasm stacks without maintaining
and synchronizing any safety-critical side tables between the compiler and the
runtime.
* wasmtime: Implement fast Wasm stack walking
Why do we want Wasm stack walking to be fast? Because we capture stacks whenever
there is a trap and traps actually happen fairly frequently with short-lived
programs and WASI's `exit`.
Previously, we would rely on generating the system unwind info (e.g.
`.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk
the full stack and filter out any non-Wasm stack frames. This can,
unfortunately, be slow for two primary reasons:
1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than
`O(wasm-frames)` work.
2. System unwind info and the system unwinder need to be much more general than
a purpose-built stack walker for Wasm needs to be. It has to handle any kind of
stack frame that any compiler might emit where as our Wasm frames are emitted by
Cranelift and always have frame pointers. This translates into implementation
complexity and general overhead. There can also be unnecessary-for-our-use-cases
global synchronization and locks involved, further slowing down stack walking in
the presence of multiple threads trying to capture stacks in parallel.
This commit introduces a purpose-built stack walker for traversing just our Wasm
frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore
non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit
frame pointer)` pairs. This linked list is maintained via Wasm-to-host and
host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use
frame pointers (which Cranelift preserves) to find the next older Wasm frame on
the stack, and we keep doing this until we reach the entry stack pointer,
meaning that the next older frame will be a host frame.
The trampolines need to avoid a couple stumbling blocks. First, they need to be
compiled ahead of time, since we may not have access to a compiler at
runtime (e.g. if the `cranelift` feature is disabled) but still want to be able
to call functions that have already been compiled and get stack traces for those
functions. Usually this means we would compile the appropriate trampolines
inside `Module::new` and the compiled module object would hold the
trampolines. However, we *also* need to support calling host functions that are
wrapped into `wasmtime::Func`s and there doesn't exist *any* ahead-of-time
compiled module object to hold the appropriate trampolines:
```rust
// Define a host function.
let func_type = wasmtime::FuncType::new(
vec![wasmtime::ValType::I32],
vec![wasmtime::ValType::I32],
);
let func = Func::new(&mut store, func_type, |_, params, results| {
// ...
Ok(())
});
// Call that host function.
let mut results = vec![wasmtime::Val::I32(0)];
func.call(&[wasmtime::Val::I32(0)], &mut results)?;
```
Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline
in assembly that work for all Wasm and host function signatures. These
trampolines are careful to only use volatile registers, avoid touching any
register that is an argument in the calling convention ABI, and tail call to the
target callee function. This allows forwarding any set of arguments and any
returns to and from the callee, while also allowing us to maintain our linked
list of Wasm stack and frame pointers before transferring control to the
callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing
the host-Wasm boundary, so they do not impose overhead on regular calls. (And if
using one trampoline for all host-Wasm boundary crossing ever breaks branch
prediction enough in the CPU to become any kind of bottleneck, we can do fun
things like have multiple copies of the same trampoline and choose a random copy
for each function, sharding the functions across branch predictor entries.)
Finally, this commit also ends the use of a synthetic `Module` and allocating a
stubbed out `VMContext` for host functions. Instead, we define a
`VMHostFuncContext` with its own magic value, similar to `VMComponentContext`,
specifically for host functions.
<h2>Benchmarks</h2>
<h3>Traps and Stack Traces</h3>
Large improvements to taking stack traces on traps, ranging from shaving off 64%
to 99.95% of the time it used to take.
<details>
```
multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us]
thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s]
change:
time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05)
thrpt: [+560.90% +573.56% +585.84%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us]
thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s]
change:
time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05)
thrpt: [+1023.1% +1048.6% +1070.3%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us]
thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s]
change:
time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05)
thrpt: [+1503.1% +1542.0% +1578.0%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severe
multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us]
thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s]
change:
time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05)
thrpt: [+1339.2% +1353.6% +1369.1%]
Performance has improved.
multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us]
thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s]
change:
time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05)
thrpt: [+1624.7% +1709.2% +1806.1%]
Performance has improved.
multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us]
thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s]
change:
time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05)
thrpt: [+2821.5% +3048.1% +3281.1%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
many-modules-registered-traps/1
time: [2.6278 us 2.6361 us 2.6447 us]
thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s]
change:
time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05)
thrpt: [+562.65% +571.51% +580.76%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
many-modules-registered-traps/8
time: [2.6294 us 2.6460 us 2.6623 us]
thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s]
change:
time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05)
thrpt: [+567.63% +588.95% +608.95%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
many-modules-registered-traps/64
time: [2.6218 us 2.6329 us 2.6452 us]
thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s]
change:
time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05)
thrpt: [+1431.4% +1450.6% +1469.5%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
many-modules-registered-traps/512
time: [2.6569 us 2.6737 us 2.6923 us]
thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s]
change:
time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05)
thrpt: [+13417% +13566% +13731%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
many-modules-registered-traps/4096
time: [2.7258 us 2.7390 us 2.7535 us]
thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s]
change:
time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
thrpt: [+221417% +223380% +224881%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
many-stack-frames-traps/1
time: [1.4658 us 1.4719 us 1.4784 us]
thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s]
change:
time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05)
thrpt: [+860.23% +894.72% +938.21%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
many-stack-frames-traps/8
time: [2.4772 us 2.4870 us 2.4973 us]
thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s]
change:
time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05)
thrpt: [+575.65% +583.51% +592.03%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
many-stack-frames-traps/64
time: [10.109 us 10.171 us 10.236 us]
thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s]
change:
time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05)
thrpt: [+341.22% +350.38% +357.55%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
many-stack-frames-traps/512
time: [126.16 us 126.54 us 126.96 us]
thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s]
change:
time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05)
thrpt: [+181.32% +185.17% +188.71%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
```
</details>
<h3>Calls</h3>
There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call
performance due the new trampolines. It seems to be on the order of about 2-10
nanoseconds per call, depending on the benchmark.
I believe this regression is ultimately acceptable because
1. this overhead will be vastly dominated by whatever work a non-nop callee
actually does,
2. we will need these trampolines, or something like them, when implementing the
Wasm exceptions proposal to do things like translate Wasm's exceptions into
Rust's `Result`s,
3. and because the performance improvements to trapping and capturing stack
traces are of such a larger magnitude than this call regressions.
<details>
```
sync/no-hook/host-to-wasm - typed - nop
time: [28.683 ns 28.757 ns 28.844 ns]
change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop
time: [42.515 ns 42.652 ns 42.841 ns]
change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop
time: [33.936 ns 34.052 ns 34.179 ns]
change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
sync/no-hook/host-to-wasm - typed - nop-params-and-results
time: [34.290 ns 34.388 ns 34.502 ns]
change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [62.546 ns 62.721 ns 62.919 ns]
change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop-params-and-results
time: [42.609 ns 42.710 ns 42.831 ns]
change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop
time: [29.546 ns 29.675 ns 29.818 ns]
change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop
time: [45.448 ns 45.699 ns 45.961 ns]
change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop
time: [34.334 ns 34.437 ns 34.558 ns]
change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [36.594 ns 36.763 ns 36.974 ns]
change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [63.541 ns 63.831 ns 64.194 ns]
change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results
time: [43.968 ns 44.169 ns 44.437 ns]
change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
async/no-hook/host-to-wasm - typed - nop
time: [4.9612 us 4.9743 us 4.9889 us]
change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async/no-hook/host-to-wasm - untyped - nop
time: [5.0030 us 5.0211 us 5.0439 us]
change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
async/no-hook/host-to-wasm - typed - nop-params-and-results
time: [4.9273 us 4.9468 us 4.9700 us]
change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [5.1151 us 5.1338 us 5.1555 us]
change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/hook-sync/host-to-wasm - typed - nop
time: [4.9330 us 4.9394 us 4.9467 us]
change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop
time: [5.0073 us 5.0183 us 5.0310 us]
change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
async/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [4.9610 us 4.9839 us 5.0097 us]
change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [5.0995 us 5.1272 us 5.1617 us]
change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop
time: [2.4242 us 2.4316 us 2.4396 us]
change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop
time: [2.5102 us 2.5155 us 2.5210 us]
change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop-params-and-results
time: [2.4203 us 2.4310 us 2.4440 us]
change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [2.5501 us 2.5593 us 2.5700 us]
change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop
time: [2.4135 us 2.4190 us 2.4254 us]
change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop
time: [2.5172 us 2.5248 us 2.5357 us]
change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [2.4214 us 2.4353 us 2.4532 us]
change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [2.5499 us 2.5607 us 2.5748 us]
change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
sync/no-hook/wasm-to-host - nop - typed
time: [6.6135 ns 6.6288 ns 6.6452 ns]
change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.930 ns 15.993 ns 16.067 ns]
change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
sync/no-hook/wasm-to-host - nop - untyped
time: [20.596 ns 20.640 ns 20.690 ns]
change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [42.659 ns 42.882 ns 43.159 ns]
change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
sync/no-hook/wasm-to-host - nop - unchecked
time: [10.671 ns 10.691 ns 10.713 ns]
change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.136 ns 11.190 ns 11.263 ns]
change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop - typed
time: [6.7964 ns 6.8087 ns 6.8226 ns]
change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.865 ns 15.921 ns 15.985 ns]
change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
sync/hook-sync/wasm-to-host - nop - untyped
time: [21.505 ns 21.587 ns 21.677 ns]
change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [44.018 ns 44.128 ns 44.261 ns]
change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05)
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
sync/hook-sync/wasm-to-host - nop - unchecked
time: [11.264 ns 11.326 ns 11.387 ns]
change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.816 ns 11.865 ns 11.920 ns]
change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop - typed
time: [6.6221 ns 6.6385 ns 6.6569 ns]
change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.884 ns 15.929 ns 15.983 ns]
change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/no-hook/wasm-to-host - nop - untyped
time: [20.615 ns 20.702 ns 20.821 ns]
change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.956 ns 42.207 ns 42.521 ns]
change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async/no-hook/wasm-to-host - nop - unchecked
time: [10.440 ns 10.474 ns 10.513 ns]
change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.476 ns 11.512 ns 11.554 ns]
change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
async/no-hook/wasm-to-host - nop - async-typed
time: [26.427 ns 26.478 ns 26.532 ns]
change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [28.557 ns 28.693 ns 28.880 ns]
change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
async/hook-sync/wasm-to-host - nop - typed
time: [6.7488 ns 6.7630 ns 6.7784 ns]
change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.928 ns 16.031 ns 16.149 ns]
change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
async/hook-sync/wasm-to-host - nop - untyped
time: [21.930 ns 22.114 ns 22.296 ns]
change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.684 ns 42.858 ns 43.081 ns]
change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) high mild
12 (12.00%) high severe
async/hook-sync/wasm-to-host - nop - unchecked
time: [11.026 ns 11.053 ns 11.086 ns]
change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.840 ns 11.900 ns 11.982 ns]
change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async/hook-sync/wasm-to-host - nop - async-typed
time: [27.601 ns 27.709 ns 27.882 ns]
change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [28.955 ns 29.174 ns 29.413 ns]
change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async-pool/no-hook/wasm-to-host - nop - typed
time: [6.5626 ns 6.5733 ns 6.5851 ns]
change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.820 ns 15.886 ns 15.969 ns]
change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe
async-pool/no-hook/wasm-to-host - nop - untyped
time: [20.481 ns 20.521 ns 20.566 ns]
change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.834 ns 41.998 ns 42.189 ns]
change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe
async-pool/no-hook/wasm-to-host - nop - unchecked
time: [10.353 ns 10.380 ns 10.414 ns]
change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.123 ns 11.168 ns 11.228 ns]
change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
async-pool/no-hook/wasm-to-host - nop - async-typed
time: [27.442 ns 27.528 ns 27.638 ns]
change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [29.014 ns 29.148 ns 29.312 ns]
change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - typed
time: [6.7916 ns 6.8116 ns 6.8325 ns]
change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.917 ns 15.975 ns 16.051 ns]
change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - untyped
time: [21.558 ns 21.612 ns 21.679 ns]
change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.475 ns 42.614 ns 42.775 ns]
change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - unchecked
time: [11.150 ns 11.195 ns 11.247 ns]
change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.639 ns 11.695 ns 11.760 ns]
change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) high mild
8 (8.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - async-typed
time: [27.480 ns 27.712 ns 27.984 ns]
change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [29.218 ns 29.380 ns 29.600 ns]
change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
```
</details>
* Add s390x support for frame pointer-based stack walking
* wasmtime: Allow `Caller::get_export` to get all exports
* fuzzing: Add a fuzz target to check that our stack traces are correct
We generate Wasm modules that keep track of their own stack as they call and
return between functions, and then we periodically check that if the host
captures a backtrace, it matches what the Wasm module has recorded.
* Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code
* Add doc comment with stack walking implementation notes
* Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods
* Add extensive comments for stack walking function
* Factor architecture-specific bits of stack walking out into modules
* Initialize store-related fields in a vmctx to null when there is no store yet
Rather than leaving them as uninitialized data.
* Use `set_callee` instead of manually setting the vmctx field
* Use a more informative compile error message for unsupported architectures
* Document unsafety of `prepare_host_to_wasm_trampoline`
* Use `bti c` instead of `hint #34` in inline aarch64 assembly
* Remove outdated TODO comment
* Remove setting of `last_wasm_exit_fp` in `set_jit_trap`
This is no longer needed as the value is plumbed through to the backtrace code
directly now.
* Only set the stack limit once, in the face of re-entrancy into Wasm
* Add comments for s390x-specific stack walking bits
* Use the helper macro for all libcalls
If we forget to use it, and then trigger a GC from the libcall, that means we
could miss stack frames when walking the stack, fail to find live GC refs, and
then get use after free bugs. Much less risky to always use the helper macro
that takes care of all of that for us.
* Use the `asm_sym!` macro in Wasm-to-libcall trampolines
This macro handles the macOS-specific underscore prefix stuff for us.
* wasmtime: add size and align to `externref` assertion error message
* Extend the `stacks` fuzzer to have host frames in between Wasm frames
This way we get one or more contiguous sequences of Wasm frames on the stack,
instead of exactly one.
* Add documentation for aarch64-specific backtrace helpers
* Clarify that we only support little-endian aarch64 in trampoline comment
* Use `.machine z13` in s390x assembly file
Since apparently our CI machines have pretty old assemblers that don't have
`.machine z14`. This should be fine though since these trampolines don't make
use of anything that is introduced in z14.
* Fix aarch64 build
* Fix macOS build
* Document the `asm_sym!` macro
* Add windows support to the `wasmtime-asm-macros` crate
* Add windows support to host<--->Wasm trampolines
* Fix trap handler build on windows
* Run `rustfmt` on s390x trampoline source file
* Temporarily disable some assertions about a trap's backtrace in the component model tests
Follow up to re-enable this and fix the associated issue:
https://github.com/bytecodealliance/wasmtime/issues/4535
* Refactor libcall definitions with less macros
This refactors the `libcall!` macro to use the
`foreach_builtin_function!` macro to define all of the trampolines.
Additionally the macro surrounding each libcall itself is no longer
necessary and helps avoid too many macros.
* Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new`
* Move `backtrace` module to be submodule of `traphandlers`
This avoids making some things `pub(crate)` in `traphandlers` that really
shouldn't be.
* Fix macOS aarch64 build
* Use "i64" instead of "word" in aarch64-specific file
* Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions
Also clean up assertions surrounding our saved entry/exit registers.
* Put "typed" vs "untyped" in the same position of call benchmark names
Regardless if we are doing wasm-to-host or host-to-wasm
* Fix stacks test case generator build for new `wasm-encoder`
* Fix build for s390x
* Expand libcalls in s390x asm
* Disable more parts of component tests now that backtrace assertions are a bit tighter
* Remove assertion that can maybe fail on s390x
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
|
||
|
|
174b60dcf7 |
Add *.wast support for invoking components (#4526)
This commit builds on bytecodealliance/wasm-tools#690 to add support to testing of the component model to execute functions when running `*.wast` files. This support is all built on #4442 as functions are invoked through a "dynamic" API. Right now the testing and integration is fairly crude but I'm hoping that we can try to improve it over time as necessary. For now this should provide a hopefully more convenient syntax for unit tests and the like. |
||
|
|
81bff71d11 |
wasmtime: let serialization mismatch tests pass with RUST_BACKTRACE=1 set (#4525)
|
||
|
|
97894bc65e |
Add initial support for fused adapter trampolines (#4501)
* Add initial support for fused adapter trampolines This commit lands a significant new piece of functionality to Wasmtime's implementation of the component model in the form of the implementation of fused adapter trampolines. Internally within a component core wasm modules can communicate with each other by having their exports `canon lift`'d to get `canon lower`'d into a different component. This signifies that two components are communicating through a statically known interface via the canonical ABI at this time. Previously Wasmtime was able to identify that this communication was happening but it simply panicked with `unimplemented!` upon seeing it. This commit is the beginning of filling out this panic location with an actual implementation. The implementation route chosen here for fused adapters is to use a WebAssembly module itself for the implementation. This means that, at compile time of a component, Wasmtime is generating core WebAssembly modules which then get recursively compiled within Wasmtime as well. The choice to use WebAssembly itself as the implementation of fused adapters stems from a few motivations: * This does not represent a significant increase in the "trusted compiler base" of Wasmtime. Getting the Wasm -> CLIF translation correct once is hard enough much less for an entirely different IR to CLIF. By generating WebAssembly no new interactions with Cranelift are added which drastically reduces the possibilities for mistakes. * Using WebAssembly means that component adapters are insulated from miscompilations and mistakes. If something goes wrong it's defined well within the WebAssembly specification how it goes wrong and what happens as a result. This means that the "blast zone" for a wrong adapter is the component instance but not the entire host itself. Accesses to linear memory are guaranteed to be in-bounds and otherwise handled via well-defined traps. * A fully-finished fused adapter compiler is expected to be a significant and quite complex component of Wasmtime. Functionality along these lines is expected to be needed for Web-based polyfills of the component model and by using core WebAssembly it provides the opportunity to share code between Wasmtime and these polyfills for the component model. * Finally the runtime implementation of managing WebAssembly modules is already implemented and quite easy to integrate with, so representing fused adapters with WebAssembly results in very little extra support necessary for the runtime implementation of instantiating and managing a component. The compiler added in this commit is dubbed Wasmtime's Fused Adapter Compiler of Trampolines (FACT) because who doesn't like deriving a name from an acronym. Currently the trampoline compiler is limited in its support for interface types and only supports a few primitives. I plan on filing future PRs to flesh out the support here for all the variants of `InterfaceType`. For now this PR is primarily focused on all of the other infrastructure for the addition of a trampoline compiler. With the choice to use core WebAssembly to implement fused adapters it means that adapters need to be inserted into a module. Unfortunately adapters cannot all go into a single WebAssembly module because adapters themselves have dependencies which may be provided transitively through instances that were instantiated with other adapters. This means that a significant chunk of this PR (`adapt.rs`) is dedicated to determining precisely which adapters go into precisely which adapter modules. This partitioning process attempts to make large modules wherever it can to cut down on core wasm instantiations but is likely not optimal as it's just a simple heuristic today. With all of this added together it's now possible to start writing `*.wast` tests that internally have adapted modules communicating with one another. A `fused.wast` test suite was added as part of this PR which is the beginning of tests for the support of the fused adapter compiler added in this PR. Currently this is primarily testing some various topologies of adapters along with direct/indirect modes. This will grow many more tests over time as more types are supported. Overall I'm not 100% satisfied with the testing story of this PR. When a test fails it's very difficult to debug since everything is written in the text format of WebAssembly meaning there's no "conveniences" to print out the state of the world when things go wrong and easily debug. I think this will become even more apparent as more tests are written for more types in subsequent PRs. At this time though I know of no better alternative other than leaning pretty heavily on fuzz-testing to ensure this is all exercised. * Fix an unused field warning * Fix tests in `wasmtime-runtime` * Add some more tests for compiled trampolines * Remap exports when injecting adapters The exports of a component were accidentally left unmapped which meant that they indexed the instance indexes pre-adapter module insertion. * Fix typo * Rebase conflicts |