650979ae405afc8b87935172189774cb1f24a8a3
268 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
650979ae40 |
Implement strings in adapter modules (#4623)
* Implement strings in adapter modules This commit is a hefty addition to Wasmtime's support for the component model. This implements the final remaining type (in the current type hierarchy) unimplemented in adapter module trampolines: strings. Strings are the most complicated type to implement in adapter trampolines because they are highly structured chunks of data in memory (according to specific encodings). Additionally each lift/lower operation can choose its own encoding for strings meaning that Wasmtime, the host, may have to convert between any pairwise ordering of string encodings. The `CanonicalABI.md` in the component-model repo in general specifies all the fiddly bits of string encoding so there's not a ton of wiggle room for Wasmtime to get creative. This PR largely "just" implements that. The high-level architecture of this implementation is: * Fused adapters are first identified to determine src/dst string encodings. This statically fixes what transcoding operation is being performed. * The generated adapter will be responsible for managing calls to `realloc` and performing bounds checks. The adapter itself does not perform memory copies or validation of string contents, however. Instead each transcoding operation is modeled as an imported function into the adapter module. This means that the adapter module dynamically, during compile time, determines what string transcoders are needed. Note that an imported transcoder is not only parameterized over the transcoding operation but additionally which memory is the source and which is the destination. * The imported core wasm functions are modeled as a new `CoreDef::Transcoder` structure. These transcoders end up being small Cranelift-compiled trampolines. The Cranelift-compiled trampoline will load the actual base pointer of memory and add it to the relative pointers passed as function arguments. This trampoline then calls a transcoder "libcall" which enters Rust-defined functions for actual transcoding operations. * Each possible transcoding operation is implemented in Rust with a unique name and a unique signature depending on the needs of the transcoder. I've tried to document inline what each transcoder does. This means that the `Module::translate_string` in adapter modules is by far the largest translation method. The main reason for this is due to the management around calling the imported transcoder functions in the face of validating string pointer/lengths and performing the dance of `realloc`-vs-transcode at the right time. I've tried to ensure that each individual case in transcoding is documented well enough to understand what's going on as well. Additionally in this PR is a full implementation in the host for the `latin1+utf16` encoding which means that both lifting and lowering host strings now works with this encoding. Currently the implementation of each transcoder function is likely far from optimal. Where possible I've leaned on the standard library itself and for latin1-related things I'm leaning on the `encoding_rs` crate. I initially tried to implement everything with `encoding_rs` but was unable to uniformly do so easily. For now I settled on trying to get a known-correct (even in the face of endianness) implementation for all of these transcoders. If an when performance becomes an issue it should be possible to implement more optimized versions of each of these transcoding operations. Testing this commit has been somewhat difficult and my general plan, like with the `(list T)` type, is to rely heavily on fuzzing to cover the various cases here. In this PR though I've added a simple test that pushes some statically known strings through all the pairs of encodings between source and destination. I've attempted to pick "interesting" strings that one way or another stress the various paths in each transcoding operation to ideally get full branch coverage there. Additionally a suite of "negative" tests have also been added to ensure that validity of encoding is actually checked. * Fix a temporarily commented out case * Fix wasmtime-runtime tests * Update deny.toml configuration * Add `BSD-3-Clause` for the `encoding_rs` crate * Remove some unused licenses * Add an exemption for `encoding_rs` for now * Split up the `translate_string` method Move out all the closures and package up captured state into smaller lists of arguments. * Test out-of-bounds for zero-length strings |
||
|
|
ed8908efcf |
implement fuzzing for component types (#4537)
This addresses #4307. For the static API we generate 100 arbitrary test cases at build time, each of which includes 0-5 parameter types, a result type, and a WAT fragment containing an imported function and an exported function. The exported function calls the imported function, which is implemented by the host. At runtime, the fuzz test selects a test case at random and feeds it zero or more sets of arbitrary parameters and results, checking that values which flow host-to-guest and guest-to-host make the transition unchanged. The fuzz test for the dynamic API follows a similar pattern, the only difference being that test cases are generated at runtime. Signed-off-by: Joel Dice <joel.dice@fermyon.com> |
||
|
|
70ce288dc7 |
Save exit Wasm FP and PC in component-to-host trampolines (#4601)
* Wasmtime: Add a pointer to `VMRuntimeLimits` in component contexts * Save exit Wasm FP and PC in component-to-host trampolines Fixes #4535 * Add comment about why we deref the trampoline's FP * Update some tests to use new `vmruntime_limits_*` methods |
||
|
|
ee5b192d35 |
Re-enable component model *.wast tests (#4577)
* Re-enable component model `*.wast` tests These accidentally stopped running as part of #4556 on CI since I forgot one more location to touch a feature gate. * Enable logging in component tests This is a small convenience to get log messages during testing for components by default. |
||
|
|
43f1765272 |
Cranellift: remove Baldrdash support and related features. (#4571)
* Cranellift: remove Baldrdash support and related features.
As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.
In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.
This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425
* Review feedback.
* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.
The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.
* Review feedback.
|
||
|
|
fb59de15af |
Implement fused adapters for (list T) types (#4558)
* Implement fused adapters for `(list T)` types This commit implements one of the two remaining types for adapter fusion, lists. This implementation is particularly tricky for a number of reasons: * Lists have a number of validity checks which need to be carefully implemented. For example the byte length of the list passed to allocation in the destination module could overflow the 32-bit index space. Additionally lists in 32-bit memories need a check that their final address is in-bounds in the address space. * In the effort to go ahead and support memory64 at the lowest layers this is where much of the magic happens. Lists are naturally always stored in memory and shifting between 64/32-bit address spaces is done here. This notably required plumbing an `Options` around during flattening/size/alignment calculations due to the size/types of lists changing depending on the memory configuration. I've also added a small `factc` program in this commit which should hopefully assist in exploring and debugging adapter modules. This takes as input a component (text or binary format) and then generates an adapter module for all component function signatures found internally. This commit notably does not include tests for lists. I tried to figure out a good way to add these but I felt like there were too many cases to test and the tests would otherwise be extremely verbose. Instead I think the best testing strategy for this commit will be through #4537 which should be relatively extensible to testing adapters between modules in addition to host-based lifting/lowering. * Improve handling of lists of 0-size types * Skip overflow checks on byte sizes for 0-size types * Skip the copy loop entirely when src/dst are both 0 * Skip the increments of src/dst pointers if either is 0-size * Update semantics for zero-sized lists/strings When a list/string has a 0-byte-size the base pointer is no longer verified to be in-bounds to match the supposedly desired adapter semantics where no trap happens because no turn of the loop happens. |
||
|
|
893fadb485 |
components: Fix support for 0-sized flags (#4560)
This commit goes through and updates support in the various argument passing routines to support 0-sized flags. A bit of a degenerate case but clarified in WebAssembly/component-model#76 as intentional. |
||
|
|
04631ad0af |
Unconditionally enable component-model tests (#4556)
* Unconditionally enable component-model tests * Remove an outdated test that wasn't previously being compiled * Fix a component model doc test * Try to decrease memory usage in qemu |
||
|
|
46782b18c2 |
wasmtime: Implement fast Wasm stack walking (#4431)
* Always preserve frame pointers in Wasmtime
This allows us to efficiently and simply capture Wasm stacks without maintaining
and synchronizing any safety-critical side tables between the compiler and the
runtime.
* wasmtime: Implement fast Wasm stack walking
Why do we want Wasm stack walking to be fast? Because we capture stacks whenever
there is a trap and traps actually happen fairly frequently with short-lived
programs and WASI's `exit`.
Previously, we would rely on generating the system unwind info (e.g.
`.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk
the full stack and filter out any non-Wasm stack frames. This can,
unfortunately, be slow for two primary reasons:
1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than
`O(wasm-frames)` work.
2. System unwind info and the system unwinder need to be much more general than
a purpose-built stack walker for Wasm needs to be. It has to handle any kind of
stack frame that any compiler might emit where as our Wasm frames are emitted by
Cranelift and always have frame pointers. This translates into implementation
complexity and general overhead. There can also be unnecessary-for-our-use-cases
global synchronization and locks involved, further slowing down stack walking in
the presence of multiple threads trying to capture stacks in parallel.
This commit introduces a purpose-built stack walker for traversing just our Wasm
frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore
non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit
frame pointer)` pairs. This linked list is maintained via Wasm-to-host and
host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use
frame pointers (which Cranelift preserves) to find the next older Wasm frame on
the stack, and we keep doing this until we reach the entry stack pointer,
meaning that the next older frame will be a host frame.
The trampolines need to avoid a couple stumbling blocks. First, they need to be
compiled ahead of time, since we may not have access to a compiler at
runtime (e.g. if the `cranelift` feature is disabled) but still want to be able
to call functions that have already been compiled and get stack traces for those
functions. Usually this means we would compile the appropriate trampolines
inside `Module::new` and the compiled module object would hold the
trampolines. However, we *also* need to support calling host functions that are
wrapped into `wasmtime::Func`s and there doesn't exist *any* ahead-of-time
compiled module object to hold the appropriate trampolines:
```rust
// Define a host function.
let func_type = wasmtime::FuncType::new(
vec![wasmtime::ValType::I32],
vec![wasmtime::ValType::I32],
);
let func = Func::new(&mut store, func_type, |_, params, results| {
// ...
Ok(())
});
// Call that host function.
let mut results = vec![wasmtime::Val::I32(0)];
func.call(&[wasmtime::Val::I32(0)], &mut results)?;
```
Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline
in assembly that work for all Wasm and host function signatures. These
trampolines are careful to only use volatile registers, avoid touching any
register that is an argument in the calling convention ABI, and tail call to the
target callee function. This allows forwarding any set of arguments and any
returns to and from the callee, while also allowing us to maintain our linked
list of Wasm stack and frame pointers before transferring control to the
callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing
the host-Wasm boundary, so they do not impose overhead on regular calls. (And if
using one trampoline for all host-Wasm boundary crossing ever breaks branch
prediction enough in the CPU to become any kind of bottleneck, we can do fun
things like have multiple copies of the same trampoline and choose a random copy
for each function, sharding the functions across branch predictor entries.)
Finally, this commit also ends the use of a synthetic `Module` and allocating a
stubbed out `VMContext` for host functions. Instead, we define a
`VMHostFuncContext` with its own magic value, similar to `VMComponentContext`,
specifically for host functions.
<h2>Benchmarks</h2>
<h3>Traps and Stack Traces</h3>
Large improvements to taking stack traces on traps, ranging from shaving off 64%
to 99.95% of the time it used to take.
<details>
```
multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us]
thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s]
change:
time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05)
thrpt: [+560.90% +573.56% +585.84%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us]
thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s]
change:
time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05)
thrpt: [+1023.1% +1048.6% +1070.3%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us]
thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s]
change:
time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05)
thrpt: [+1503.1% +1542.0% +1578.0%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severe
multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us]
thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s]
change:
time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05)
thrpt: [+1339.2% +1353.6% +1369.1%]
Performance has improved.
multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us]
thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s]
change:
time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05)
thrpt: [+1624.7% +1709.2% +1806.1%]
Performance has improved.
multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us]
thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s]
change:
time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05)
thrpt: [+2821.5% +3048.1% +3281.1%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
many-modules-registered-traps/1
time: [2.6278 us 2.6361 us 2.6447 us]
thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s]
change:
time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05)
thrpt: [+562.65% +571.51% +580.76%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
many-modules-registered-traps/8
time: [2.6294 us 2.6460 us 2.6623 us]
thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s]
change:
time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05)
thrpt: [+567.63% +588.95% +608.95%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
many-modules-registered-traps/64
time: [2.6218 us 2.6329 us 2.6452 us]
thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s]
change:
time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05)
thrpt: [+1431.4% +1450.6% +1469.5%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
many-modules-registered-traps/512
time: [2.6569 us 2.6737 us 2.6923 us]
thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s]
change:
time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05)
thrpt: [+13417% +13566% +13731%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
many-modules-registered-traps/4096
time: [2.7258 us 2.7390 us 2.7535 us]
thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s]
change:
time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
thrpt: [+221417% +223380% +224881%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
many-stack-frames-traps/1
time: [1.4658 us 1.4719 us 1.4784 us]
thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s]
change:
time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05)
thrpt: [+860.23% +894.72% +938.21%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
many-stack-frames-traps/8
time: [2.4772 us 2.4870 us 2.4973 us]
thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s]
change:
time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05)
thrpt: [+575.65% +583.51% +592.03%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
many-stack-frames-traps/64
time: [10.109 us 10.171 us 10.236 us]
thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s]
change:
time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05)
thrpt: [+341.22% +350.38% +357.55%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
many-stack-frames-traps/512
time: [126.16 us 126.54 us 126.96 us]
thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s]
change:
time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05)
thrpt: [+181.32% +185.17% +188.71%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
```
</details>
<h3>Calls</h3>
There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call
performance due the new trampolines. It seems to be on the order of about 2-10
nanoseconds per call, depending on the benchmark.
I believe this regression is ultimately acceptable because
1. this overhead will be vastly dominated by whatever work a non-nop callee
actually does,
2. we will need these trampolines, or something like them, when implementing the
Wasm exceptions proposal to do things like translate Wasm's exceptions into
Rust's `Result`s,
3. and because the performance improvements to trapping and capturing stack
traces are of such a larger magnitude than this call regressions.
<details>
```
sync/no-hook/host-to-wasm - typed - nop
time: [28.683 ns 28.757 ns 28.844 ns]
change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop
time: [42.515 ns 42.652 ns 42.841 ns]
change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop
time: [33.936 ns 34.052 ns 34.179 ns]
change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
sync/no-hook/host-to-wasm - typed - nop-params-and-results
time: [34.290 ns 34.388 ns 34.502 ns]
change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [62.546 ns 62.721 ns 62.919 ns]
change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop-params-and-results
time: [42.609 ns 42.710 ns 42.831 ns]
change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop
time: [29.546 ns 29.675 ns 29.818 ns]
change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop
time: [45.448 ns 45.699 ns 45.961 ns]
change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop
time: [34.334 ns 34.437 ns 34.558 ns]
change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [36.594 ns 36.763 ns 36.974 ns]
change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [63.541 ns 63.831 ns 64.194 ns]
change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results
time: [43.968 ns 44.169 ns 44.437 ns]
change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
async/no-hook/host-to-wasm - typed - nop
time: [4.9612 us 4.9743 us 4.9889 us]
change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async/no-hook/host-to-wasm - untyped - nop
time: [5.0030 us 5.0211 us 5.0439 us]
change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
async/no-hook/host-to-wasm - typed - nop-params-and-results
time: [4.9273 us 4.9468 us 4.9700 us]
change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [5.1151 us 5.1338 us 5.1555 us]
change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/hook-sync/host-to-wasm - typed - nop
time: [4.9330 us 4.9394 us 4.9467 us]
change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop
time: [5.0073 us 5.0183 us 5.0310 us]
change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
async/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [4.9610 us 4.9839 us 5.0097 us]
change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [5.0995 us 5.1272 us 5.1617 us]
change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop
time: [2.4242 us 2.4316 us 2.4396 us]
change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop
time: [2.5102 us 2.5155 us 2.5210 us]
change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop-params-and-results
time: [2.4203 us 2.4310 us 2.4440 us]
change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [2.5501 us 2.5593 us 2.5700 us]
change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop
time: [2.4135 us 2.4190 us 2.4254 us]
change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop
time: [2.5172 us 2.5248 us 2.5357 us]
change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [2.4214 us 2.4353 us 2.4532 us]
change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [2.5499 us 2.5607 us 2.5748 us]
change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
sync/no-hook/wasm-to-host - nop - typed
time: [6.6135 ns 6.6288 ns 6.6452 ns]
change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.930 ns 15.993 ns 16.067 ns]
change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
sync/no-hook/wasm-to-host - nop - untyped
time: [20.596 ns 20.640 ns 20.690 ns]
change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [42.659 ns 42.882 ns 43.159 ns]
change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
sync/no-hook/wasm-to-host - nop - unchecked
time: [10.671 ns 10.691 ns 10.713 ns]
change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.136 ns 11.190 ns 11.263 ns]
change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop - typed
time: [6.7964 ns 6.8087 ns 6.8226 ns]
change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.865 ns 15.921 ns 15.985 ns]
change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
sync/hook-sync/wasm-to-host - nop - untyped
time: [21.505 ns 21.587 ns 21.677 ns]
change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [44.018 ns 44.128 ns 44.261 ns]
change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05)
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
sync/hook-sync/wasm-to-host - nop - unchecked
time: [11.264 ns 11.326 ns 11.387 ns]
change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.816 ns 11.865 ns 11.920 ns]
change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop - typed
time: [6.6221 ns 6.6385 ns 6.6569 ns]
change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.884 ns 15.929 ns 15.983 ns]
change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/no-hook/wasm-to-host - nop - untyped
time: [20.615 ns 20.702 ns 20.821 ns]
change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.956 ns 42.207 ns 42.521 ns]
change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async/no-hook/wasm-to-host - nop - unchecked
time: [10.440 ns 10.474 ns 10.513 ns]
change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.476 ns 11.512 ns 11.554 ns]
change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
async/no-hook/wasm-to-host - nop - async-typed
time: [26.427 ns 26.478 ns 26.532 ns]
change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [28.557 ns 28.693 ns 28.880 ns]
change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
async/hook-sync/wasm-to-host - nop - typed
time: [6.7488 ns 6.7630 ns 6.7784 ns]
change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.928 ns 16.031 ns 16.149 ns]
change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
async/hook-sync/wasm-to-host - nop - untyped
time: [21.930 ns 22.114 ns 22.296 ns]
change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.684 ns 42.858 ns 43.081 ns]
change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) high mild
12 (12.00%) high severe
async/hook-sync/wasm-to-host - nop - unchecked
time: [11.026 ns 11.053 ns 11.086 ns]
change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.840 ns 11.900 ns 11.982 ns]
change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async/hook-sync/wasm-to-host - nop - async-typed
time: [27.601 ns 27.709 ns 27.882 ns]
change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [28.955 ns 29.174 ns 29.413 ns]
change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async-pool/no-hook/wasm-to-host - nop - typed
time: [6.5626 ns 6.5733 ns 6.5851 ns]
change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.820 ns 15.886 ns 15.969 ns]
change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe
async-pool/no-hook/wasm-to-host - nop - untyped
time: [20.481 ns 20.521 ns 20.566 ns]
change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.834 ns 41.998 ns 42.189 ns]
change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe
async-pool/no-hook/wasm-to-host - nop - unchecked
time: [10.353 ns 10.380 ns 10.414 ns]
change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.123 ns 11.168 ns 11.228 ns]
change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
async-pool/no-hook/wasm-to-host - nop - async-typed
time: [27.442 ns 27.528 ns 27.638 ns]
change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [29.014 ns 29.148 ns 29.312 ns]
change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - typed
time: [6.7916 ns 6.8116 ns 6.8325 ns]
change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.917 ns 15.975 ns 16.051 ns]
change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - untyped
time: [21.558 ns 21.612 ns 21.679 ns]
change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.475 ns 42.614 ns 42.775 ns]
change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - unchecked
time: [11.150 ns 11.195 ns 11.247 ns]
change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.639 ns 11.695 ns 11.760 ns]
change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) high mild
8 (8.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - async-typed
time: [27.480 ns 27.712 ns 27.984 ns]
change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [29.218 ns 29.380 ns 29.600 ns]
change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
```
</details>
* Add s390x support for frame pointer-based stack walking
* wasmtime: Allow `Caller::get_export` to get all exports
* fuzzing: Add a fuzz target to check that our stack traces are correct
We generate Wasm modules that keep track of their own stack as they call and
return between functions, and then we periodically check that if the host
captures a backtrace, it matches what the Wasm module has recorded.
* Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code
* Add doc comment with stack walking implementation notes
* Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods
* Add extensive comments for stack walking function
* Factor architecture-specific bits of stack walking out into modules
* Initialize store-related fields in a vmctx to null when there is no store yet
Rather than leaving them as uninitialized data.
* Use `set_callee` instead of manually setting the vmctx field
* Use a more informative compile error message for unsupported architectures
* Document unsafety of `prepare_host_to_wasm_trampoline`
* Use `bti c` instead of `hint #34` in inline aarch64 assembly
* Remove outdated TODO comment
* Remove setting of `last_wasm_exit_fp` in `set_jit_trap`
This is no longer needed as the value is plumbed through to the backtrace code
directly now.
* Only set the stack limit once, in the face of re-entrancy into Wasm
* Add comments for s390x-specific stack walking bits
* Use the helper macro for all libcalls
If we forget to use it, and then trigger a GC from the libcall, that means we
could miss stack frames when walking the stack, fail to find live GC refs, and
then get use after free bugs. Much less risky to always use the helper macro
that takes care of all of that for us.
* Use the `asm_sym!` macro in Wasm-to-libcall trampolines
This macro handles the macOS-specific underscore prefix stuff for us.
* wasmtime: add size and align to `externref` assertion error message
* Extend the `stacks` fuzzer to have host frames in between Wasm frames
This way we get one or more contiguous sequences of Wasm frames on the stack,
instead of exactly one.
* Add documentation for aarch64-specific backtrace helpers
* Clarify that we only support little-endian aarch64 in trampoline comment
* Use `.machine z13` in s390x assembly file
Since apparently our CI machines have pretty old assemblers that don't have
`.machine z14`. This should be fine though since these trampolines don't make
use of anything that is introduced in z14.
* Fix aarch64 build
* Fix macOS build
* Document the `asm_sym!` macro
* Add windows support to the `wasmtime-asm-macros` crate
* Add windows support to host<--->Wasm trampolines
* Fix trap handler build on windows
* Run `rustfmt` on s390x trampoline source file
* Temporarily disable some assertions about a trap's backtrace in the component model tests
Follow up to re-enable this and fix the associated issue:
https://github.com/bytecodealliance/wasmtime/issues/4535
* Refactor libcall definitions with less macros
This refactors the `libcall!` macro to use the
`foreach_builtin_function!` macro to define all of the trampolines.
Additionally the macro surrounding each libcall itself is no longer
necessary and helps avoid too many macros.
* Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new`
* Move `backtrace` module to be submodule of `traphandlers`
This avoids making some things `pub(crate)` in `traphandlers` that really
shouldn't be.
* Fix macOS aarch64 build
* Use "i64" instead of "word" in aarch64-specific file
* Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions
Also clean up assertions surrounding our saved entry/exit registers.
* Put "typed" vs "untyped" in the same position of call benchmark names
Regardless if we are doing wasm-to-host or host-to-wasm
* Fix stacks test case generator build for new `wasm-encoder`
* Fix build for s390x
* Expand libcalls in s390x asm
* Disable more parts of component tests now that backtrace assertions are a bit tighter
* Remove assertion that can maybe fail on s390x
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
|
||
|
|
97894bc65e |
Add initial support for fused adapter trampolines (#4501)
* Add initial support for fused adapter trampolines This commit lands a significant new piece of functionality to Wasmtime's implementation of the component model in the form of the implementation of fused adapter trampolines. Internally within a component core wasm modules can communicate with each other by having their exports `canon lift`'d to get `canon lower`'d into a different component. This signifies that two components are communicating through a statically known interface via the canonical ABI at this time. Previously Wasmtime was able to identify that this communication was happening but it simply panicked with `unimplemented!` upon seeing it. This commit is the beginning of filling out this panic location with an actual implementation. The implementation route chosen here for fused adapters is to use a WebAssembly module itself for the implementation. This means that, at compile time of a component, Wasmtime is generating core WebAssembly modules which then get recursively compiled within Wasmtime as well. The choice to use WebAssembly itself as the implementation of fused adapters stems from a few motivations: * This does not represent a significant increase in the "trusted compiler base" of Wasmtime. Getting the Wasm -> CLIF translation correct once is hard enough much less for an entirely different IR to CLIF. By generating WebAssembly no new interactions with Cranelift are added which drastically reduces the possibilities for mistakes. * Using WebAssembly means that component adapters are insulated from miscompilations and mistakes. If something goes wrong it's defined well within the WebAssembly specification how it goes wrong and what happens as a result. This means that the "blast zone" for a wrong adapter is the component instance but not the entire host itself. Accesses to linear memory are guaranteed to be in-bounds and otherwise handled via well-defined traps. * A fully-finished fused adapter compiler is expected to be a significant and quite complex component of Wasmtime. Functionality along these lines is expected to be needed for Web-based polyfills of the component model and by using core WebAssembly it provides the opportunity to share code between Wasmtime and these polyfills for the component model. * Finally the runtime implementation of managing WebAssembly modules is already implemented and quite easy to integrate with, so representing fused adapters with WebAssembly results in very little extra support necessary for the runtime implementation of instantiating and managing a component. The compiler added in this commit is dubbed Wasmtime's Fused Adapter Compiler of Trampolines (FACT) because who doesn't like deriving a name from an acronym. Currently the trampoline compiler is limited in its support for interface types and only supports a few primitives. I plan on filing future PRs to flesh out the support here for all the variants of `InterfaceType`. For now this PR is primarily focused on all of the other infrastructure for the addition of a trampoline compiler. With the choice to use core WebAssembly to implement fused adapters it means that adapters need to be inserted into a module. Unfortunately adapters cannot all go into a single WebAssembly module because adapters themselves have dependencies which may be provided transitively through instances that were instantiated with other adapters. This means that a significant chunk of this PR (`adapt.rs`) is dedicated to determining precisely which adapters go into precisely which adapter modules. This partitioning process attempts to make large modules wherever it can to cut down on core wasm instantiations but is likely not optimal as it's just a simple heuristic today. With all of this added together it's now possible to start writing `*.wast` tests that internally have adapted modules communicating with one another. A `fused.wast` test suite was added as part of this PR which is the beginning of tests for the support of the fused adapter compiler added in this PR. Currently this is primarily testing some various topologies of adapters along with direct/indirect modes. This will grow many more tests over time as more types are supported. Overall I'm not 100% satisfied with the testing story of this PR. When a test fails it's very difficult to debug since everything is written in the text format of WebAssembly meaning there's no "conveniences" to print out the state of the world when things go wrong and easily debug. I think this will become even more apparent as more tests are written for more types in subsequent PRs. At this time though I know of no better alternative other than leaning pretty heavily on fuzz-testing to ensure this is all exercised. * Fix an unused field warning * Fix tests in `wasmtime-runtime` * Add some more tests for compiled trampolines * Remap exports when injecting adapters The exports of a component were accidentally left unmapped which meant that they indexed the instance indexes pre-adapter module insertion. * Fix typo * Rebase conflicts |
||
|
|
7c67e620c4 |
support dynamic function calls in component model (#4442)
* support dynamic function calls in component model This addresses #4310, introducing a new `component::values::Val` type for representing component values dynamically, as well as `component::types::Type` for representing the corresponding interface types. It also adds a `call` method to `component::func::Func`, which takes a slice of `Val`s as parameters and returns a `Result<Val>` representing the result. Note that I've moved `post_return` and `call_raw` from `TypedFunc` to `Func` since there was nothing specific to `TypedFunc` about them, and I wanted to reuse them. The code in both is unchanged beyond the trivial tweaks to make them fit in their new home. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * order variants and match cases more consistently Signed-off-by: Joel Dice <joel.dice@fermyon.com> * implement lift for String, Box<str>, etc. This also removes the redundant `store` parameter from `Type::load`. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * implement code review feedback This fixes a few issues: - Bad offset calculation when lowering - Missing variant padding - Style issues regarding `types::Handle` - Missed opportunities to reuse `Lift` and `Lower` impls It also adds forwarding `Lift` impls for `Box<[T]>`, `Vec<T>`, etc. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * move `new_*` methods to specific `types` structs Per review feedback, I've moved `Type::new_record` to `Record::new_val` and added a `Type::unwrap_record` method; likewise for the other kinds of types. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * make tuple, option, and expected type comparisons recursive These types should compare as equal across component boundaries as long as their type parameters are equal. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * improve error diagnostic in `Type::check` We now distinguish between more failure cases to provide an informative error message. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * address review feedback - Remove `WasmStr::to_str_from_memory` and `WasmList::get_from_memory` - add `try_new` methods to various `values` types - avoid using `ExactSizeIterator::len` where we can't trust it - fix over-constrained bounds on forwarded `ComponentType` impls Signed-off-by: Joel Dice <joel.dice@fermyon.com> * rearrange code per review feedback - Move functions from `types` to `values` module so we can make certain struct fields private - Rename `try_new` to just `new` Signed-off-by: Joel Dice <joel.dice@fermyon.com> * remove special-case equality test for tuples, options, and expecteds Instead, I've added a FIXME comment and will open an issue to do recursive structural equality testing. Signed-off-by: Joel Dice <joel.dice@fermyon.com> |
||
|
|
33312c5380 |
Fix a possible panic with null-containing element segments (#4455)
This commit fixes an issue with the initialization of element segments when one of the elements in the element segment is `ref.func null`. Previously the contents of a table were accidentally initialized with the raw value of the `*mut VMCallerCheckedAnyfunc` which bypassed the "this is initialized" encoding of function table entries that Wasmtime uses for lazy table initialization. The fix here was to ensure that the encoded form is used. The impact of this issue is that a module could panic at runtime when accessing a table element that was initialized with an element segment containing a `ref.null func` entry. This only happens with imported tables in a WebAssembly module where the table itself was defined on the host. If the table was defined in another wasm module or in the local wasm module this bug would not occur. Additionally this bug requires enabling the reference types proposal for WebAssembly (which is enabled by default) due to the usage of encodings for null funcrefs in element segments. |
||
|
|
e31ff9dc67 |
implement wasmtime::component::flags! per #4308 (#4414)
* implement wasmtime::component::flags! per #4308 This is the last macro needed to complete #4308. It supports generating a Rust type that represents a `flags` component type, analogous to how the [bitflags crate](https://crates.io/crates/bitflags) operates. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * wrap `format_flags` output in parens This ensures we generate non-empty output even when no flags are set. Empty output for a `Debug` implementation would be confusing. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * unconditionally derive `Lift` and `Lower` in wasmtime::component::flags! Per feedback on #4414, we now derive impls for those traits unconditionally, which simplifies the syntax of the macro. Also, I happened to notice an alignment bug in `LowerExpander::expand_variant`, so I fixed that and cleaned up some related code. Finally, I used @jameysharp's trick to calculate bit masks without looping. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * fix shift overflow regression in previous commit Jamey pointed out my mistake: I didn't consider the case when the flag count was evenly divisible by the representation size. This fixes the problem and adds test cases to cover it. Signed-off-by: Joel Dice <joel.dice@fermyon.com> |
||
|
|
371ae80ac3 |
Migrate most of wasmtime from lazy_static to once_cell (#4368)
* Update tracing-core to a version which doesn't depend on lazy-static. * Update crossbeam-utils to a version that doesn't depend on lazy-static. * Update crossbeam-epoch to a version that doesn't depend on lazy-static. * Update clap to a version that doesn't depend on lazy-static. * Convert Wasmtime's own use of lazy_static to once_cell. * Make `GDB_REGISTRATION`'s comment a doc comment. * Fix compilation on Windows. |
||
|
|
76a2545a7f |
Implement nested instance exports for components (#4364)
This commit adds support to Wasmtime for components which themselves export instances. The support here adds new APIs for how instance exports are accessed in the embedding API. For now this is mostly just a first-pass where the API is somewhat confusing and has a lot of lifetimes. I'm hoping that over time we can figure out how to simplify this but for now it should at least be expressive enough for exploring the exports of an instance. |
||
|
|
5542c4ef26 |
support enums with more than 256 variants in derive macro (#4370)
* support enums with more than 256 variants in derive macro This addresses #4361. Technically, we now support up to 2^32 variants, which is the maximum for the canonical ABI. In practice, though, the derived code for enums with even just 2^16 variants takes a prohibitively long time to compile. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * simplify `LowerExpander::expand_variant` code Signed-off-by: Joel Dice <joel.dice@fermyon.com> |
||
|
|
f252ae34ec |
support variant, enum, and union derives (#4359)
* support variant, enum, and union derives This is the second stage of implementing #4308. It adds support for deriving variant, enum, and union impls for `ComponentType`, `Lift`, and `Lower`. It also fixes derived record impls for generic `struct`s, which I had intended to support in my previous commit, but forgot to test. Signed-off-by: Joel Dice <joel.dice@fermyon.com> * deduplicate component-macro code Thanks to @jameysharp for the suggestion! Signed-off-by: Joel Dice <joel.dice@fermyon.com> |
||
|
|
e179e736b9 |
Update may_enter flag handling in components (#4354)
This commit updates the management of the `may_enter` flag in line with WebAssembly/component-model#57. Namely the `may_enter` flag is now exclusively managed in the `canon lift` function (which is `TypedFunc::call`) and is only unset after post-return completes successfully. This implements semantics where if any trap happens for any reason (lifting, lowering, execution, imports, etc) then the instance is considered permanently poisoned and can no longer be entered. Tests needed many updates to create new instances where previously the same instance was reused after it had an erroneous state. |
||
|
|
22fb3ecbbf |
add ComponentType/Lift/Lower derive macro for record types (#4337)
This is the first stage of implementing https://github.com/bytecodealliance/wasmtime/issues/4308, i.e. derive macros for `ComponentType`, `Lift`, and `Lower` for composite types in the component model. This stage only covers records; I expect the other composite types will follow a similar pattern. It borrows heavily from the work Jamey Sharp did in https://github.com/bytecodealliance/wasmtime/pull/4217. Thanks for that, and thanks to both Jamey and Alex Crichton for their excellent review feedback. Thanks also to Brian for pairing up on the initial draft. Signed-off-by: Joel Dice <joel.dice@fermyon.com> |
||
|
|
c1b3962f7b |
Implement lowered-then-lifted functions (#4327)
* Implement lowered-then-lifted functions This commit is a few features bundled into one, culminating in the implementation of lowered-then-lifted functions for the component model. It's probably not going to be used all that often but this is possible within a valid component so Wasmtime needs to do something relatively reasonable. The main things implemented in this commit are: * Component instances are now assigned a `RuntimeComponentInstanceIndex` to differentiate each one. This will be used in the future to detect fusion (one instance lowering a function from another instance). For now it's used to allocate separate `VMComponentFlags` for each internal component instance. * The `CoreExport<FuncIndex>` of lowered functions was changed to a `CoreDef` since technically a lowered function can use another lowered function as the callee. This ended up being not too difficult to plumb through as everything else was already in place. * A need arose to compile host-to-wasm trampolines which weren't already present. Currently wasm in a component is always entered through a host-to-wasm trampoline but core wasm modules are the source of all the trampolines. In the case of a lowered-then-lifted function there may not actually be any core wasm modules, so component objects now contain necessary trampolines not otherwise provided by the core wasm objects. This feature required splitting a new function into the `Compiler` trait for creating a host-to-wasm trampoline. After doing this core wasm compilation was also updated to leverage this which further enabled compiling trampolines in parallel as opposed to the previous synchronous compilation. * Review comments |
||
|
|
df1502531d |
Migrate from winapi to windows-sys (#4346)
* Migrate from `winapi` to `windows-sys` I believe that Microsoft itself is supporting the development of `windows-sys` and it's also used by `cap-std` now so this switches Wasmtime's dependencies on Windows APIs from the `winapi` crate to the `windows-sys` crate. We still have `winapi` in our dependency graph but that may get phased out over time. * Make windows-sys a target-specific dependency |
||
|
|
fc38f39bd2 |
Expose raw list accessors for all integer types (#4330)
This commit extends the `WasmList<T>` type to have an `as_slice`-lookalike method (now renamed to `as_le_slice`) for all integer types rather than just the `u8` type. With the guarantees of the component model it's known that all lists are aligned in linear memory. Additionally linear memories themselves are also generally guaranteed to be aligned. This means that hosts where the primitive integer alignment is at most the size (which I think is basically all host platforms) can get a raw view into memory for the wasm linear memory for slices of these types. Note, though, that the remaining caveat after alignment is endianness. Big-endian hosts need to be aware that the integers aren't stored in a native format. Previously tools like wit-bindgen have added an `Le<T>` wrapper but for now I've opted to instead use a method that has "le" in the name - `as_le_slice`. I'm hoping that this is a clear enough indicator for users to little-endian conversions as appropriate when reading the values within the slice. |
||
|
|
84a43d86a1 |
Add a method to Linker and flag to wasmtime-cli to trap unknown import funcs (#4312)
* Add a method to Linker and flag to wasmtime-cli to trap unknown import funcs Sometimes users have a Command module which imports functions unknown to the wasmtime-cli, but does not call them at runtime. This PR provides a convenience method on Linker to define all unknown import functions in a given Module as a trivial implementation which traps, and hooks this up to a new cli flag --trap-unknown-imports. * add cfg guards - func_new requires compiler (naturally) |
||
|
|
3339dd1f01 |
Implement the post-return attribute (#4297)
This commit implements the `post-return` feature of the canonical ABI in the component model. This attribute is an optionally-specified function which is to be executed after the return value has been processed by the caller to optionally clean-up the return value. This enables, for example, returning an allocated string and the host then knows how to clean it up to prevent memory leaks in the original module. The API exposed in this PR changes the prior `TypedFunc::call` API in behavior but not in its signature. Previously the `TypedFunc::call` method would set the `may_enter` flag on the way out, but now that operation is deferred until a new `TypedFunc::post_return` method is called. This means that once a method on an instance is invoked then nothing else can be done on the instance until the `post_return` method is called. Note that the method must be called irrespective of whether the `post-return` canonical ABI option was specified or not. Internally wasm will be invoked if necessary. This is a pretty wonky and unergonomic API to work with. For now I couldn't think of a better alternative that improved on the ergonomics. In the theory that the raw Wasmtime bindings for a component may not be used all that heavily (instead `wit-bindgen` would largely be used) I'm hoping that this isn't too much of an issue in the future. cc #4185 |
||
|
|
fa36e86f2c |
Update WASI to cap-std 0.25 and windows-sys. (#4302)
This updates to rustix 0.35.6, and updates wasi-common to use cap-std 0.25 and windows-sys (instead of winapi). Changes include: - Better error code mappings on Windows. - Fixes undefined references to `utimensat` on Darwin. - Fixes undefined references to `preadv64` and `pwritev64` on Android. - Updates to io-lifetimes 0.7, which matches the io_safety API in Rust. - y2038 bug fixes for 32-bit platforms |
||
|
|
445cc87a06 |
Fix a "trampoline missing" panic with components (#4296)
One test case I wrote recently was to import a lowered function into a wasm module and then immediately export it. This previously didn't work because trampoline lookup would fail as the original `VMCallerCheckedAnyfunc` function pointer points into the `trampoline_obj` of a component which wasn't registered with the `ModuleRegistry`. This plumbs through the necessary configuration to get that all hooked up. |
||
|
|
651f40855f |
Add support for nested components (#4285)
* Add support for nested components
This commit is an implementation of a number of features of the
component model including:
* Defining nested components
* Outer aliases to components and modules
* Instantiating nested components
The implementation here is intended to be a foundational pillar of
Wasmtime's component model support since recursion and nested components
are the bread-and-butter of the component model. At a high level the
intention for the component model implementation in Wasmtime has long
been that the recursive nature of components is "erased" at compile time
to something that's more optimized and efficient to process. This commit
ended up exemplifying this quite well where the vast majority of the
internal changes here are in the "compilation" phase of a component
rather than the runtime instantiation phase. The support in the
`wasmtime` crate, the runtime instantiation support, only had minor
updates here while the internals of translation have seen heavy updates.
The `translate` module was greatly refactored here in this commit.
Previously it would, as a component is parsed, create a final
`Component` to hand off to trampoline compilation and get persisted at
runtime. Instead now it's a thin layer over `wasmparser` which simply
records a list of `LocalInitializer` entries for how to instantiate the
component and its index spaces are built. This internal representation
of the instantiation of a component is pretty close to the binary format
intentionally.
Instead of performing dataflow legwork the `translate` phase of a
component is now responsible for two primary tasks:
1. All components and modules are discovered within a component. They're
assigned `Static{Component,Module}Index` depending on where they're
found and a `{Module,}Translation` is prepared for each one. This
"flattens" the recursive structure of the binary into an indexed list
processable later.
2. The lexical scope of components is managed here to implement outer
module and component aliases. This is a significant design
implementation because when closing over an outer component or module
that item may actually be imported or something like the result of a
previous instantiation. This means that the capture of
modules and components is both a lexical concern as well as a runtime
concern. The handling of the "runtime" bits are handled in the next
phase of compilation.
The next and currently final phase of compilation is a new pass where
much of the historical code in `translate.rs` has been moved to (but
heavily refactored). The goal of compilation is to produce one "flat"
list of initializers for a component (as happens prior to this PR) and
to achieve this an "inliner" phase runs which runs through the
instantiation process at compile time to produce a list of initializers.
This `inline` module is the main addition as part of this PR and is now
the workhorse for dataflow analysis and tracking what's actually
referring to what.
During the `inline` phase the local initializers recorded in the
`translate` phase are processed, in sequence, to instantiate a
component. Definitions of items are tracked to correspond to their root
definition which allows seeing across instantiation argument boundaries
and such. Handling "upvars" for component outer aliases is handled in
the `inline` phase as well by creating state for a component whenever a
component is defined as was recorded during the `translate` phase.
Finally this phase is chiefly responsible for doing all string-based
name resolution at compile time that it can. This means that at runtime
no string maps will need to be consulted for item exports and such.
The final result of inlining is a list of "global initializers" which is
a flat list processed during instantiation time. These are almost
identical to the initializers that were processed prior to this PR.
There are certainly still more gaps of the component model to implement
but this should be a major leg up in terms of functionality that
Wasmtime implements. This commit, however leaves behind a "hole" which
is not intended to be filled in at this time, namely importing and
exporting components at the "root" level from and to the host. This is
tracked and explained in more detail as part of #4283.
cc #4185 as this completes a number of items there
* Tweak code to work on stable without warning
* Review comments
|
||
|
|
258dc9de42 |
fix(wasmtime):Config methods should be idempotent (#4252)
This commit refactored `Config` to use a seperate `CompilerConfig` field instead of operating on `CompilerBuilder` directly to make all its methods idempotent. Fixes #4189 |
||
|
|
7d7ddceb17 |
Update wasm-tools crates (#4246)
This commit updates the wasm-tools family of crates, notably pulling in the refactorings and updates from bytecodealliance/wasm-tools#621 for the latest iteration of the component model. This commit additionally updates all support for the component model for these changes, notably: * Many bits and pieces of type information was refactored. Many `FooTypeIndex` namings are now `TypeFooIndex`. Additionally there is now `TypeIndex` as well as `ComponentTypeIndex` for the two type index spaces in a component. * A number of new sections are now processed to handle the core and component variants. * Internal maps were split such as the `funcs` map into `component_funcs` and `funcs` (same for `instances`). * Canonical options are now processed individually instead of one bulk `into` definition. Overall this was not a major update to the internals of handling the component model in Wasmtime. Instead this was mostly a surface-level refactoring to make sure that everything lines up with the new binary format for components. * All text syntax used in tests was updated to the new syntax. |
||
|
|
2b52f47b83 |
Add shared memories (#4187)
* Add shared memories This change adds the ability to use shared memories in Wasmtime when the [threads proposal] is enabled. Shared memories are annotated as `shared` in the WebAssembly syntax, e.g., `(memory 1 1 shared)`, and are protected from concurrent access during `memory.size` and `memory.grow`. [threads proposal]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md In order to implement this in Wasmtime, there are two main cases to cover: - a program may simply create a shared memory and possibly export it; this means that Wasmtime itself must be able to create shared memories - a user may create a shared memory externally and pass it in as an import during instantiation; this is the case when the program contains code like `(import "env" "memory" (memory 1 1 shared))`--this case is handled by a new Wasmtime API type--`SharedMemory` Because of the first case, this change allows any of the current memory-creation mechanisms to work as-is. Wasmtime can still create either static or dynamic memories in either on-demand or pooling modes, and any of these memories can be considered shared. When shared, the `Memory` runtime container will lock appropriately during `memory.size` and `memory.grow` operations; since all memories use this container, it is an ideal place for implementing the locking once and once only. The second case is covered by the new `SharedMemory` structure. It uses the same `Mmap` allocation under the hood as non-shared memories, but allows the user to perform the allocation externally to Wasmtime and share the memory across threads (via an `Arc`). The pointer address to the actual memory is carefully wired through and owned by the `SharedMemory` structure itself. This means that there are differing views of where to access the pointer (i.e., `VMMemoryDefinition`): for owned memories (the default), the `VMMemoryDefinition` is stored directly by the `VMContext`; in the `SharedMemory` case, however, this `VMContext` must point to this separate structure. To ensure that the `VMContext` can always point to the correct `VMMemoryDefinition`, this change alters the `VMContext` structure. Since a `SharedMemory` owns its own `VMMemoryDefinition`, the `defined_memories` table in the `VMContext` becomes a sequence of pointers--in the shared memory case, they point to the `VMMemoryDefinition` owned by the `SharedMemory` and in the owned memory case (i.e., not shared) they point to `VMMemoryDefinition`s stored in a new table, `owned_memories`. This change adds an additional indirection (through the `*mut VMMemoryDefinition` pointer) that could add overhead. Using an imported memory as a proxy, we measured a 1-3% overhead of this approach on the `pulldown-cmark` benchmark. To avoid this, Cranelift-generated code will special-case the owned memory access (i.e., load a pointer directly to the `owned_memories` entry) for `memory.size` so that only shared memories (and imported memories, as before) incur the indirection cost. * review: remove thread feature check * review: swap wasmtime-types dependency for existing wasmtime-environ use * review: remove unused VMMemoryUnion * review: reword cross-engine error message * review: improve tests * review: refactor to separate prevent Memory <-> SharedMemory conversion * review: into_shared_memory -> as_shared_memory * review: remove commented out code * review: limit shared min/max to 32 bits * review: skip imported memories * review: imported memories are not owned * review: remove TODO * review: document unsafe send + sync * review: add limiter assertion * review: remove TODO * review: improve tests * review: fix doc test * fix: fixes based on discussion with Alex This changes several key parts: - adds memory indexes to imports and exports - makes `VMMemoryDefinition::current_length` an atomic usize * review: add `Extern::SharedMemory` * review: remove TODO * review: atomically load from VMMemoryDescription in JIT-generated code * review: add test probing the last available memory slot across threads * fix: move assertion to new location due to rebase * fix: doc link * fix: add TODOs to c-api * fix: broken doc link * fix: modify pooling allocator messages in tests * review: make owned_memory_index panic instead of returning an option * review: clarify calculation of num_owned_memories * review: move 'use' to top of file * review: change '*const [u8]' to '*mut [u8]' * review: remove TODO * review: avoid hard-coding memory index * review: remove 'preallocation' parameter from 'Memory::_new' * fix: component model memory length * review: check that shared memory plans are static * review: ignore growth limits for shared memory * review: improve atomic store comment * review: add FIXME for memory growth failure * review: add comment about absence of bounds-checked 'memory.size' * review: make 'current_length()' doc comment more precise * review: more comments related to memory.size non-determinism * review: make 'vmmemory' unreachable for shared memory * review: move code around * review: thread plan through to 'wrap()' * review: disallow shared memory allocation with the pooling allocator |
||
|
|
088e568f22 |
Accept (tuple) and unit as () in Rust (#4241)
This commit updates the implementation of `ComponentType for ()` to typecheck both the empty tuple type in addition to the `unit` type in the component model. This allows the usage of `()` when either of those types are used. Currently this can work because we don't need to currently support the answer of "what is the type of this host function". Instead the only question that needs to be answered at runtime is "does this host function match this type". |
||
|
|
0b4448a423 |
Validate alignment in the canonical ABI (#4238)
This commit updates the lifting and lowering done by Wasmtime to validate that alignment is all correct. Previously alignment was ignored because I wasn't sure how this would all work out. To be extra safe I haven't actually modified any loads/stores and they're all still unaligned. If this becomes a performance issue we can investigate aligned loads and stores but otherwise I believe the requisite locations have been guarded with traps and I've also added debug asserts to catch possible future mistakes. |
||
|
|
479def00b9 |
Update lifting for integers and bools (#4237)
This commit updates lifting for integer types and boolean types to account for WebAssembly/component-model#35 where extra bits are now discarded instead of being validated as all zero. |
||
|
|
11ff9650e5 |
Split the ComponentValue trait into... components (#4236)
This commit splits the current `ComponentValue` trait into three separate traits: * `ComponentType` - contains size/align/typecheck information in addition to the "lower" representation. * `Lift` - only contains `lift` and `load` * `Lower` - only contains `lower` and `store` When describing the original implementation of host functions to Nick he immediately pointed out this superior solution to the traits involved with Wasmtime's support for typed parameters/returns in exported and imported functions. Instead of having dynamic errors at runtime for things like "you can't lift a `String`" that's instead a static compile-time error now. While I was doing this split I also refactored the `ComponentParams` trait a bit to have `ComponentType` as a supertrait instead of a subtype which made its implementations a bit more compact. Additionally its impl blocks were folded into the existing tuple impl blocks. |
||
|
|
20f510671d |
Enable passing host functions to components (#4219)
* Enable passing host functions to components
This commit implements the ability to pass a host function into a
component. The `wasmtime::component::Linker` type now has a `func_wrap`
method allowing it to take a host function which is exposed internally
to the component and available for lowering.
This is currently mostly a "let's get at least the bare minimum working"
implementation. That involves plumbing around lots of various bits of
the canonical ABI and getting all the previous PRs to line up in this
one to get a test where we call a function where the host takes a
string. This PR also additionally starts reading and using the
`may_{enter,leave}` flags since this is the first time they're actually
relevant.
Overall while this is the bare bones of working this is not a final spot
we should end up at. One of the major downsides is that host functions
are represented as:
F: Fn(StoreContextMut<'_, T>, Arg1, Arg2, ...) -> Result<Return>
while this naively seems reasonable this critically doesn't allow
`Return` to actually close over any of its arguments. This means that if
you want to return a string to wasm then it has to be `String` or
`Rc<str>` or some other owned type. In the case of `String` this means
that to return a string to wasm you first have to copy it from the host
to a temporary `String` allocation, then to wasm. This extra copy for
all strings/lists is expected to be prohibitive. Unfortuantely I don't
think Rust is able to solve this, at least on stable, today.
Nevertheless I wanted to at least post this to get some feedback on it
since it's the final step in implementing host imports to see how others
feel about it.
* Fix a typo in an assertion
* Fix some typos
* Review comments
|
||
|
|
3ed6fae7b3 |
Add trampoline compilation support for lowered imports (#4206)
* Add trampoline compilation support for lowered imports This commit adds support to the component model implementation for compiling trampolines suitable for calling host imports. Currently this is purely just the compilation side of things, modifying the wasmtime-cranelift crate and additionally filling out a new `VMComponentOffsets` type (similar to `VMOffsets`). The actual creation of a `VMComponentContext` is still not performed and will be a subsequent PR. Internally though some tests are actually possible with this where we at least assert that compilation of a component and creation of everything in-memory doesn't panic or trip any assertions, so some tests are added here for that as well. * Fix some test errors |
||
|
|
b49c5c878e |
Implement module imports into components (#4208)
* Implement module imports into components As a step towards implementing function imports into a component this commit implements importing modules into a component. This fills out missing pieces of functionality such as exporting modules as well. The previous translation code had initial support for translating imported modules but some of the AST type information was restructured with feedback from this implementation, namely splitting the `InstantiateModule` initializer into separate upvar/import variants to clarify that the item orderings for imports are resolved differently at runtime. Much of this commit is also adding infrastructure for any imports at all into a component. For example a `Linker` type (analagous to `wasmtime::Linker`) was added here as well. For now this type is quite limited due to the inability to define host functions (it can only work with instances and instances-of-modules) but it's enough to start writing `*.wast` tests which exercise lots of module-related functionality. * Fix a warning |
||
|
|
9f5f978baa |
Fix double-counting imports in VMOffsets calculations (#4209)
* Fix double-counting imports in `VMOffsets` calculations This fixes an oversight in the initial creation of `VMOffsets` for a module to avoid double-counting imported globals, tables, and memories for calculating the size of the `VMContext`. Prior to this PR imported items are accidentally also counted as defined items for sizing calculations meaning that when a memory is imported but not defined, for example, the `VMContext` will have a space for an inline `VMMemoryDefinition` when it doesn't need to. Auditing where all this relates to it appears that the only issue from this mistake is that `VMContext` is a bit larger than it would otherwise need to be. Extra slots are uninitialized memory but nothing in Wasmtime ever actually accesses the memory either, so it should be harmless to have extra space here. Nevertheless it seems better to shrink the size as much as possible to avoid wasting space where we can. * Fix tests |
||
|
|
d5ce51e8d1 |
Redesign interface type value representation (#4198)
Prior to this PR a major feature of calling component exports (#4039) was the usage of the `Value<T>` type. This type represents a value stored in wasm linear memory (the type `T` stored there). This implementation had a number of drawbacks though: * When returning a value it's ABI-specific whether you use `T` or `Value<T>` as a return value. If `T` is represented with one wasm primitive then you have to return `T`, otherwise the return value must be `Value<T>`. This is somewhat non-obvious and leaks ABI-details into the API which is unfortunate. * The `T` in `Value<T>` was somewhat non-obvious. For example a wasm-owned string was `Value<String>`. Using `Value<&str>` didn't work. * Working with `Value<T>` was unergonomic in the sense that you had to first "pair" it with a `&Store<U>` to get a `Cursor<T>` and then you could start reading the value. * Custom structs and enums, while not implemented yet, were planned to be quite wonky where when you had `Cursor<MyStruct>` then you would have to import a `CursorMyStructExt` trait generated by a proc-macro (think a `#[derive]` on the definition of `MyStruct`) which would enable field accessors, returning cursors of all the fields. * In general there was no "generic way" to load a `T` from memory. Other operations like lift/lower/store all had methods in the `ComponentValue` trait but load had no equivalent. None of these drawbacks were deal-breakers per-se. When I started to implement imported functions, though, the `Value<T>` type no longer worked. The major difference between imports and exports is that when receiving values from wasm an export returns at most one wasm primitive where an import can yield (through arguments) up to 16 wasm primitives. This means that if an export returned a string it would always be `Value<String>` but if an import took a string as an argument there was actually no way to represent this with `Value<String>` since the value wasn't actually stored in memory but rather the pointer/length pair is received as arguments. Overall this meant that `Value<T>` couldn't be used for arguments-to-imports, which means that altogether something new would be required. This PR completely removes the `Value<T>` and `Cursor<T>` type in favor of a different implementation. The inspiration from this comes from the fact that all primitives can be both lifted and lowered into wasm while it's just some times which can only go one direction. For example `String` can be lowered into wasm but can't be lifted from wasm. Instead some sort of "view" into wasm needs to be created during lifting. One of the realizations from #4039 was that we could leverage run-time-type-checking to reject static constructions that don't make sense. For example if an embedder asserts that a wasm function returns a Rust `String` we can reject that at typechecking time because it's impossible for a wasm module to ever do that. The new system of imports/exports in this PR now looks like: * Type-checking takes into accont an `Op` operation which indicates whether we'll be lifting or lowering the type. This means that we can allow the lowering operation for `String` but disallow the lifting operation. While we can't statically rule out an embedder saying that a component returns a `String` we can now reject it at runtime and disallow it from being called. * The `ComponentValue` trait now sports a new `load` function. This function will load and instance of `Self` from the byte-array provided. This is implemented for all types but only ever actually executed when the `lift` operation is allowed during type-checking. * The `Lift` associated type is removed since it's now expected that the lift operation returns `Self`. * The `ComponentReturn` trait is now no longer necessary and is removed. Instead returns are bounded by `ComponentValue`. During type-checking it's required that the return value can be lifted, disallowing, for example, returning a `String` or `&str`. * With `Value` gone there's no need to specify the ABI details of the return value, or whether it's communicated through memory or not. This means that handling return values through memory is transparently handled by Wasmtime. * Validation is in a sense more eagerly performed now. Whenever a value `T` is loaded the entire immediate structure of `T` is loaded and validated. Note that recursive through memory validation still does not happen, so the contents of lists or strings aren't validated, it's just validated that the pointers are in-bounds. Overall this felt like a much clearer system to work with and should be much easier to integrate with imported functions as well. The new `WasmStr` and `WasmList<T>` types can be used in import arguments and lifted from the immediate arguments provided rather than forcing them to always be stored in memory. |
||
|
|
2a4851ad2b |
Change some VMContext pointers to () pointers (#4190)
* Change some `VMContext` pointers to `()` pointers This commit is motivated by my work on the component model implementation for imported functions. Currently all context pointers in wasm are `*mut VMContext` but with the component model my plan is to make some pointers instead along the lines of `*mut VMComponentContext`. In doing this though one worry I have is breaking what has otherwise been a core invariant of Wasmtime for quite some time, subtly introducing bugs by accident. To help assuage my worry I've opted here to erase knowledge of `*mut VMContext` where possible. Instead where applicable a context pointer is simply known as `*mut ()` and the embedder doesn't actually know anything about this context beyond the value of the pointer. This will help prevent Wasmtime from accidentally ever trying to interpret this context pointer as an actual `VMContext` when it might instead be a `VMComponentContext`. Overall this was a pretty smooth transition. The main change here is that the `VMTrampoline` (now sporting more docs) has its first argument changed to `*mut ()`. The second argument, the caller context, is still configured as `*mut VMContext` though because all functions are always called from wasm still. Eventually for component-to-component calls I think we'll probably "fake" the second argument as the same as the first argument, losing track of the original caller, as an intentional way of isolating components from each other. Along the way there are a few host locations which do actually assume that the first argument is indeed a `VMContext`. These are valid assumptions that are upheld from a correct implementation, but I opted to add a "magic" field to `VMContext` to assert this in debug mode. This new "magic" field is inintialized during normal vmcontext initialization and it's checked whenever a `VMContext` is reinterpreted as an `Instance` (but only in debug mode). My hope here is to catch any future accidental mistakes, if ever. * Use a VMOpaqueContext wrapper * Fix typos |
||
|
|
f4b9020913 |
Change wasm-to-host trampolines to take the values_vec size (#4192)
* Change wasm-to-host trampolines to take the values_vec size This commit changes the ABI of wasm-to-host trampolines, which are only used right now for functions created with `Func::new`, to pass along the size of the `values_vec` argument. Previously the trampoline simply received `*mut ValRaw` and assumed that it was the appropriate size. By receiving a size as well we can thread through `&mut [ValRaw]` internally instead of `*mut ValRaw`. The original motivation for this is that I'm planning to leverage these trampolines for the component model for host-defined functions. Out of an abundance of caution of making sure that everything lines up I wanted to be able to write down asserts about the size received at runtime compared to the size expected. This overall led me to the desire to thread this size parameter through on the assumption that it would not impact performance all that much. I ran two benchmarks locally from the `call.rs` benchmark and got: * `sync/no-hook/wasm-to-host - nop - unchecked` - no change * `sync/no-hook/wasm-to-host - nop-params-and-results - unchecked` - 5% slower This is what I roughly expected in that if nothing actually reads the new parameter (e.g. no arguments) then threading through the parameter is effectively otherwise free. Otherwise though accesses to the `ValRaw` storage is now bounds-checked internally in Wasmtime instead of assuming it's valid, leading to the 5% slowdown (~9.6ns to ~10.3ns). If this becomes a peformance bottleneck for a particular use case then we should be fine to remove the bounds checking here or otherwise only bounds check in debug mode, otherwise I plan on leaving this as-is. Of particular note this also changes the C API for `*_unchecked` functions where the C callback now receives the size of the array as well. * Add docs |
||
|
|
bffce37050 |
make backtrace collection a Config field rather than a cargo feature (#4183)
* sorta working in runtime * wasmtime-runtime: get rid of wasm-backtrace feature * wasmtime: factor to make backtraces recording optional. not configurable yet * get rid of wasm-backtrace features * trap tests: now a Trap optionally contains backtrace * eliminate wasm-backtrace feature * code review fixes * ci: no more wasm-backtrace feature * c_api: backtraces always enabled * config: unwind required by backtraces and ref types * plumbed * test that disabling backtraces works * code review comments * fuzzing generator: wasm_backtrace is a runtime config now * doc fix |
||
|
|
a02a609528 |
Make ValRaw fields private (#4186)
* Make `ValRaw` fields private Force accessing to go through constructors and accessors to localize the knowledge about little-endian-ness. This is spawned since I made a mistake in #4039 about endianness. * Fix some tests * Component model changes |
||
|
|
140b83597b |
components: Implement the ability to call component exports (#4039)
* components: Implement the ability to call component exports This commit is an implementation of the typed method of calling component exports. This is intended to represent the most efficient way of calling a component in Wasmtime, similar to what `TypedFunc` represents today for core wasm. Internally this contains all the traits and implementations necessary to invoke component exports with any type signature (e.g. arbitrary parameters and/or results). The expectation is that for results we'll reuse all of this infrastructure except in reverse (arguments and results will be swapped when defining imports). Some features of this implementation are: * Arbitrary type hierarchies are supported * The Rust-standard `Option`, `Result`, `String`, `Vec<T>`, and tuple types all map down to the corresponding type in the component model. * Basic utf-16 string support is implemented as proof-of-concept to show what handling might look like. This will need further testing and benchmarking. * Arguments can be behind "smart pointers", so for example `&Rc<Arc<[u8]>>` corresponds to `list<u8>` in interface types. * Bulk copies from linear memory never happen unless explicitly instructed to do so. The goal of this commit is to create the ability to actually invoke wasm components. This represents what is expected to be the performance threshold for these calls where it ideally should be optimal how WebAssembly is invoked. One major missing piece of this is a `#[derive]` of some sort to generate Rust types for arbitrary `*.wit` types such as custom records, variants, flags, unions, etc. The current trait impls for tuples and `Result<T, E>` are expected to have fleshed out most of what such a derive would look like. There are some downsides and missing pieces to this commit and method of calling components, however, such as: * Passing `&[u8]` to WebAssembly is currently not optimal. Ideally this compiles down to a `memcpy`-equivalent somewhere but that currently doesn't happen due to all the bounds checks of copying data into memory. I have been unsuccessful so far at getting these bounds checks to be removed. * There is no finalization at this time (the "post return" functionality in the canonical ABI). Implementing this should be relatively straightforward but at this time requires `wasmparser` changes to catch up with the current canonical ABI. * There is no guarantee that results of a wasm function will be validated. As results are consumed they are validated but this means that if function returns an invalid string which the host doesn't look at then no trap will be generated. This is probably not the intended semantics of hosts in the component model. * At this time there's no support for memory64 memories, just a bunch of `FIXME`s to get around to. It's expected that this won't be too onerous, however. Some extra care will need to ensure that the various methods related to size/alignment all optimize to the same thing they do today (e.g. constants). * The return value of a typed component function is either `T` or `Value<T>`, and it depends on the ABI details of `T` and whether it takes up more than one return value slot or not. This is an ABI-implementation detail which is being forced through to the API layer which is pretty unfortunate. For example if you say the return value of a function is `(u8, u32)` then it's a runtime type-checking error. I don't know of a great way to solve this at this time. Overall I'm feeling optimistic about this trajectory of implementing value lifting/lowering in Wasmtime. While there are a number of downsides none seem completely insurmountable. There's naturally still a good deal of work with the component model but this should be a significant step up towards implementing and testing the component model. * Review comments * Write tests for calling functions This commit adds a new test file for actually executing functions and testing their results. This is not written as a `*.wast` test yet since it's not 100% clear if that's the best way to do that for now (given that dynamic signatures aren't supported yet). The tests themselves could all largely be translated to `*.wast` testing in the future, though, if supported. Along the way a number of minor issues were fixed with lowerings with the bugs exposed here. * Fix an endian mistake * Fix a typo and the `memory.fill` instruction |
||
|
|
fcf6208750 |
Initial skeleton of some component model processing (#4005)
* Initial skeleton of some component model processing This commit is the first of what will likely be many to implement the component model proposal in Wasmtime. This will be structured as a series of incremental commits, most of which haven't been written yet. My hope is to make this incremental and over time to make this easier to review and easier to test each step in isolation. Here much of the skeleton of how components are going to work in Wasmtime is sketched out. This is not a complete implementation of the component model so it's not all that useful yet, but some things you can do are: * Process the type section into a representation amenable for working with in Wasmtime. * Process the module section and register core wasm modules. * Process the instance section for core wasm modules. * Process core wasm module imports. * Process core wasm instance aliasing. * Ability to compile a component with core wasm embedded. * Ability to instantiate a component with no imports. * Ability to get functions from this component. This is already starting to diverge from the previous module linking representation where a `Component` will try to avoid unnecessary metadata about the component and instead internally only have the bare minimum necessary to instantiate the module. My hope is we can avoid constructing most of the index spaces during instantiation only for it to all ge thrown away. Additionally I'm predicting that we'll need to see through processing where possible to know how to generate adapters and where they are fused. At this time you can't actually call a component's functions, and that's the next PR that I would like to make. * Add tests for the component model support This commit uses the recently updated wasm-tools crates to add tests for the component model added in the previous commit. This involved updating the `wasmtime-wast` crate for component-model changes. Currently the component support there is quite primitive, but enough to at least instantiate components and verify the internals of Wasmtime are all working correctly. Additionally some simple tests for the embedding API have also been added. |
||
|
|
89ccc56e46 |
Update the wasm-tools family of crates (#4165)
* Update the wasm-tools family of crates This commit updates these crates as used by Wasmtime for the recently published versions to pull in changes necessary to support the component model. I've split this out from #4005 to make it clear what's impacted here and #4005 can simply rebase on top of this to pick up the necessary changes. * More test fixes |
||
|
|
f19d8cc851 |
Run a callback when the interruption epoch is reached (#4152)
* Run a callback when the interruption epoch is reached Adds Store::epoch_deadline_callback. This accepts a callback which, when invoked, can mutate the store's contents. The callback can either return an error (in which case we trap) or return a delta which we'll use to set the new epoch deadline. * Add a basic test for epoch interruption callback * Some small nits - Remove use of &mut in the pattern match - Return both yields and state from run_and_count_yields_or_trap in test code and assert on them separately. - Add a test for trapping on a state failure. |
||
|
|
52524d258c |
Expose TrapCode::Interrupt on epoch based interruption (#4105)
|
||
|
|
ccf834b473 |
Fix an issue where massive memory images are created (#4112)
This commit fixes an issue introduced in #4046 where the checks for ensuring that the memory initialization image for a module was constrained in its size failed to trigger and a very small module could produce an arbitrarily large memory image. The bug in question was that if a module only had empty data segments at arbitrarily small and large addresses then the loop which checks whether or not the image is allowed was skipped entirely since it was seen that the memory had no data size. The fix here is to skip segments that are empty to ensure that if the validation loop is skipped then no data segments will be processed to create the image (and the module won't end up having an image in the end). |
||
|
|
90791a0e32 |
Reduce contention on the global module rwlock (#4041)
* Reduce contention on the global module rwlock This commit intendes to close #4025 by reducing contention on the global rwlock Wasmtime has for module information during instantiation and dropping a store. Currently registration of a module into this global map happens during instantiation, but this can be a hot path as embeddings may want to, in parallel, instantiate modules. Instead this switches to a strategy of inserting into the global module map when a `Module` is created and then removing it from the map when the `Module` is dropped. Registration in a `Store` now preserves the entire `Module` within the store as opposed to trying to only save it piecemeal. In reality the only piece that wasn't saved within a store was the `TypeTables` which was pretty inconsequential for core wasm modules anyway. This means that instantiation should now clone a singluar `Arc` into a `Store` per `Module` (previously it cloned two) with zero managemnt on the global rwlock as that happened at `Module` creation time. Additionally dropping a `Store` again involves zero rwlock management and only a single `Arc` drop per-instantiated module (previously it was two). In the process of doing this I also went ahead and removed the `Module::new_with_name` API. This has been difficult to support historically with various variations on the internals of `ModuleInner` because it involves mutating a `Module` after it's been created. My hope is that this API is pretty rarely used and/or isn't super important, so it's ok to remove. Finally this change removes some internal `Arc` layerings that are no longer necessary, attempting to use either `T` or `&T` where possible without dealing with the overhead of an `Arc`. Closes #4025 * Move back to a `BTreeMap` in `ModuleRegistry` |