Commit Graph

10065 Commits

Author SHA1 Message Date
Alex Crichton
b1273548fb Try using windows-latest CI (#4553)
See if our `windows-2019` woes are solved now that backtraces are using
frame pointers instead of native APIs.
2022-07-29 12:15:54 -05:00
Alex Crichton
6d6e7e0f6a Add definitions of tiers-of-support for Wasmtime (#4479)
* Add definitions of tiers-of-support for Wasmtime

This commit adds documentation of a Tiers-based system for classifying
how supported a component is within Wasmtime. This was somewhat
pioneered in the [Wasmtime 1.0 RFC][rfc] but the documentation here is
expanded to include more than just API stability but additionally other
components. Inspiration for this is drawn from Rust's definition of
[support tiers][rust] as well.

The motivation for this is to help clarify what exactly it means to live
at each tier and what is expected. For example one thing this document
clarifies is the requirements necessary for landing new major changes in
Wasmtime at all. Additionally this helps clarify what it means to have
the highest level of support vs "otherwise well supported".

[rfc]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasmtime-one-dot-oh.md#tier-1---api-stable-production-quality
[rust]: https://doc.rust-lang.org/rustc/target-tier-policy.html

* Review comments

* Review comments
2022-07-29 15:11:16 +00:00
Afonso Bordado
1f058a02c0 cranelift: Add MinGW fma regression tests (#4517)
* cranelift: Add MinGW `fma` regression tests

* cranelift: Fix FMA in interpreter

* cranelift: Add separate `fma` test suite for the interpreter

The interpreter can run `fma.clif` on most platforms, however on
`x86_64-pc-windows-gnu` we use libm which has issues with some inputs.
We should delete `fma-interpreter.clif` and enable the interpreter on
the main `fma.clif` file once those are fixed.
2022-07-29 09:09:37 -05:00
Nick Fitzgerald
46782b18c2 wasmtime: Implement fast Wasm stack walking (#4431)
* Always preserve frame pointers in Wasmtime

This allows us to efficiently and simply capture Wasm stacks without maintaining
and synchronizing any safety-critical side tables between the compiler and the
runtime.

* wasmtime: Implement fast Wasm stack walking

Why do we want Wasm stack walking to be fast? Because we capture stacks whenever
there is a trap and traps actually happen fairly frequently with short-lived
programs and WASI's `exit`.

Previously, we would rely on generating the system unwind info (e.g.
`.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk
the full stack and filter out any non-Wasm stack frames. This can,
unfortunately, be slow for two primary reasons:

1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than
`O(wasm-frames)` work.

2. System unwind info and the system unwinder need to be much more general than
a purpose-built stack walker for Wasm needs to be. It has to handle any kind of
stack frame that any compiler might emit where as our Wasm frames are emitted by
Cranelift and always have frame pointers. This translates into implementation
complexity and general overhead. There can also be unnecessary-for-our-use-cases
global synchronization and locks involved, further slowing down stack walking in
the presence of multiple threads trying to capture stacks in parallel.

This commit introduces a purpose-built stack walker for traversing just our Wasm
frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore
non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit
frame pointer)` pairs. This linked list is maintained via Wasm-to-host and
host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use
frame pointers (which Cranelift preserves) to find the next older Wasm frame on
the stack, and we keep doing this until we reach the entry stack pointer,
meaning that the next older frame will be a host frame.

The trampolines need to avoid a couple stumbling blocks. First, they need to be
compiled ahead of time, since we may not have access to a compiler at
runtime (e.g. if the `cranelift` feature is disabled) but still want to be able
to call functions that have already been compiled and get stack traces for those
functions. Usually this means we would compile the appropriate trampolines
inside `Module::new` and the compiled module object would hold the
trampolines. However, we *also* need to support calling host functions that are
wrapped into `wasmtime::Func`s and there doesn't exist *any* ahead-of-time
compiled module object to hold the appropriate trampolines:

```rust
// Define a host function.
let func_type = wasmtime::FuncType::new(
    vec![wasmtime::ValType::I32],
    vec![wasmtime::ValType::I32],
);
let func = Func::new(&mut store, func_type, |_, params, results| {
    // ...
    Ok(())
});

// Call that host function.
let mut results = vec![wasmtime::Val::I32(0)];
func.call(&[wasmtime::Val::I32(0)], &mut results)?;
```

Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline
in assembly that work for all Wasm and host function signatures. These
trampolines are careful to only use volatile registers, avoid touching any
register that is an argument in the calling convention ABI, and tail call to the
target callee function. This allows forwarding any set of arguments and any
returns to and from the callee, while also allowing us to maintain our linked
list of Wasm stack and frame pointers before transferring control to the
callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing
the host-Wasm boundary, so they do not impose overhead on regular calls. (And if
using one trampoline for all host-Wasm boundary crossing ever breaks branch
prediction enough in the CPU to become any kind of bottleneck, we can do fun
things like have multiple copies of the same trampoline and choose a random copy
for each function, sharding the functions across branch predictor entries.)

Finally, this commit also ends the use of a synthetic `Module` and allocating a
stubbed out `VMContext` for host functions. Instead, we define a
`VMHostFuncContext` with its own magic value, similar to `VMComponentContext`,
specifically for host functions.

<h2>Benchmarks</h2>

<h3>Traps and Stack Traces</h3>

Large improvements to taking stack traces on traps, ranging from shaving off 64%
to 99.95% of the time it used to take.

<details>

```
multi-threaded-traps/0  time:   [2.5686 us 2.5808 us 2.5934 us]
                        thrpt:  [0.0000  elem/s 0.0000  elem/s 0.0000  elem/s]
                 change:
                        time:   [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05)
                        thrpt:  [+560.90% +573.56% +585.84%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
multi-threaded-traps/1  time:   [2.9021 us 2.9167 us 2.9322 us]
                        thrpt:  [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s]
                 change:
                        time:   [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05)
                        thrpt:  [+1023.1% +1048.6% +1070.3%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
multi-threaded-traps/2  time:   [2.9996 us 3.0145 us 3.0295 us]
                        thrpt:  [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s]
                 change:
                        time:   [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05)
                        thrpt:  [+1503.1% +1542.0% +1578.0%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe
multi-threaded-traps/4  time:   [5.5768 us 5.6052 us 5.6364 us]
                        thrpt:  [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s]
                 change:
                        time:   [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05)
                        thrpt:  [+1339.2% +1353.6% +1369.1%]
                        Performance has improved.
multi-threaded-traps/8  time:   [8.6408 us 9.1212 us 9.5438 us]
                        thrpt:  [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s]
                 change:
                        time:   [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05)
                        thrpt:  [+1624.7% +1709.2% +1806.1%]
                        Performance has improved.
multi-threaded-traps/16 time:   [10.152 us 10.840 us 11.545 us]
                        thrpt:  [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s]
                 change:
                        time:   [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05)
                        thrpt:  [+2821.5% +3048.1% +3281.1%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

many-modules-registered-traps/1
                        time:   [2.6278 us 2.6361 us 2.6447 us]
                        thrpt:  [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s]
                 change:
                        time:   [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05)
                        thrpt:  [+562.65% +571.51% +580.76%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
many-modules-registered-traps/8
                        time:   [2.6294 us 2.6460 us 2.6623 us]
                        thrpt:  [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s]
                 change:
                        time:   [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05)
                        thrpt:  [+567.63% +588.95% +608.95%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
many-modules-registered-traps/64
                        time:   [2.6218 us 2.6329 us 2.6452 us]
                        thrpt:  [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s]
                 change:
                        time:   [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05)
                        thrpt:  [+1431.4% +1450.6% +1469.5%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
many-modules-registered-traps/512
                        time:   [2.6569 us 2.6737 us 2.6923 us]
                        thrpt:  [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s]
                 change:
                        time:   [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05)
                        thrpt:  [+13417% +13566% +13731%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
many-modules-registered-traps/4096
                        time:   [2.7258 us 2.7390 us 2.7535 us]
                        thrpt:  [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s]
                 change:
                        time:   [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
                        thrpt:  [+221417% +223380% +224881%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

many-stack-frames-traps/1
                        time:   [1.4658 us 1.4719 us 1.4784 us]
                        thrpt:  [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s]
                 change:
                        time:   [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05)
                        thrpt:  [+860.23% +894.72% +938.21%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
many-stack-frames-traps/8
                        time:   [2.4772 us 2.4870 us 2.4973 us]
                        thrpt:  [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s]
                 change:
                        time:   [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05)
                        thrpt:  [+575.65% +583.51% +592.03%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
many-stack-frames-traps/64
                        time:   [10.109 us 10.171 us 10.236 us]
                        thrpt:  [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s]
                 change:
                        time:   [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05)
                        thrpt:  [+341.22% +350.38% +357.55%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
many-stack-frames-traps/512
                        time:   [126.16 us 126.54 us 126.96 us]
                        thrpt:  [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s]
                 change:
                        time:   [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05)
                        thrpt:  [+181.32% +185.17% +188.71%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe
```

</details>

<h3>Calls</h3>

There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call
performance due the new trampolines. It seems to be on the order of about 2-10
nanoseconds per call, depending on the benchmark.

I believe this regression is ultimately acceptable because

1. this overhead will be vastly dominated by whatever work a non-nop callee
actually does,

2. we will need these trampolines, or something like them, when implementing the
Wasm exceptions proposal to do things like translate Wasm's exceptions into
Rust's `Result`s,

3. and because the performance improvements to trapping and capturing stack
traces are of such a larger magnitude than this call regressions.

<details>

```
sync/no-hook/host-to-wasm - typed - nop
                        time:   [28.683 ns 28.757 ns 28.844 ns]
                        change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop
                        time:   [42.515 ns 42.652 ns 42.841 ns]
                        change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) high mild
  10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop
                        time:   [33.936 ns 34.052 ns 34.179 ns]
                        change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
sync/no-hook/host-to-wasm - typed - nop-params-and-results
                        time:   [34.290 ns 34.388 ns 34.502 ns]
                        change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop-params-and-results
                        time:   [62.546 ns 62.721 ns 62.919 ns]
                        change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop-params-and-results
                        time:   [42.609 ns 42.710 ns 42.831 ns]
                        change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

sync/hook-sync/host-to-wasm - typed - nop
                        time:   [29.546 ns 29.675 ns 29.818 ns]
                        change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop
                        time:   [45.448 ns 45.699 ns 45.961 ns]
                        change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop
                        time:   [34.334 ns 34.437 ns 34.558 ns]
                        change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop-params-and-results
                        time:   [36.594 ns 36.763 ns 36.974 ns]
                        change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop-params-and-results
                        time:   [63.541 ns 63.831 ns 64.194 ns]
                        change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results
                        time:   [43.968 ns 44.169 ns 44.437 ns]
                        change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) high mild
  12 (12.00%) high severe

async/no-hook/host-to-wasm - typed - nop
                        time:   [4.9612 us 4.9743 us 4.9889 us]
                        change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe
async/no-hook/host-to-wasm - untyped - nop
                        time:   [5.0030 us 5.0211 us 5.0439 us]
                        change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe
async/no-hook/host-to-wasm - typed - nop-params-and-results
                        time:   [4.9273 us 4.9468 us 4.9700 us]
                        change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe
async/no-hook/host-to-wasm - untyped - nop-params-and-results
                        time:   [5.1151 us 5.1338 us 5.1555 us]
                        change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe

async/hook-sync/host-to-wasm - typed - nop
                        time:   [4.9330 us 4.9394 us 4.9467 us]
                        change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop
                        time:   [5.0073 us 5.0183 us 5.0310 us]
                        change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
async/hook-sync/host-to-wasm - typed - nop-params-and-results
                        time:   [4.9610 us 4.9839 us 5.0097 us]
                        change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop-params-and-results
                        time:   [5.0995 us 5.1272 us 5.1617 us]
                        change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

async-pool/no-hook/host-to-wasm - typed - nop
                        time:   [2.4242 us 2.4316 us 2.4396 us]
                        change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop
                        time:   [2.5102 us 2.5155 us 2.5210 us]
                        change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop-params-and-results
                        time:   [2.4203 us 2.4310 us 2.4440 us]
                        change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results
                        time:   [2.5501 us 2.5593 us 2.5700 us]
                        change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

async-pool/hook-sync/host-to-wasm - typed - nop
                        time:   [2.4135 us 2.4190 us 2.4254 us]
                        change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop
                        time:   [2.5172 us 2.5248 us 2.5357 us]
                        change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results
                        time:   [2.4214 us 2.4353 us 2.4532 us]
                        change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) high mild
  13 (13.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results
                        time:   [2.5499 us 2.5607 us 2.5748 us]
                        change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) high mild
  15 (15.00%) high severe

sync/no-hook/wasm-to-host - nop - typed
                        time:   [6.6135 ns 6.6288 ns 6.6452 ns]
                        change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - typed
                        time:   [15.930 ns 15.993 ns 16.067 ns]
                        change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  11 (11.00%) high mild
  1 (1.00%) high severe
sync/no-hook/wasm-to-host - nop - untyped
                        time:   [20.596 ns 20.640 ns 20.690 ns]
                        change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - untyped
                        time:   [42.659 ns 42.882 ns 43.159 ns]
                        change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) high mild
  14 (14.00%) high severe
sync/no-hook/wasm-to-host - nop - unchecked
                        time:   [10.671 ns 10.691 ns 10.713 ns]
                        change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - unchecked
                        time:   [11.136 ns 11.190 ns 11.263 ns]
                        change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe

sync/hook-sync/wasm-to-host - nop - typed
                        time:   [6.7964 ns 6.8087 ns 6.8226 ns]
                        change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - typed
                        time:   [15.865 ns 15.921 ns 15.985 ns]
                        change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe
sync/hook-sync/wasm-to-host - nop - untyped
                        time:   [21.505 ns 21.587 ns 21.677 ns]
                        change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - untyped
                        time:   [44.018 ns 44.128 ns 44.261 ns]
                        change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe
sync/hook-sync/wasm-to-host - nop - unchecked
                        time:   [11.264 ns 11.326 ns 11.387 ns]
                        change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked
                        time:   [11.816 ns 11.865 ns 11.920 ns]
                        change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  8 (8.00%) high mild
  6 (6.00%) high severe

async/no-hook/wasm-to-host - nop - typed
                        time:   [6.6221 ns 6.6385 ns 6.6569 ns]
                        change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - typed
                        time:   [15.884 ns 15.929 ns 15.983 ns]
                        change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe
async/no-hook/wasm-to-host - nop - untyped
                        time:   [20.615 ns 20.702 ns 20.821 ns]
                        change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - untyped
                        time:   [41.956 ns 42.207 ns 42.521 ns]
                        change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe
async/no-hook/wasm-to-host - nop - unchecked
                        time:   [10.440 ns 10.474 ns 10.513 ns]
                        change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - unchecked
                        time:   [11.476 ns 11.512 ns 11.554 ns]
                        change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe
async/no-hook/wasm-to-host - nop - async-typed
                        time:   [26.427 ns 26.478 ns 26.532 ns]
                        change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - async-typed
                        time:   [28.557 ns 28.693 ns 28.880 ns]
                        change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) high mild
  14 (14.00%) high severe

async/hook-sync/wasm-to-host - nop - typed
                        time:   [6.7488 ns 6.7630 ns 6.7784 ns]
                        change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - typed
                        time:   [15.928 ns 16.031 ns 16.149 ns]
                        change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe
async/hook-sync/wasm-to-host - nop - untyped
                        time:   [21.930 ns 22.114 ns 22.296 ns]
                        change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - untyped
                        time:   [42.684 ns 42.858 ns 43.081 ns]
                        change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) high mild
  12 (12.00%) high severe
async/hook-sync/wasm-to-host - nop - unchecked
                        time:   [11.026 ns 11.053 ns 11.086 ns]
                        change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - unchecked
                        time:   [11.840 ns 11.900 ns 11.982 ns]
                        change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) high mild
  15 (15.00%) high severe
async/hook-sync/wasm-to-host - nop - async-typed
                        time:   [27.601 ns 27.709 ns 27.882 ns]
                        change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - async-typed
                        time:   [28.955 ns 29.174 ns 29.413 ns]
                        change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

async-pool/no-hook/wasm-to-host - nop - typed
                        time:   [6.5626 ns 6.5733 ns 6.5851 ns]
                        change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - typed
                        time:   [15.820 ns 15.886 ns 15.969 ns]
                        change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) high mild
  13 (13.00%) high severe
async-pool/no-hook/wasm-to-host - nop - untyped
                        time:   [20.481 ns 20.521 ns 20.566 ns]
                        change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped
                        time:   [41.834 ns 41.998 ns 42.189 ns]
                        change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe
async-pool/no-hook/wasm-to-host - nop - unchecked
                        time:   [10.353 ns 10.380 ns 10.414 ns]
                        change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked
                        time:   [11.123 ns 11.168 ns 11.228 ns]
                        change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  11 (11.00%) high mild
  1 (1.00%) high severe
async-pool/no-hook/wasm-to-host - nop - async-typed
                        time:   [27.442 ns 27.528 ns 27.638 ns]
                        change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) high mild
  15 (15.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed
                        time:   [29.014 ns 29.148 ns 29.312 ns]
                        change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

async-pool/hook-sync/wasm-to-host - nop - typed
                        time:   [6.7916 ns 6.8116 ns 6.8325 ns]
                        change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed
                        time:   [15.917 ns 15.975 ns 16.051 ns]
                        change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - untyped
                        time:   [21.558 ns 21.612 ns 21.679 ns]
                        change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped
                        time:   [42.475 ns 42.614 ns 42.775 ns]
                        change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) high mild
  15 (15.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - unchecked
                        time:   [11.150 ns 11.195 ns 11.247 ns]
                        change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked
                        time:   [11.639 ns 11.695 ns 11.760 ns]
                        change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - async-typed
                        time:   [27.480 ns 27.712 ns 27.984 ns]
                        change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed
                        time:   [29.218 ns 29.380 ns 29.600 ns]
                        change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) high mild
  14 (14.00%) high severe
```

</details>

* Add s390x support for frame pointer-based stack walking

* wasmtime: Allow `Caller::get_export` to get all exports

* fuzzing: Add a fuzz target to check that our stack traces are correct

We generate Wasm modules that keep track of their own stack as they call and
return between functions, and then we periodically check that if the host
captures a backtrace, it matches what the Wasm module has recorded.

* Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code

* Add doc comment with stack walking implementation notes

* Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods

* Add extensive comments for stack walking function

* Factor architecture-specific bits of stack walking out into modules

* Initialize store-related fields in a vmctx to null when there is no store yet

Rather than leaving them as uninitialized data.

* Use `set_callee` instead of manually setting the vmctx field

* Use a more informative compile error message for unsupported architectures

* Document unsafety of `prepare_host_to_wasm_trampoline`

* Use `bti c` instead of `hint #34` in inline aarch64 assembly

* Remove outdated TODO comment

* Remove setting of `last_wasm_exit_fp` in `set_jit_trap`

This is no longer needed as the value is plumbed through to the backtrace code
directly now.

* Only set the stack limit once, in the face of re-entrancy into Wasm

* Add comments for s390x-specific stack walking bits

* Use the helper macro for all libcalls

If we forget to use it, and then trigger a GC from the libcall, that means we
could miss stack frames when walking the stack, fail to find live GC refs, and
then get use after free bugs. Much less risky to always use the helper macro
that takes care of all of that for us.

* Use the `asm_sym!` macro in Wasm-to-libcall trampolines

This macro handles the macOS-specific underscore prefix stuff for us.

* wasmtime: add size and align to `externref` assertion error message

* Extend the `stacks` fuzzer to have host frames in between Wasm frames

This way we get one or more contiguous sequences of Wasm frames on the stack,
instead of exactly one.

* Add documentation for aarch64-specific backtrace helpers

* Clarify that we only support little-endian aarch64 in trampoline comment

* Use `.machine z13` in s390x assembly file

Since apparently our CI machines have pretty old assemblers that don't have
`.machine z14`. This should be fine though since these trampolines don't make
use of anything that is introduced in z14.

* Fix aarch64 build

* Fix macOS build

* Document the `asm_sym!` macro

* Add windows support to the `wasmtime-asm-macros` crate

* Add windows support to host<--->Wasm trampolines

* Fix trap handler build on windows

* Run `rustfmt` on s390x trampoline source file

* Temporarily disable some assertions about a trap's backtrace in the component model tests

Follow up to re-enable this and fix the associated issue:
https://github.com/bytecodealliance/wasmtime/issues/4535

* Refactor libcall definitions with less macros

This refactors the `libcall!` macro to use the
`foreach_builtin_function!` macro to define all of the trampolines.
Additionally the macro surrounding each libcall itself is no longer
necessary and helps avoid too many macros.

* Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new`

* Move `backtrace` module to be submodule of `traphandlers`

This avoids making some things `pub(crate)` in `traphandlers` that really
shouldn't be.

* Fix macOS aarch64 build

* Use "i64" instead of "word" in aarch64-specific file

* Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions

Also clean up assertions surrounding our saved entry/exit registers.

* Put "typed" vs "untyped" in the same position of call benchmark names

Regardless if we are doing wasm-to-host or host-to-wasm

* Fix stacks test case generator build for new `wasm-encoder`

* Fix build for s390x

* Expand libcalls in s390x asm

* Disable more parts of component tests now that backtrace assertions are a bit tighter

* Remove assertion that can maybe fail on s390x

Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2022-07-28 15:46:14 -07:00
Damian Heaton
5e3bb588a8 Port Fence, IsNull/IsInvalid & Debugtrap to ISLE (AArch64) (#4548)
Ported the existing implementation of the following Opcodes for AArch64
to ISLE:
- `Fence`
- `IsNull`
- `IsInvalid`
- `Debugtrap`

Copyright (c) 2022 Arm Limited
2022-07-28 15:36:13 -07:00
Trevor Elliott
29d4edc76b x64: Migrate call and call_indirect to ISLE (#4542)
https://github.com/bytecodealliance/wasmtime/pull/4542
2022-07-28 13:10:03 -07:00
Alex Crichton
32979b2714 Implement flags in fused adapters (#4549)
This implements the `flags` type for fused adapters and converting
between modules. The main logic here is handling the variable size of
flags in addition to the masking which happens to ignore unrelated bits
when the values pass through the canonical ABI.
2022-07-28 14:56:32 -05:00
Pure White
fb7d51033c doc: fix c set strategy doc (#4550)
In #4252 I changed the signature and doc of the rust code, but forgot to change the doc for c.
Set strategy no longer returns error.
This commit fixes that.
2022-07-28 18:01:03 +00:00
Alex Crichton
ce7bbef24d Implement other variant-like types in adapter fusion (#4547)
This commit fills out the adapter fusion compiler for the `union`,
`enum`, `option,` and `result` types. The preexisting support for
`variant` types was refactored slightly to be extensible to all of these
other types and they all now call into the same common translation code.
2022-07-28 11:47:15 -05:00
Alex Crichton
e1148e43be Implement char type in adapter fusion (#4544)
This commit implements the translation of `char` which validates that
it's in the valid range of unicode scalar values. The precise validation
here is lifted from LLVM in the hopes that it's probably better than
whatever I would concoct by hand.
2022-07-28 11:47:01 -05:00
Andrew Brown
8137432e67 x64: only enable VTune dependencies on x86_64 targets (#4533)
As described in #4523, the `ittapi` dependency necessary for Wasmtime's
VTune support does not compile on `aarch64-linux-android`. There are
several incompatible parts here: though `ittapi` supports all OS
combinations that Wasmtime does and builds on all CPU targets, VTune is
not primarily intended for aarch64 profiling and `linux-android` is not
a high priority platform for the library. We could conditionally depend
on `ittapi` for Wasmtime's supported OS combinations, but I think a
better answer is to limit it to x86_64 targets, since this more clearly
shows why the `ittapi`/VTune support is present and also fixes #4523.
2022-07-28 09:22:27 -05:00
Jamey Sharp
ad050e6fb2 cranelift-wasm: Only allocate if vectors need bitcasts (#4543)
For wasm programs using SIMD vector types, the type known at function
entry or exit may not match the type used within the body of the
function, so we have to bitcast them. We have to check all calls and
returns for this condition, which involves comparing a subset of a
function's signature with the CLIF types we're trying to use. Currently,
this check heap-allocates a short-lived Vec for the appropriate subset
of the signature.

But most of the time none of the values need a bitcast. So this patch
avoids allocating unless at least one bitcast is actually required, and
only saves the pointers to values that need fixing up. I leaned heavily
on iterators to keep space usage constant until we discover allocation
is necessary after all.

I don't think it's possible to eliminate the allocation entirely,
because the function signature we're examining is borrowed from the
FuncBuilder, but we need to mutably borrow the FuncBuilder to insert the
bitcast instructions. Fortunately, the FromIterator implementation for
Vec doesn't reserve any heap space if the iterator is empty, so we can
unconditionally collect into a Vec and still avoid unnecessary
allocations.

Since the relationship between a function signature and a list of CLIF
values is somewhat complicated, I refactored all the uses of
`bitcast_arguments` and `wasm_param_types`. Instead there's
`bitcast_wasm_params` and `bitcast_wasm_returns` which share a helper
that combines the previous pair of functions into one.

According to DHAT, when compiling the Sightglass Spidermonkey benchmark,
this avoids 52k allocations averaging about 9 bytes each, which are
freed on average within 2k instructions.

Most Sightglass benchmarks, including Spidermonkey, show no performance
difference with this change. Only one has a slowdown, and it's small:

compilation :: nanoseconds :: benchmarks/shootout-matrix/benchmark.wasm

  Δ = 689373.34 ± 593720.78 (confidence = 99%)

  lazy-bitcast.so is 0.94x to 1.00x faster than main-e121c209f.so!
  main-e121c209f.so is 1.00x to 1.06x faster than lazy-bitcast.so!

  [19741713 21375844.46 32976047] lazy-bitcast.so
  [19345471 20686471.12 30872267] main-e121c209f.so

But several Sightglass benchmarks have notable speed-ups, with smaller
improvements for shootout-ed25519, meshoptimizer, and pulldown-cmark,
and larger ones as follows:

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 20071545.47 ± 2950014.92 (confidence = 99%)

  lazy-bitcast.so is 1.26x to 1.36x faster than main-e121c209f.so!
  main-e121c209f.so is 0.73x to 0.80x faster than lazy-bitcast.so!

  [55995164 64849257.68 89083031] lazy-bitcast.so
  [79382460 84920803.15 98016185] main-e121c209f.so

compilation :: nanoseconds :: benchmarks/blake3-scalar/benchmark.wasm

  Δ = 16620780.61 ± 5395162.13 (confidence = 99%)

  lazy-bitcast.so is 1.14x to 1.28x faster than main-e121c209f.so!
  main-e121c209f.so is 0.77x to 0.88x faster than lazy-bitcast.so!

  [54604352 79877776.35 103666598] lazy-bitcast.so
  [94011835 96498556.96 106200091] main-e121c209f.so

compilation :: nanoseconds :: benchmarks/intgemm-simd/benchmark.wasm

  Δ = 36891254.34 ± 12403663.50 (confidence = 99%)

  lazy-bitcast.so is 1.12x to 1.24x faster than main-e121c209f.so!
  main-e121c209f.so is 0.79x to 0.90x faster than lazy-bitcast.so!

  [131610845 201289587.88 247341883] lazy-bitcast.so
  [232065032 238180842.22 250957563] main-e121c209f.so
2022-07-27 16:41:30 -07:00
Alex Crichton
174b60dcf7 Add *.wast support for invoking components (#4526)
This commit builds on bytecodealliance/wasm-tools#690 to add support to
testing of the component model to execute functions when running
`*.wast` files. This support is all built on #4442 as functions are
invoked through a "dynamic" API. Right now the testing and integration
is fairly crude but I'm hoping that we can try to improve it over time
as necessary. For now this should provide a hopefully more convenient
syntax for unit tests and the like.
2022-07-27 21:02:16 +00:00
Afonso Bordado
0508932174 cranelift: Align Scalar and SIMD shift semantics (#4520)
* cranelift: Reorganize test suite

Group some SIMD operations by instruction.

* cranelift: Deduplicate some shift tests

Also, new tests with the mod behaviour

* aarch64: Lower shifts with mod behaviour

* x64: Lower shifts with mod behaviour

* wasmtime: Don't mask SIMD shifts
2022-07-27 17:54:00 +00:00
Afonso Bordado
e121c209fc cranelift: Fix urem/srem in interpreter (#4532) 2022-07-27 10:47:08 -07:00
Trevor Elliott
7ac6134894 x64: Shrink Inst from 72 to 48 bytes (#4514)
https://github.com/bytecodealliance/wasmtime/pull/4514
2022-07-27 10:39:22 -07:00
Alex Crichton
285bc5ce24 Implement variant translation in fused adapters (#4534)
* Implement variant translation in fused adapters

This commit implements the most general case of variants for fused
adapter trampolines. Additionally a number of other primitive types are
filled out here to assist with testing variants. The implementation
internally was relatively straightforward given the shape of variants,
but there's room for future optimization as necessary especially around
converting locals to various types.

This commit also introduces a "one off" fuzzer for adapters to ensure
that the generated adapter is valid. I hope to extend this fuzz
generator as more types are implemented to assist in various corner
cases that might arise. For now the fuzzer simply tests that the output
wasm module is valid, not that it actually executes correctly. I hope to
integrate with a fuzzer along the lines of #4307 one day to test the
run-time-correctness of the generated adapters as well, at which point
this fuzzer would become obsolete.

Finally this commit also fixes an issue with `u8` translation where
upper bits weren't zero'd out and were passed raw across modules.
Instead smaller-than-32 types now all mask out their upper bits and do
sign-extension as appropriate for unsigned/signed variants.

* Fuzz memory64 in the new trampoline fuzzer

Currently memory64 isn't supported elsewhere in the component model
implementation of Wasmtime but the trampoline compiler seems as good a
place as any to ensure that it at least works in isolation. This plumbs
through fuzz input into a `memory64` boolean which gets fed into
compilation. Some miscellaneous bugs were fixed as a result to ensure
that memory64 trampolines all validate correctly.

* Tweak manifest for doc build
2022-07-27 09:14:43 -05:00
Jamey Sharp
799e8919fe Don't allocate in DataFlowGraph::block_param_types (#4538)
DHAT reports that when compiling the Spidermonkey Sightglass benchmark,
there are over 100k of these Vec allocations, averaging less than 4
bytes, and with an average lifetime of only about 500 instructions.

This function is only called from one place, which immediately converts
it into an iterator. So this commit just returns the iterator that was
previously being collected into a Vec. The iterator has to borrow from
the DataFlowGraph, so this would change borrow-check results, but in the
one caller that turns out to be okay.

(That sole caller is in cranelift/codegen/src/machinst/lower.rs, in
Lower::lower().)

According to Sightglass, this is a compile-time improvement of between
2% and 12% on the Spidermonkey benchmark:

instantiation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 14628.76 ± 10318.59 (confidence = 99%)

  main-0e6ffd024.so is 0.87x to 0.98x faster than no-small-vecs.so!
  no-small-vecs.so is 1.02x to 1.14x faster than main-0e6ffd024.so!

  [142023 187464.24 301522] main-0e6ffd024.so
  [103742 172835.48 263917] no-small-vecs.so

compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 362392705.93 ± 267070467.06 (confidence = 99%)

  main-0e6ffd024.so is 0.89x to 0.98x faster than no-small-vecs.so!
  no-small-vecs.so is 1.02x to 1.12x faster than main-0e6ffd024.so!

  [3655734131 5522594697.83 6471126699] main-0e6ffd024.so
  [3278129811 5160201991.90 5810600015] no-small-vecs.so
2022-07-27 01:59:18 +00:00
Nick Fitzgerald
50b9195882 cranelift-frontend: Reuse visited block sets in SSABuilder::can_optimize_var_lookup (#4536)
First, we switch from a `BTreeSet` to a `HashSet` because clearing a `BTreeSet`
will deallocate the btree's nodes but clearing a `HashSet` will not deallocate
the backing hash table, saving the space to reuse for future insertions.

Then, we reuse the same set (and therefore the same allocation) across every
call to `can_optimize_var_lookup`.

This results in a 1.22x to 1.32x speed up on various Sightglass benchmarks:

```
compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 39478181.76 ± 3441880.32 (confidence = 99%)

  main.so is 0.75x to 0.79x faster than reuse-set.so!
  reuse-set.so is 1.27x to 1.32x faster than main.so!

  [160128343 172174751.09 213325968] main.so
  [115055695 132696569.33 200782128] reuse-set.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 22576954.88 ± 1830771.68 (confidence = 99%)

  main.so is 0.77x to 0.81x faster than reuse-set.so!
  reuse-set.so is 1.25x to 1.29x faster than main.so!

  [100449245 106820149.65 118628066] main.so
  [77039172 84243194.77 128168647] reuse-set.so

compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 664533554.97 ± 22109170.05 (confidence = 99%)

  main.so is 0.81x to 0.82x faster than reuse-set.so!
  reuse-set.so is 1.22x to 1.23x faster than main.so!

  [3549762523 3640587103.35 3798662501] main.so
  [2793335181 2976053548.38 3192950484] reuse-set.so
```
2022-07-26 18:38:24 -07:00
Dan Gohman
0e6ffd0243 Don't try to report file size or timestamps for stdio streams. (#4531)
* Don't try to report file size or timestamps for stdio streams.

Calling `File::metadata()` on a stdio stream handle fails on Windows, where
the stdio streams are not files.

This `File::metadata()` call was effectively only being used to add file size
and timestamps to the result of `filestat_get`. It's common for users to
redirect stdio streams to interesting places, and applications
generally shouldn't change their behavior depending on the size or
timestamps of the file, if the streams are redirected to a file, so just
leave these fields to 0, which is commonly understood to represent
"unknown".

Fixes #4497.
2022-07-26 15:53:17 -07:00
Anton Kirilov
ead6edb0c5 Cranelift AArch64: Migrate Splat to ISLE (#4521)
Copyright (c) 2022, Arm Limited.
2022-07-26 17:57:15 +00:00
Alex Crichton
1321c234e5 Remove dependency on more-asserts (#4408)
* Remove dependency on `more-asserts`

In my recent adventures to do a bit of gardening on our dependencies I
noticed that there's a new major version for the `more-asserts` crate.
Instead of updating to this though I've opted to instead remove the
dependency since I don't think we heavily lean on this crate and
otherwise one-off prints are probably sufficient to avoid the need for
pulling in a whole crate for this.

* Remove exemption for `more-asserts`
2022-07-26 16:47:33 +00:00
Afonso Bordado
1183191d7d fuzzgen: Add i128 support (#4529) 2022-07-26 09:40:12 -07:00
Anton Kirilov
d041c4b376 Cranelift AArch64: Further integral constant fixes (#4530)
Copyright (c) 2022, Arm Limited.
2022-07-26 09:35:06 -07:00
Alex Crichton
1935428af7 Fill out some initial audit metadata (#4527)
This fills out a few items which come from the wasm-tools repository as
well as a few crates that I'm personally the author of.
2022-07-26 09:43:15 -05:00
Nick Fitzgerald
81bff71d11 wasmtime: let serialization mismatch tests pass with RUST_BACKTRACE=1 set (#4525) 2022-07-25 16:38:04 -07:00
Alex Crichton
97894bc65e Add initial support for fused adapter trampolines (#4501)
* Add initial support for fused adapter trampolines

This commit lands a significant new piece of functionality to Wasmtime's
implementation of the component model in the form of the implementation
of fused adapter trampolines. Internally within a component core wasm
modules can communicate with each other by having their exports
`canon lift`'d to get `canon lower`'d into a different component. This
signifies that two components are communicating through a statically
known interface via the canonical ABI at this time. Previously Wasmtime
was able to identify that this communication was happening but it simply
panicked with `unimplemented!` upon seeing it. This commit is the
beginning of filling out this panic location with an actual
implementation.

The implementation route chosen here for fused adapters is to use a
WebAssembly module itself for the implementation. This means that, at
compile time of a component, Wasmtime is generating core WebAssembly
modules which then get recursively compiled within Wasmtime as well. The
choice to use WebAssembly itself as the implementation of fused adapters
stems from a few motivations:

* This does not represent a significant increase in the "trusted
  compiler base" of Wasmtime. Getting the Wasm -> CLIF translation
  correct once is hard enough much less for an entirely different IR to
  CLIF. By generating WebAssembly no new interactions with Cranelift are
  added which drastically reduces the possibilities for mistakes.

* Using WebAssembly means that component adapters are insulated from
  miscompilations and mistakes. If something goes wrong it's defined
  well within the WebAssembly specification how it goes wrong and what
  happens as a result. This means that the "blast zone" for a wrong
  adapter is the component instance but not the entire host itself.
  Accesses to linear memory are guaranteed to be in-bounds and otherwise
  handled via well-defined traps.

* A fully-finished fused adapter compiler is expected to be a
  significant and quite complex component of Wasmtime. Functionality
  along these lines is expected to be needed for Web-based polyfills of
  the component model and by using core WebAssembly it provides the
  opportunity to share code between Wasmtime and these polyfills for the
  component model.

* Finally the runtime implementation of managing WebAssembly modules is
  already implemented and quite easy to integrate with, so representing
  fused adapters with WebAssembly results in very little extra support
  necessary for the runtime implementation of instantiating and managing
  a component.

The compiler added in this commit is dubbed Wasmtime's Fused Adapter
Compiler of Trampolines (FACT) because who doesn't like deriving a name
from an acronym. Currently the trampoline compiler is limited in its
support for interface types and only supports a few primitives. I plan
on filing future PRs to flesh out the support here for all the variants
of `InterfaceType`. For now this PR is primarily focused on all of the
other infrastructure for the addition of a trampoline compiler.

With the choice to use core WebAssembly to implement fused adapters it
means that adapters need to be inserted into a module. Unfortunately
adapters cannot all go into a single WebAssembly module because adapters
themselves have dependencies which may be provided transitively through
instances that were instantiated with other adapters. This means that a
significant chunk of this PR (`adapt.rs`) is dedicated to determining
precisely which adapters go into precisely which adapter modules. This
partitioning process attempts to make large modules wherever it can to
cut down on core wasm instantiations but is likely not optimal as
it's just a simple heuristic today.

With all of this added together it's now possible to start writing
`*.wast` tests that internally have adapted modules communicating with
one another. A `fused.wast` test suite was added as part of this PR
which is the beginning of tests for the support of the fused adapter
compiler added in this PR. Currently this is primarily testing some
various topologies of adapters along with direct/indirect modes. This
will grow many more tests over time as more types are supported.

Overall I'm not 100% satisfied with the testing story of this PR. When a
test fails it's very difficult to debug since everything is written in
the text format of WebAssembly meaning there's no "conveniences" to
print out the state of the world when things go wrong and easily debug.
I think this will become even more apparent as more tests are written
for more types in subsequent PRs. At this time though I know of no
better alternative other than leaning pretty heavily on fuzz-testing to
ensure this is all exercised.

* Fix an unused field warning

* Fix tests in `wasmtime-runtime`

* Add some more tests for compiled trampolines

* Remap exports when injecting adapters

The exports of a component were accidentally left unmapped which meant
that they indexed the instance indexes pre-adapter module insertion.

* Fix typo

* Rebase conflicts
2022-07-25 23:13:26 +00:00
Afonso Bordado
78d3e0b693 fuzzgen: Fuzz Switch API (#4502)
* fuzzgen: Use Switch interface

Turns out this is an interface that the frontend provides.
We should fuzz it.

* cranelift: Restrict index range in Switch emission on fuzzgen
2022-07-25 15:26:29 -07:00
Afonso Bordado
02c3b47db2 x64: Implement SIMD fma (#4474)
* x64: Add VEX Instruction Encoder

This uses a similar builder pattern to the EVEX Encoder.
Does not yet support memory accesses.

* x64: Add FMA Flag

* x64: Implement SIMD `fma`

* x64: Use 4 register Vex Inst

* x64: Reorder VEX pretty print args
2022-07-25 22:01:02 +00:00
Alex Crichton
4aaf7ff8d9 Bump wat/wast crates (#4524)
* Bump wat/wast crates

Pull in upstream updates, nothing major, just keeping up-to-date.

* Record audit log for new crates
2022-07-25 21:29:20 +00:00
Trevor Elliott
9e9e043174 x64: Migrate the return and fallthrough_return lowerings to ISLE (#4518)
https://github.com/bytecodealliance/wasmtime/pull/4518
2022-07-25 21:28:52 +00:00
Bobby Holley
89f9de7cc3 Enable cargo-vet (#4444)
* Initialize cargo-vet on wasmtime.

* Add cargo-vet to CI.

* Add README.
2022-07-25 20:21:14 +00:00
Damian Heaton
3ef89b7787 Allow 64-bit vectors and implement for interpreter (#4509)
* Allow 64-bit vectors and implement for interpreter

The AArch64 backend already supports 64-bit vectors; this simply allows
instructions to make use of that.

Implemented support for 64-bit vectors within the interpreter to allow
interpret runtests to use them.

Copyright (c) 2022 Arm Limited

* Disable 64-bit SIMD `iaddpairwise` tests on s390x

Copyright (c) 2022 Arm Limited
2022-07-25 13:00:43 -07:00
Sam Parker
c5ddb4b803 [AArch64] Port SIMD narrowing to ISLE (#4478)
* [AArch64] Port SIMD narrowing to ISLE

Fvdemote, snarrow, unarrow and uunarrow.

Also refactor the aarch64 instructions descriptions to parameterize
on ScalarSize instead of using different opcodes.

The zero_value pure constructor has been introduced and used by the
integer narrow operations and it replaces, and extends, the compare
zero patterns.

Copright (c) 2022, Arm Limited.

* use short 'if' patterns
2022-07-25 12:40:36 -07:00
Ulrich Weigand
dd40bf075a s390x: Enable more runtests, and fix a few bugs (#4516)
This enables more runtests to be executed on s390x.  Doing so
uncovered a two back-end bugs, which are fixed as well:

- The result of cls was always off by one.
- The result of popcnt.i16 has uninitialized high bits.

In addition, I found a bug in the load-op-store.clif test case:
     v3 = heap_addr.i64 heap0, v1, 4
     v4 = iconst.i64 42
     store.i32 v4, v3
This was clearly intended to perform a 32-bit store, but
actually performs a 64-bit store (it seems the type annotation
of the store opcode is ignored, and the type of the operand
is used instead).  That bug did not show any noticable symptoms
on little-endian architectures, but broke on big-endian.
2022-07-25 12:37:06 -07:00
Joel Dice
7c67e620c4 support dynamic function calls in component model (#4442)
* support dynamic function calls in component model

This addresses #4310, introducing a new `component::values::Val` type for
representing component values dynamically, as well as `component::types::Type`
for representing the corresponding interface types. It also adds a `call` method
to `component::func::Func`, which takes a slice of `Val`s as parameters and
returns a `Result<Val>` representing the result.

Note that I've moved `post_return` and `call_raw` from `TypedFunc` to `Func`
since there was nothing specific to `TypedFunc` about them, and I wanted to
reuse them.  The code in both is unchanged beyond the trivial tweaks to make
them fit in their new home.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

* order variants and match cases more consistently

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

* implement lift for String, Box<str>, etc.

This also removes the redundant `store` parameter from `Type::load`.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

* implement code review feedback

This fixes a few issues:

- Bad offset calculation when lowering
- Missing variant padding
- Style issues regarding `types::Handle`
- Missed opportunities to reuse `Lift` and `Lower` impls

It also adds forwarding `Lift` impls for `Box<[T]>`, `Vec<T>`, etc.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

* move `new_*` methods to specific `types` structs

Per review feedback, I've moved `Type::new_record` to `Record::new_val` and
added a `Type::unwrap_record` method; likewise for the other kinds of types.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

* make tuple, option, and expected type comparisons recursive

These types should compare as equal across component boundaries as long as their
type parameters are equal.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

* improve error diagnostic in `Type::check`

We now distinguish between more failure cases to provide an informative error
message.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

* address review feedback

- Remove `WasmStr::to_str_from_memory` and `WasmList::get_from_memory`
- add `try_new` methods to various `values` types
- avoid using `ExactSizeIterator::len` where we can't trust it
- fix over-constrained bounds on forwarded `ComponentType` impls

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

* rearrange code per review feedback

- Move functions from `types` to `values` module so we can make certain struct fields private
- Rename `try_new` to just `new`

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

* remove special-case equality test for tuples, options, and expecteds

Instead, I've added a FIXME comment and will open an issue to do recursive
structural equality testing.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
2022-07-25 13:38:48 -05:00
Trevor Elliott
ee7e4f4c6b x64: Port func_addr and symbol_value to ISLE (#4485)
https://github.com/bytecodealliance/wasmtime/pull/4485
2022-07-25 11:11:16 -07:00
Afonso Bordado
446efd3e11 cranelift: Fix icmp_imm for small types in interpreter (#4506) 2022-07-23 00:26:56 +00:00
Afonso Bordado
af62037f62 cranelift: Restrict br_table to i32 indices (#4510)
* cranelift: Restrict `br_table` to `i32` indices

In #4498 it was proposed that we should only accept `i32` indices
to `br_table`. The rationale for this is that larger types lead the
users to a false sense of flexibility (since we don't support jump
tables larger than u32's), and narrower types are not well tested
paths that would be safer if we removed them.

* cranelift: Reduce directly from i128 to i32 in Switch
2022-07-22 23:32:40 +00:00
Damian Heaton
f1a0c40a53 Convert sqrt..nearest to ISLE (AArch64) (#4508)
Converted the existing implementations for the following opcodes to ISLE
on AArch64:
- `sqrt`
- `fneg`
- `fabs`
- `fpromote`
- `fdemote`
- `ceil`
- `floor`
- `trunc`
- `nearest`

Copyright (c) 2022 Arm Limited
2022-07-22 14:48:07 -07:00
Afonso Bordado
4720d09651 cranelift: Fix Switch bug when emitting indexes larger than val type (#4507)
In #4502 we discovered a bug in the switch api where it would emit
`icmp_imm`'s with types that were not able to fully represent the
destination index.

We now reject these inputs. The index val must always have a
type that is capable of addressing the entire range of inputs.
2022-07-22 10:55:36 -07:00
Afonso Bordado
d89c262657 cranelift: Implement {u,s}extend.i128 in interpreter (#4505) 2022-07-22 10:47:10 -07:00
TheGreatRambler
2ba3025e67 Add cmake compatibility to c-api (#4369)
* Add cmake compatibility to c-api

* Add CMake documentation to wasmtime.h

* Add CMake instructions in examples

* Modify CI for CMake support

* Use correct rust in CI

* Trigger build

* Refactor run-examples

* Reintroduce example_to_run in run-examples

* Replace run-examples crate with cmake

* Fix markdown formatting in examples readme

* Fix cmake test quotes

* Build rust wasm before cmake tests

* Pass CTEST_OUTPUT_ON_FAILURE

* Another cmake test

* Handle os differences in cmake test

* Fix bugs in memory and multimemory examples
2022-07-22 10:22:36 -07:00
Nick Fitzgerald
35b750ab9a Implement std::fmt::Pointer for ExternRef (#4504) 2022-07-21 16:16:23 -07:00
Nick Fitzgerald
b24c561ceb cranelift: Don't log CLIF and assembly at debug level (#4503)
Too verbose. Only log them at trace level.
2022-07-21 15:31:05 -07:00
Afonso Bordado
ca36ce57c2 fuzzgen: Generate const's (#4500)
We already have const support, but currently only use it when building
the pool of variables but never in the main instruction selection.
2022-07-21 19:01:44 +00:00
Jamey Sharp
7c3cc90c89 wiggle's atoms_async test requires wasmtime_async (#4499) 2022-07-21 19:01:00 +00:00
Jamey Sharp
02477988dd table_ops: allow 0-sized tables, locals, globals (#4495)
I noticed that `TableOp::insert` had assertions that `num_params` and
`table_size` were greater than 0, but no assert for `num_globals`. These
asserts couldn't be hit because the `*_RANGE` constants were all set to
a minimum of 1.

But the only reason I can see to prohibit 0-sized tables, locals, or
globals, was because indexes into those spaces were generated with the
`%` operator. Allowing 0-sized spaces requires not generating the
corresponding instructions at all when there are no valid indexes.

So I pushed the final selection of which table/local/global to access
earlier, to the moment when we're picking which TableOps to run. Then,
instead of generating a random u8 or u32 and taking the remainder to get
it into the right range, I can just ask `arbitrary` to generate a number
in the right range to begin with.

So this now explores some size-0 corners that it didn't before, and it
doesn't require reasoning about whether remainder can divide by zero.
Also I think it uses fewer bits of the `Unstructured` input to produce
the same cases, and I hope that lets libFuzzer more quickly find bits it
can mutate to get to novel coverage paths.
2022-07-21 18:25:23 +00:00
Ulrich Weigand
fd639dd044 s390x: Support preserve_frame_pointers flag (#4477)
On s390x, we do not have a frame pointer that can be used to chain
stack frames for easy unwinding.  Instead, our ABI defines a stack
"backchain" mechanism that can be used to the same effect.

This PR uses that backchain mechanism to implement the new
preserve_frame_pointers flags introduced here:
https://github.com/bytecodealliance/wasmtime/pull/4469
2022-07-21 10:09:16 -07:00
Benjamin Bouvier
4ce329d1eb Add a cranelift flag to enable/disable verbose logs for regalloc2 (#4481) 2022-07-21 09:12:13 +00:00