* Always preserve frame pointers in Wasmtime
This allows us to efficiently and simply capture Wasm stacks without maintaining
and synchronizing any safety-critical side tables between the compiler and the
runtime.
* wasmtime: Implement fast Wasm stack walking
Why do we want Wasm stack walking to be fast? Because we capture stacks whenever
there is a trap and traps actually happen fairly frequently with short-lived
programs and WASI's `exit`.
Previously, we would rely on generating the system unwind info (e.g.
`.eh_frame`) and using the system unwinder (via the `backtrace` crate) to walk
the full stack and filter out any non-Wasm stack frames. This can,
unfortunately, be slow for two primary reasons:
1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than
`O(wasm-frames)` work.
2. System unwind info and the system unwinder need to be much more general than
a purpose-built stack walker for Wasm needs to be. They have to handle any kind of
stack frame that any compiler might emit, whereas our Wasm frames are emitted by
Cranelift and always have frame pointers. This translates into implementation
complexity and general overhead. There can also be unnecessary-for-our-use-cases
global synchronization and locks involved, further slowing down stack walking in
the presence of multiple threads trying to capture stacks in parallel.
This commit introduces a purpose-built stack walker for traversing just our Wasm
frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore
non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit
frame pointer)` pairs. This linked list is maintained via Wasm-to-host and
host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use
frame pointers (which Cranelift preserves) to find the next older Wasm frame on
the stack, and we keep doing this until we reach the entry stack pointer,
meaning that the next older frame will be a host frame.
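The walk described above can be sketched as follows. This is a minimal illustration over a simulated stack (a slice of words indexed by address), not the code in this commit: `Activation`, `walk_wasm_frames`, and the two-word frame record are hypothetical, and for simplicity each record stores a program counter identifying the frame itself next to the saved older frame pointer.

```rust
/// One contiguous run of Wasm frames, recorded at the host/Wasm boundary.
/// (Hypothetical type for this sketch.)
struct Activation {
    /// Stack pointer when the host entered Wasm.
    entry_sp: usize,
    /// Frame pointer of the youngest Wasm frame when Wasm exited to the host.
    exit_fp: usize,
    /// The next older run of Wasm frames, if any.
    prev: Option<Box<Activation>>,
}

/// Collects the program counters of all Wasm frames, youngest first.
/// Assumed layout: `stack[fp]` is the saved older FP, `stack[fp + 1]` is a PC.
fn walk_wasm_frames(stack: &[usize], mut activation: Option<&Activation>) -> Vec<usize> {
    let mut pcs = Vec::new();
    while let Some(act) = activation {
        let mut fp = act.exit_fp;
        // The stack grows down, so older frames live at higher addresses.
        // Reaching the entry SP means the next older frame is a host frame.
        while fp < act.entry_sp {
            pcs.push(stack[fp + 1]);
            let older_fp = stack[fp];
            assert!(older_fp > fp, "frame pointers must move toward older frames");
            fp = older_fp;
        }
        activation = act.prev.as_deref();
    }
    pcs
}

/// Builds two runs of Wasm frames separated by a host frame and walks them.
fn example_walk() -> Vec<usize> {
    let mut stack = vec![0usize; 32];
    // (fp, saved older fp, pc) for each simulated Wasm frame.
    let frames: [(usize, usize, usize); 4] =
        [(26, 30, 0xA), (22, 26, 0xB), (14, 18, 0xC), (10, 14, 0xD)];
    for &(fp, older_fp, pc) in &frames {
        stack[fp] = older_fp;
        stack[fp + 1] = pc;
    }
    let older = Activation { entry_sp: 28, exit_fp: 22, prev: None };
    let younger = Activation { entry_sp: 16, exit_fp: 10, prev: Some(Box::new(older)) };
    walk_wasm_frames(&stack, Some(&younger))
}
```

The cost of the walk depends only on the number of Wasm frames and boundary crossings; host frames between the runs are never inspected.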
The trampolines need to avoid a couple of stumbling blocks. First, they need to be
compiled ahead of time, since we may not have access to a compiler at
runtime (e.g. if the `cranelift` feature is disabled) but still want to be able
to call functions that have already been compiled and get stack traces for those
functions. Usually this means we would compile the appropriate trampolines
inside `Module::new` and the compiled module object would hold the
trampolines. However, we *also* need to support calling host functions that are
wrapped into `wasmtime::Func`s and there doesn't exist *any* ahead-of-time
compiled module object to hold the appropriate trampolines:
```rust
// Define a host function.
let func_type = wasmtime::FuncType::new(
    vec![wasmtime::ValType::I32],
    vec![wasmtime::ValType::I32],
);
let func = wasmtime::Func::new(&mut store, func_type, |_, params, results| {
    // ...
    Ok(())
});
// Call that host function.
let mut results = vec![wasmtime::Val::I32(0)];
func.call(&mut store, &[wasmtime::Val::I32(0)], &mut results)?;
```
Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline
in assembly that work for all Wasm and host function signatures. These
trampolines are careful to only use volatile registers, avoid touching any
register that is an argument in the calling convention ABI, and tail call to the
target callee function. This allows forwarding any set of arguments and any
returns to and from the callee, while also allowing us to maintain our linked
list of Wasm stack and frame pointers before transferring control to the
callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing
the host-Wasm boundary, so they do not impose overhead on regular calls. (And if
using one trampoline for all host-Wasm boundary crossing ever breaks branch
prediction enough in the CPU to become any kind of bottleneck, we can do fun
things like have multiple copies of the same trampoline and choose a random copy
for each function, sharding the functions across branch predictor entries.)
Finally, this commit also ends the use of a synthetic `Module` and allocating a
stubbed out `VMContext` for host functions. Instead, we define a
`VMHostFuncContext` with its own magic value, similar to `VMComponentContext`,
specifically for host functions.
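As a rough illustration of the magic-value idea, assume each context type is a `#[repr(C)]` struct whose first field is a `u32` tag. The type names, field names, and tag values below are made up for this sketch and are not taken from the runtime.

```rust
/// Hypothetical tag values for this sketch only.
const CORE_MAGIC: u32 = u32::from_le_bytes(*b"core");
const HOST_FUNC_MAGIC: u32 = u32::from_le_bytes(*b"host");

/// The common prefix shared by every context type.
#[repr(C)]
struct OpaqueContext {
    magic: u32,
}

#[repr(C)]
struct CoreContext {
    magic: u32,
    instance_id: u32,
}

#[repr(C)]
struct HostFuncContext {
    magic: u32,
    host_func_id: u32,
}

#[derive(Debug, PartialEq)]
enum ContextKind {
    Core,
    HostFunc,
    Unknown,
}

/// Reads the leading tag of an opaque context pointer.
///
/// # Safety
/// `ptr` must point to a live, aligned `#[repr(C)]` struct that starts
/// with a `u32` magic field.
unsafe fn classify(ptr: *const OpaqueContext) -> ContextKind {
    match unsafe { (*ptr).magic } {
        CORE_MAGIC => ContextKind::Core,
        HOST_FUNC_MAGIC => ContextKind::HostFunc,
        _ => ContextKind::Unknown,
    }
}

/// Classifies one context of each kind through the opaque pointer type.
fn example_classify() -> (ContextKind, ContextKind) {
    let core = CoreContext { magic: CORE_MAGIC, instance_id: 1 };
    let host = HostFuncContext { magic: HOST_FUNC_MAGIC, host_func_id: 2 };
    // SAFETY: both structs are alive, aligned, and start with a `u32` tag.
    unsafe {
        (
            classify(&core as *const CoreContext as *const OpaqueContext),
            classify(&host as *const HostFuncContext as *const OpaqueContext),
        )
    }
}
```

Checking the tag before casting lets code that receives an opaque context pointer tell a host function context apart from a regular one without a synthetic module behind it.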
<h2>Benchmarks</h2>
<h3>Traps and Stack Traces</h3>
Large improvements to taking stack traces on traps, ranging from shaving off 64%
to 99.95% of the time it used to take.
<details>
```
multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us]
thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s]
change:
time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05)
thrpt: [+560.90% +573.56% +585.84%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us]
thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s]
change:
time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05)
thrpt: [+1023.1% +1048.6% +1070.3%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us]
thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s]
change:
time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05)
thrpt: [+1503.1% +1542.0% +1578.0%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severe
multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us]
thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s]
change:
time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05)
thrpt: [+1339.2% +1353.6% +1369.1%]
Performance has improved.
multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us]
thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s]
change:
time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05)
thrpt: [+1624.7% +1709.2% +1806.1%]
Performance has improved.
multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us]
thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s]
change:
time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05)
thrpt: [+2821.5% +3048.1% +3281.1%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
many-modules-registered-traps/1
time: [2.6278 us 2.6361 us 2.6447 us]
thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s]
change:
time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05)
thrpt: [+562.65% +571.51% +580.76%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
many-modules-registered-traps/8
time: [2.6294 us 2.6460 us 2.6623 us]
thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s]
change:
time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05)
thrpt: [+567.63% +588.95% +608.95%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
many-modules-registered-traps/64
time: [2.6218 us 2.6329 us 2.6452 us]
thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s]
change:
time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05)
thrpt: [+1431.4% +1450.6% +1469.5%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
many-modules-registered-traps/512
time: [2.6569 us 2.6737 us 2.6923 us]
thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s]
change:
time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05)
thrpt: [+13417% +13566% +13731%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
many-modules-registered-traps/4096
time: [2.7258 us 2.7390 us 2.7535 us]
thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s]
change:
time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
thrpt: [+221417% +223380% +224881%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
many-stack-frames-traps/1
time: [1.4658 us 1.4719 us 1.4784 us]
thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s]
change:
time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05)
thrpt: [+860.23% +894.72% +938.21%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
many-stack-frames-traps/8
time: [2.4772 us 2.4870 us 2.4973 us]
thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s]
change:
time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05)
thrpt: [+575.65% +583.51% +592.03%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
many-stack-frames-traps/64
time: [10.109 us 10.171 us 10.236 us]
thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s]
change:
time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05)
thrpt: [+341.22% +350.38% +357.55%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
many-stack-frames-traps/512
time: [126.16 us 126.54 us 126.96 us]
thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s]
change:
time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05)
thrpt: [+181.32% +185.17% +188.71%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
```
</details>
<h3>Calls</h3>
There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call
performance due to the new trampolines. It seems to be on the order of about 2-10
nanoseconds per call, depending on the benchmark.
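To relate the percentages in the listings below to an absolute per-call cost, the overhead can be recovered from a reported time and its relative change. A small sketch, where the helper names are made up here and the inputs are the mean values of two `sync/no-hook` entries:

```rust
/// Absolute slowdown implied by a new time and a relative change in percent.
fn absolute_change_ns(new_time_ns: f64, change_percent: f64) -> f64 {
    // new = old * (1 + change / 100), so old = new / (1 + change / 100).
    let old_time_ns = new_time_ns / (1.0 + change_percent / 100.0);
    new_time_ns - old_time_ns
}

/// Overheads for two entries taken from the benchmark listings.
fn example_overheads() -> (f64, f64) {
    (
        absolute_change_ns(28.757, 17.183), // host-to-wasm, typed, nop
        absolute_change_ns(6.6288, 38.837), // wasm-to-host, nop, typed
    )
}
```

With these inputs the overheads come out at roughly 4.2 ns and 1.9 ns per call.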
I believe this regression is ultimately acceptable because
1. this overhead will be vastly dominated by whatever work a non-nop callee
actually does,
2. we will need these trampolines, or something like them, when implementing the
Wasm exceptions proposal to do things like translate Wasm's exceptions into
Rust's `Result`s,
3. and because the performance improvements to trapping and capturing stack
traces are of a much larger magnitude than this call regression.
<details>
```
sync/no-hook/host-to-wasm - typed - nop
time: [28.683 ns 28.757 ns 28.844 ns]
change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop
time: [42.515 ns 42.652 ns 42.841 ns]
change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop
time: [33.936 ns 34.052 ns 34.179 ns]
change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
sync/no-hook/host-to-wasm - typed - nop-params-and-results
time: [34.290 ns 34.388 ns 34.502 ns]
change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
sync/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [62.546 ns 62.721 ns 62.919 ns]
change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
sync/no-hook/host-to-wasm - unchecked - nop-params-and-results
time: [42.609 ns 42.710 ns 42.831 ns]
change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop
time: [29.546 ns 29.675 ns 29.818 ns]
change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop
time: [45.448 ns 45.699 ns 45.961 ns]
change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop
time: [34.334 ns 34.437 ns 34.558 ns]
change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
sync/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [36.594 ns 36.763 ns 36.974 ns]
change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
sync/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [63.541 ns 63.831 ns 64.194 ns]
change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results
time: [43.968 ns 44.169 ns 44.437 ns]
change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
async/no-hook/host-to-wasm - typed - nop
time: [4.9612 us 4.9743 us 4.9889 us]
change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async/no-hook/host-to-wasm - untyped - nop
time: [5.0030 us 5.0211 us 5.0439 us]
change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
async/no-hook/host-to-wasm - typed - nop-params-and-results
time: [4.9273 us 4.9468 us 4.9700 us]
change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [5.1151 us 5.1338 us 5.1555 us]
change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/hook-sync/host-to-wasm - typed - nop
time: [4.9330 us 4.9394 us 4.9467 us]
change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop
time: [5.0073 us 5.0183 us 5.0310 us]
change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
async/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [4.9610 us 4.9839 us 5.0097 us]
change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [5.0995 us 5.1272 us 5.1617 us]
change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop
time: [2.4242 us 2.4316 us 2.4396 us]
change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop
time: [2.5102 us 2.5155 us 2.5210 us]
change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
async-pool/no-hook/host-to-wasm - typed - nop-params-and-results
time: [2.4203 us 2.4310 us 2.4440 us]
change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results
time: [2.5501 us 2.5593 us 2.5700 us]
change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop
time: [2.4135 us 2.4190 us 2.4254 us]
change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop
time: [2.5172 us 2.5248 us 2.5357 us]
change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results
time: [2.4214 us 2.4353 us 2.4532 us]
change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results
time: [2.5499 us 2.5607 us 2.5748 us]
change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
sync/no-hook/wasm-to-host - nop - typed
time: [6.6135 ns 6.6288 ns 6.6452 ns]
change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.930 ns 15.993 ns 16.067 ns]
change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
sync/no-hook/wasm-to-host - nop - untyped
time: [20.596 ns 20.640 ns 20.690 ns]
change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [42.659 ns 42.882 ns 43.159 ns]
change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
sync/no-hook/wasm-to-host - nop - unchecked
time: [10.671 ns 10.691 ns 10.713 ns]
change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
sync/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.136 ns 11.190 ns 11.263 ns]
change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop - typed
time: [6.7964 ns 6.8087 ns 6.8226 ns]
change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
4 (4.00%) high mild
10 (10.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.865 ns 15.921 ns 15.985 ns]
change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
sync/hook-sync/wasm-to-host - nop - untyped
time: [21.505 ns 21.587 ns 21.677 ns]
change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [44.018 ns 44.128 ns 44.261 ns]
change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05)
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
sync/hook-sync/wasm-to-host - nop - unchecked
time: [11.264 ns 11.326 ns 11.387 ns]
change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.816 ns 11.865 ns 11.920 ns]
change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop - typed
time: [6.6221 ns 6.6385 ns 6.6569 ns]
change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.884 ns 15.929 ns 15.983 ns]
change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
async/no-hook/wasm-to-host - nop - untyped
time: [20.615 ns 20.702 ns 20.821 ns]
change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.956 ns 42.207 ns 42.521 ns]
change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async/no-hook/wasm-to-host - nop - unchecked
time: [10.440 ns 10.474 ns 10.513 ns]
change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.476 ns 11.512 ns 11.554 ns]
change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
async/no-hook/wasm-to-host - nop - async-typed
time: [26.427 ns 26.478 ns 26.532 ns]
change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [28.557 ns 28.693 ns 28.880 ns]
change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe
async/hook-sync/wasm-to-host - nop - typed
time: [6.7488 ns 6.7630 ns 6.7784 ns]
change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.928 ns 16.031 ns 16.149 ns]
change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
async/hook-sync/wasm-to-host - nop - untyped
time: [21.930 ns 22.114 ns 22.296 ns]
change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.684 ns 42.858 ns 43.081 ns]
change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) high mild
12 (12.00%) high severe
async/hook-sync/wasm-to-host - nop - unchecked
time: [11.026 ns 11.053 ns 11.086 ns]
change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.840 ns 11.900 ns 11.982 ns]
change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async/hook-sync/wasm-to-host - nop - async-typed
time: [27.601 ns 27.709 ns 27.882 ns]
change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe
async/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [28.955 ns 29.174 ns 29.413 ns]
change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
async-pool/no-hook/wasm-to-host - nop - typed
time: [6.5626 ns 6.5733 ns 6.5851 ns]
change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - typed
time: [15.820 ns 15.886 ns 15.969 ns]
change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe
async-pool/no-hook/wasm-to-host - nop - untyped
time: [20.481 ns 20.521 ns 20.566 ns]
change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped
time: [41.834 ns 41.998 ns 42.189 ns]
change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe
async-pool/no-hook/wasm-to-host - nop - unchecked
time: [10.353 ns 10.380 ns 10.414 ns]
change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked
time: [11.123 ns 11.168 ns 11.228 ns]
change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
11 (11.00%) high mild
1 (1.00%) high severe
async-pool/no-hook/wasm-to-host - nop - async-typed
time: [27.442 ns 27.528 ns 27.638 ns]
change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed
time: [29.014 ns 29.148 ns 29.312 ns]
change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - typed
time: [6.7916 ns 6.8116 ns 6.8325 ns]
change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed
time: [15.917 ns 15.975 ns 16.051 ns]
change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - untyped
time: [21.558 ns 21.612 ns 21.679 ns]
change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped
time: [42.475 ns 42.614 ns 42.775 ns]
change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - unchecked
time: [11.150 ns 11.195 ns 11.247 ns]
change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked
time: [11.639 ns 11.695 ns 11.760 ns]
change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) high mild
8 (8.00%) high severe
async-pool/hook-sync/wasm-to-host - nop - async-typed
time: [27.480 ns 27.712 ns 27.984 ns]
change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed
time: [29.218 ns 29.380 ns 29.600 ns]
change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
```
</details>
* Add s390x support for frame pointer-based stack walking
* wasmtime: Allow `Caller::get_export` to get all exports
* fuzzing: Add a fuzz target to check that our stack traces are correct
We generate Wasm modules that keep track of their own stack as they call and
return between functions, and then we periodically check that if the host
captures a backtrace, it matches what the Wasm module has recorded.
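The check can be pictured with a small model. This is only a sketch of the idea, not the fuzz target itself: `Op`, `shadow_stack_after`, and `backtrace_matches` are hypothetical, and it assumes the host backtrace lists frames youngest first.

```rust
/// A call or return performed by the generated module (hypothetical model).
#[derive(Clone, Copy)]
enum Op {
    /// Call the function with this index.
    Call(u32),
    /// Return from the current function.
    Return,
}

/// The stack the module would have recorded for itself, oldest frame first.
fn shadow_stack_after(ops: &[Op]) -> Vec<u32> {
    let mut shadow = Vec::new();
    for op in ops {
        match *op {
            Op::Call(func) => shadow.push(func),
            Op::Return => {
                shadow.pop();
            }
        }
    }
    shadow
}

/// Compares the module's own record with a host-captured backtrace.
fn backtrace_matches(shadow_oldest_first: &[u32], host_youngest_first: &[u32]) -> bool {
    shadow_oldest_first.iter().rev().eq(host_youngest_first.iter())
}
```

Any mismatch between the two views would point at a frame that the stack walker skipped or invented.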
* Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code
* Add doc comment with stack walking implementation notes
* Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods
* Add extensive comments for stack walking function
* Factor architecture-specific bits of stack walking out into modules
* Initialize store-related fields in a vmctx to null when there is no store yet
Rather than leaving them as uninitialized data.
* Use `set_callee` instead of manually setting the vmctx field
* Use a more informative compile error message for unsupported architectures
* Document unsafety of `prepare_host_to_wasm_trampoline`
* Use `bti c` instead of `hint #34` in inline aarch64 assembly
* Remove outdated TODO comment
* Remove setting of `last_wasm_exit_fp` in `set_jit_trap`
This is no longer needed as the value is plumbed through to the backtrace code
directly now.
* Only set the stack limit once, in the face of re-entrancy into Wasm
* Add comments for s390x-specific stack walking bits
* Use the helper macro for all libcalls
If we forget to use it, and then trigger a GC from the libcall, that means we
could miss stack frames when walking the stack, fail to find live GC refs, and
then get use after free bugs. Much less risky to always use the helper macro
that takes care of all of that for us.
* Use the `asm_sym!` macro in Wasm-to-libcall trampolines
This macro handles the macOS-specific underscore prefix stuff for us.
* wasmtime: add size and align to `externref` assertion error message
* Extend the `stacks` fuzzer to have host frames in between Wasm frames
This way we get one or more contiguous sequences of Wasm frames on the stack,
instead of exactly one.
* Add documentation for aarch64-specific backtrace helpers
* Clarify that we only support little-endian aarch64 in trampoline comment
* Use `.machine z13` in s390x assembly file
Since apparently our CI machines have pretty old assemblers that don't have
`.machine z14`. This should be fine though since these trampolines don't make
use of anything that is introduced in z14.
* Fix aarch64 build
* Fix macOS build
* Document the `asm_sym!` macro
* Add windows support to the `wasmtime-asm-macros` crate
* Add windows support to host<--->Wasm trampolines
* Fix trap handler build on windows
* Run `rustfmt` on s390x trampoline source file
* Temporarily disable some assertions about a trap's backtrace in the component model tests
Follow up to re-enable this and fix the associated issue:
https://github.com/bytecodealliance/wasmtime/issues/4535
* Refactor libcall definitions with less macros
This refactors the `libcall!` macro to use the
`foreach_builtin_function!` macro to define all of the trampolines.
Additionally, the macro surrounding each libcall itself is no longer
necessary, which helps avoid too many macros.
* Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new`
* Move `backtrace` module to be submodule of `traphandlers`
This avoids making some things `pub(crate)` in `traphandlers` that really
shouldn't be.
* Fix macOS aarch64 build
* Use "i64" instead of "word" in aarch64-specific file
* Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions
Also clean up assertions surrounding our saved entry/exit registers.
* Put "typed" vs "untyped" in the same position of call benchmark names
Regardless of whether we are doing wasm-to-host or host-to-wasm.
* Fix stacks test case generator build for new `wasm-encoder`
* Fix build for s390x
* Expand libcalls in s390x asm
* Disable more parts of component tests now that backtrace assertions are a bit tighter
* Remove assertion that can maybe fail on s390x
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
535 lines
18 KiB
Rust
//! WebAssembly trap handling, which is built on top of the lower-level
//! signal-handling mechanisms.

mod backtrace;

use crate::{VMContext, VMRuntimeLimits};
use anyhow::Error;
use std::any::Any;
use std::cell::{Cell, UnsafeCell};
use std::mem::{self, MaybeUninit};
use std::ptr;
use std::sync::Once;
use wasmtime_environ::TrapCode;

pub use self::backtrace::Backtrace;
pub use self::tls::{tls_eager_initialize, TlsRestore};

#[link(name = "wasmtime-helpers")]
extern "C" {
    #[allow(improper_ctypes)]
    fn wasmtime_setjmp(
        jmp_buf: *mut *const u8,
        callback: extern "C" fn(*mut u8, *mut VMContext),
        payload: *mut u8,
        callee: *mut VMContext,
    ) -> i32;
    fn wasmtime_longjmp(jmp_buf: *const u8) -> !;
}

cfg_if::cfg_if! {
    if #[cfg(all(target_os = "macos", not(feature = "posix-signals-on-macos")))] {
        mod macos;
        use macos as sys;
    } else if #[cfg(unix)] {
        mod unix;
        use unix as sys;
    } else if #[cfg(target_os = "windows")] {
        mod windows;
        use windows as sys;
    }
}

pub use sys::SignalHandler;

/// Globally-set callback to determine whether a program counter is actually a
/// wasm trap.
///
/// This is initialized during `init_traps` below. The definition lives within
/// `wasmtime` currently.
static mut IS_WASM_PC: fn(usize) -> bool = |_| false;

/// This function is required to be called before any WebAssembly is entered.
/// This will configure global state such as signal handlers to prepare the
/// process to receive wasm traps.
///
/// This function must not only be called globally once before entering
/// WebAssembly but it must also be called once-per-thread that enters
/// WebAssembly. Currently in wasmtime's integration this function is called on
/// creation of an `Engine`.
///
/// The `is_wasm_pc` argument is used when a trap happens to determine if a
/// program counter is the pc of an actual wasm trap or not. This is then used
/// to disambiguate faults that happen due to wasm and faults that happen due to
/// bugs in Rust or elsewhere.
pub fn init_traps(is_wasm_pc: fn(usize) -> bool) {
    static INIT: Once = Once::new();
    INIT.call_once(|| unsafe {
        IS_WASM_PC = is_wasm_pc;
        sys::platform_init();
    });
}

/// Raises a trap immediately.
///
/// This function performs as-if a wasm trap was just executed. This trap
/// payload is then returned from `catch_traps` below.
///
/// # Safety
///
/// Only safe to call when wasm code is on the stack, aka `catch_traps` must
/// have been previously called. Additionally no Rust destructors can be on the
/// stack. They will be skipped and not executed.
pub unsafe fn raise_trap(reason: TrapReason) -> ! {
    tls::with(|info| info.unwrap().unwind_with(UnwindReason::Trap(reason)))
}

/// Raises a user-defined trap immediately.
///
/// This function performs as-if a wasm trap was just executed, only the trap
/// has a dynamic payload associated with it which is user-provided. This trap
/// payload is then returned from `catch_traps` below.
///
/// # Safety
///
/// Only safe to call when wasm code is on the stack, aka `catch_traps` must
/// have been previously called. Additionally no Rust destructors can be on the
/// stack. They will be skipped and not executed.
pub unsafe fn raise_user_trap(data: Error) -> ! {
    raise_trap(TrapReason::User(data))
}

/// Raises a trap from inside library code immediately.
///
/// This function performs as-if a wasm trap was just executed. This trap
/// payload is then returned from `catch_traps` below.
///
/// # Safety
///
/// Only safe to call when wasm code is on the stack, aka `catch_traps` must
/// have been previously called. Additionally no Rust destructors can be on the
/// stack. They will be skipped and not executed.
pub unsafe fn raise_lib_trap(trap: TrapCode) -> ! {
    raise_trap(TrapReason::Wasm(trap))
}

/// Carries a Rust panic across wasm code and resumes the panic on the other
/// side.
///
/// # Safety
///
/// Only safe to call when wasm code is on the stack, aka `catch_traps` must
/// have been previously called. Additionally no Rust destructors can be on the
/// stack. They will be skipped and not executed.
pub unsafe fn resume_panic(payload: Box<dyn Any + Send>) -> ! {
    tls::with(|info| info.unwrap().unwind_with(UnwindReason::Panic(payload)))
}

/// Stores trace message with backtrace.
#[derive(Debug)]
pub struct Trap {
    /// Original reason from where this trap originated.
    pub reason: TrapReason,
    /// Wasm backtrace of the trap, if any.
    pub backtrace: Option<Backtrace>,
}

/// Enumeration of different methods of raising a trap.
#[derive(Debug)]
pub enum TrapReason {
    /// A user-raised trap through `raise_user_trap`.
    User(Error),

    /// A trap raised from Cranelift-generated code with the pc listed of where
    /// the trap came from.
    Jit(usize),

    /// A trap raised from a wasm libcall
    Wasm(TrapCode),
}

impl TrapReason {
    /// Is this a JIT trap?
    pub fn is_jit(&self) -> bool {
        matches!(self, TrapReason::Jit(_))
    }
}

impl From<Error> for TrapReason {
    fn from(err: Error) -> Self {
        TrapReason::User(err)
    }
}

impl From<TrapCode> for TrapReason {
    fn from(code: TrapCode) -> Self {
        TrapReason::Wasm(code)
    }
}

/// Catches any wasm traps that happen within the execution of `closure`,
/// returning them as a `Result`.
///
/// Highly unsafe since `closure` won't have any dtors run.
pub unsafe fn catch_traps<'a, F>(
    signal_handler: Option<*const SignalHandler<'static>>,
    capture_backtrace: bool,
    caller: *mut VMContext,
    mut closure: F,
) -> Result<(), Box<Trap>>
where
    F: FnMut(*mut VMContext),
{
    let limits = (*caller).instance().runtime_limits();

    let old_last_wasm_exit_fp = mem::replace(&mut *(**limits).last_wasm_exit_fp.get(), 0);
    let old_last_wasm_exit_pc = mem::replace(&mut *(**limits).last_wasm_exit_pc.get(), 0);
    let old_last_wasm_entry_sp = mem::replace(&mut *(**limits).last_wasm_entry_sp.get(), 0);

    let result = CallThreadState::new(
        signal_handler,
        capture_backtrace,
        old_last_wasm_exit_fp,
        old_last_wasm_exit_pc,
        old_last_wasm_entry_sp,
        *limits,
    )
    .with(|cx| {
        wasmtime_setjmp(
            cx.jmp_buf.as_ptr(),
            call_closure::<F>,
            &mut closure as *mut F as *mut u8,
            caller,
        )
    });

    *(**limits).last_wasm_exit_fp.get() = old_last_wasm_exit_fp;
    *(**limits).last_wasm_exit_pc.get() = old_last_wasm_exit_pc;
    *(**limits).last_wasm_entry_sp.get() = old_last_wasm_entry_sp;

    return match result {
        Ok(x) => Ok(x),
        Err((UnwindReason::Trap(reason), backtrace)) => Err(Box::new(Trap { reason, backtrace })),
        Err((UnwindReason::Panic(panic), _)) => std::panic::resume_unwind(panic),
    };

    extern "C" fn call_closure<F>(payload: *mut u8, caller: *mut VMContext)
    where
        F: FnMut(*mut VMContext),
    {
        unsafe { (*(payload as *mut F))(caller) }
    }
}

/// Temporary state stored on the stack which is registered in the `tls` module
/// below for calls into wasm.
pub struct CallThreadState {
    unwind: UnsafeCell<MaybeUninit<(UnwindReason, Option<Backtrace>)>>,
    jmp_buf: Cell<*const u8>,
    handling_trap: Cell<bool>,
    signal_handler: Option<*const SignalHandler<'static>>,
    prev: Cell<tls::Ptr>,
    capture_backtrace: bool,
    pub(crate) old_last_wasm_exit_fp: usize,
    pub(crate) old_last_wasm_exit_pc: usize,
    pub(crate) old_last_wasm_entry_sp: usize,
    pub(crate) limits: *const VMRuntimeLimits,
}

enum UnwindReason {
    Panic(Box<dyn Any + Send>),
    Trap(TrapReason),
}

impl CallThreadState {
    #[inline]
    fn new(
        signal_handler: Option<*const SignalHandler<'static>>,
        capture_backtrace: bool,
        old_last_wasm_exit_fp: usize,
        old_last_wasm_exit_pc: usize,
        old_last_wasm_entry_sp: usize,
        limits: *const VMRuntimeLimits,
    ) -> CallThreadState {
        CallThreadState {
            unwind: UnsafeCell::new(MaybeUninit::uninit()),
            jmp_buf: Cell::new(ptr::null()),
            handling_trap: Cell::new(false),
            signal_handler,
            prev: Cell::new(ptr::null()),
            capture_backtrace,
            old_last_wasm_exit_fp,
            old_last_wasm_exit_pc,
            old_last_wasm_entry_sp,
            limits,
        }
    }

    fn with(
        self,
        closure: impl FnOnce(&CallThreadState) -> i32,
    ) -> Result<(), (UnwindReason, Option<Backtrace>)> {
        let ret = tls::set(&self, || closure(&self));
        if ret != 0 {
            Ok(())
        } else {
            Err(unsafe { self.read_unwind() })
        }
    }

    #[cold]
    unsafe fn read_unwind(&self) -> (UnwindReason, Option<Backtrace>) {
        (*self.unwind.get()).as_ptr().read()
    }

    fn unwind_with(&self, reason: UnwindReason) -> ! {
        let backtrace = self.capture_backtrace(None);
        unsafe {
            (*self.unwind.get()).as_mut_ptr().write((reason, backtrace));
            wasmtime_longjmp(self.jmp_buf.get());
        }
    }

    /// Trap handler using our thread-local state.
    ///
    /// * `pc` - the program counter the trap happened at
    /// * `call_handler` - a closure used to invoke the platform-specific
    ///   signal handler for each instance, if available.
    ///
    /// Attempts to handle the trap if it's a wasm trap. Returns a few
    /// different things:
    ///
    /// * null - the trap didn't look like a wasm trap and should continue as a
    ///   trap
    /// * 1 as a pointer - the trap was handled by a custom trap handler on an
    ///   instance, and the trap handler should quickly return.
    /// * a different pointer - a jmp_buf buffer to longjmp to, meaning that
    ///   the wasm trap was successfully handled.
    #[cfg_attr(target_os = "macos", allow(dead_code))] // macOS is more raw and doesn't use this
    fn jmp_buf_if_trap(
        &self,
        pc: *const u8,
        call_handler: impl Fn(&SignalHandler) -> bool,
    ) -> *const u8 {
        // If we hit a fault while handling a previous trap, that's quite bad,
        // so bail out and let the system handle this recursive segfault.
        //
        // Otherwise flag ourselves as handling a trap, do the trap handling,
        // and reset our trap handling flag.
        if self.handling_trap.replace(true) {
            return ptr::null();
        }
        let _reset = ResetCell(&self.handling_trap, false);

        // If we haven't even started to handle traps yet, bail out.
        if self.jmp_buf.get().is_null() {
            return ptr::null();
        }

        // First up see if any instance registered has a custom trap handler,
        // in which case run them all. If anything handles the trap then we
        // return that the trap was handled.
        if let Some(handler) = self.signal_handler {
            if unsafe { call_handler(&*handler) } {
                return 1 as *const _;
            }
        }

        // If this fault wasn't in wasm code, then it's not our problem
        if unsafe { !IS_WASM_PC(pc as usize) } {
            return ptr::null();
        }

        // If all that passed then this is indeed a wasm trap, so return the
        // `jmp_buf` passed to `wasmtime_longjmp` to resume.
        self.jmp_buf.get()
    }

    fn set_jit_trap(&self, pc: *const u8, fp: usize) {
        let backtrace = self.capture_backtrace(Some((pc as usize, fp)));
        unsafe {
            (*self.unwind.get())
                .as_mut_ptr()
                .write((UnwindReason::Trap(TrapReason::Jit(pc as usize)), backtrace));
        }
    }

    fn capture_backtrace(&self, pc_and_fp: Option<(usize, usize)>) -> Option<Backtrace> {
        if !self.capture_backtrace {
            return None;
        }

        Some(unsafe { Backtrace::new_with_trap_state(self, pc_and_fp) })
    }

    pub(crate) fn iter<'a>(&'a self) -> impl Iterator<Item = &Self> + 'a {
        let mut state = Some(self);
        std::iter::from_fn(move || {
            let this = state?;
            state = unsafe { this.prev.get().as_ref() };
            Some(this)
        })
    }
}

struct ResetCell<'a, T: Copy>(&'a Cell<T>, T);

impl<T: Copy> Drop for ResetCell<'_, T> {
    #[inline]
    fn drop(&mut self) {
        self.0.set(self.1);
    }
}

// A private inner module for managing the TLS state that we require across
// calls in wasm. The WebAssembly code is called from C++ and then a trap may
// happen which requires us to read some contextual state to figure out what to
// do with the trap. This `tls` module is used to persist that information from
// the caller to the trap site.
mod tls {
    use super::CallThreadState;
    use std::ptr;

    pub use raw::Ptr;

    // An even *more* inner module for dealing with TLS. This actually has the
    // thread local variable and has functions to access the variable.
    //
    // Note that this is specially done to fully encapsulate that the accessors
    // for tls may or may not be inlined. Wasmtime's async support employs stack
    // switching which can resume execution on different OS threads. This means
    // that borrows of our TLS pointer must never live across accesses because
    // otherwise the access may be split across two threads and cause unsafety.
    //
    // This also means that extra care is taken by the runtime to save/restore
    // these TLS values when the runtime may have crossed threads.
    //
    // Note, though, that if async support is disabled at compile time then
    // these functions are free to be inlined.
    mod raw {
        use super::CallThreadState;
        use std::cell::Cell;
        use std::ptr;

        pub type Ptr = *const CallThreadState;

        // The first entry here is the `Ptr` which is what's used as part of the
        // public interface of this module. The second entry is a boolean which
        // allows the runtime to perform per-thread initialization if necessary
        // for handling traps (e.g. setting up ports on macOS and sigaltstack on
        // Unix).
        thread_local!(static PTR: Cell<(Ptr, bool)> = const { Cell::new((ptr::null(), false)) });

        #[cfg_attr(feature = "async", inline(never))] // see module docs
        #[cfg_attr(not(feature = "async"), inline)]
        pub fn replace(val: Ptr) -> Ptr {
            PTR.with(|p| {
                // When a new value is configured that means that we may be
                // entering WebAssembly so check to see if this thread has
                // performed per-thread initialization for traps.
                let (prev, initialized) = p.get();
                if !initialized {
                    super::super::sys::lazy_per_thread_init();
                }
                p.set((val, true));
                prev
            })
        }

        /// Eagerly initialize thread-local runtime functionality. This will be performed
        /// lazily by the runtime if users do not perform it eagerly.
        #[cfg_attr(feature = "async", inline(never))] // see module docs
        #[cfg_attr(not(feature = "async"), inline)]
        pub fn initialize() {
            PTR.with(|p| {
                let (state, initialized) = p.get();
                if initialized {
                    return;
                }
                super::super::sys::lazy_per_thread_init();
                p.set((state, true));
            })
        }

        #[cfg_attr(feature = "async", inline(never))] // see module docs
        #[cfg_attr(not(feature = "async"), inline)]
        pub fn get() -> Ptr {
            PTR.with(|p| p.get().0)
        }
    }

    pub use raw::initialize as tls_eager_initialize;

    /// Opaque state used to help control TLS state across stack switches for
    /// async support.
    pub struct TlsRestore(raw::Ptr);

    impl TlsRestore {
        /// Takes the TLS state that is currently configured and returns a
        /// token that is used to replace it later.
        ///
        /// This is not a safe operation since it's intended to only be used
        /// with stack switching found with fibers and async wasmtime.
        pub unsafe fn take() -> TlsRestore {
            // Our tls pointer must be set at this time, and it must not be
            // null. We need to restore the previous pointer since we're
            // removing ourselves from the call-stack, and in the process we
            // null out our own previous field for safety in case it's
            // accidentally used later.
            let raw = raw::get();
            if !raw.is_null() {
                let prev = (*raw).prev.replace(ptr::null());
                raw::replace(prev);
            }
            // Null case: we aren't in a wasm context, so there's no tls
            // to save for restoration.
            TlsRestore(raw)
        }

        /// Restores a previous tls state back into this thread's TLS.
        ///
        /// This is unsafe because it's intended to only be used within the
        /// context of stack switching within wasmtime.
        pub unsafe fn replace(self) {
            // Null case: we aren't in a wasm context, so there's no tls
            // to restore.
            if self.0.is_null() {
                return;
            }
            // We need to configure our previous TLS pointer to whatever is in
            // TLS at this time, and then we set the current state to ourselves.
            let prev = raw::get();
            assert!((*self.0).prev.get().is_null());
            (*self.0).prev.set(prev);
            raw::replace(self.0);
        }
    }

    /// Configures thread local state such that for the duration of the
    /// execution of `closure` any call to `with` will yield `state`, unless
    /// this is recursively called again.
    #[inline]
    pub fn set<R>(state: &CallThreadState, closure: impl FnOnce() -> R) -> R {
        struct Reset<'a>(&'a CallThreadState);

        impl Drop for Reset<'_> {
            #[inline]
            fn drop(&mut self) {
                raw::replace(self.0.prev.replace(ptr::null()));
            }
        }

        let prev = raw::replace(state);
        state.prev.set(prev);
        let _reset = Reset(state);
        closure()
    }

    /// Passes the last pointer configured with `set` above to `closure`, or
    /// `None` if `set` has not been previously called.
    pub fn with<R>(closure: impl FnOnce(Option<&CallThreadState>) -> R) -> R {
        let p = raw::get();
        unsafe { closure(if p.is_null() { None } else { Some(&*p) }) }
    }
}
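The scoped thread-local pattern used by `tls::set` and `CallThreadState::iter` above can be sketched in isolation as follows. This is a simplified illustration rather than Wasmtime's actual code: `State`, `HEAD`, `active_ids`, and `nested_scopes_demo` are hypothetical names introduced only for this example. It also leaves out the per-thread initialization flag and the async `TlsRestore` handling.

```rust
use std::cell::Cell;
use std::ptr;

/// Hypothetical stand-in for `CallThreadState`: one node per active scope.
struct State {
    id: u32,
    /// Previously active state, or null if this is the outermost one.
    prev: Cell<*const State>,
}

thread_local! {
    // Head of this thread's list of active states (null when none is set).
    static HEAD: Cell<*const State> = const { Cell::new(ptr::null()) };
}

/// Makes `state` the head of the list while `closure` runs, then restores
/// the previous head, even if `closure` panics.
fn set<R>(state: &State, closure: impl FnOnce() -> R) -> R {
    struct Reset<'a>(&'a State);

    impl Drop for Reset<'_> {
        fn drop(&mut self) {
            // Pop ourselves: the saved `prev` becomes the head again.
            let prev = self.0.prev.replace(ptr::null());
            HEAD.with(|h| h.set(prev));
        }
    }

    let prev = HEAD.with(|h| h.replace(state as *const State));
    state.prev.set(prev);
    let _reset = Reset(state);
    closure()
}

/// Collects the ids of all active states, innermost first.
fn active_ids() -> Vec<u32> {
    let mut ids = Vec::new();
    let mut cur = HEAD.with(|h| h.get());
    // SAFETY: every pointer in the list refers to a `State` that is still
    // borrowed by an enclosing `set` call on this thread.
    while let Some(state) = unsafe { cur.as_ref() } {
        ids.push(state.id);
        cur = state.prev.get();
    }
    ids
}

/// Example of nesting two scopes; call this from `main` to try it out.
fn nested_scopes_demo() {
    let outer = State { id: 1, prev: Cell::new(ptr::null()) };
    let inner = State { id: 2, prev: Cell::new(ptr::null()) };
    set(&outer, || {
        assert_eq!(active_ids(), vec![1]);
        set(&inner, || assert_eq!(active_ids(), vec![2, 1]));
        assert_eq!(active_ids(), vec![1]);
    });
    assert!(active_ids().is_empty());
}
```

Restoring the previous head from a `Drop` guard rather than after `closure()` returns means the list stays consistent on every exit path that runs destructors, which is the same reason `tls::set` uses its `Reset` guard.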